Alex Wang [Tue, 8 Jul 2014 04:58:33 +0000 (21:58 -0700)]
dpif-linux: Recheck the socket pointer existence before getting its pid.
This commit fixes a race between port deletion and flow miss handling.
More specifically, a port could be removed by the main thread while
the handler thread is handling a flow miss from it. If the flow
requires a slow path action, the handler thread will try to query a pid
from the port's socket. Since the port has been deleted, the query will
dereference a NULL socket pointer.
This commit makes the handler thread recheck the socket pointer before
dereferencing it.
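The recheck described above can be sketched roughly as follows. This is only an illustration: the struct and function names are hypothetical, returning 0 for "no pid" is an assumption, and real code would also need to hold whatever lock protects the channel array while it reads the pointer.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for a Netlink socket and a per-port channel. */
struct sock_sketch {
    uint32_t pid;
};

struct channel_sketch {
    struct sock_sketch *sock;   /* NULL once the port has been deleted. */
};

/* Returns the pid of the socket for 'port_idx', or 0 (an assumed "no pid"
 * value) if the index is out of range or the socket pointer is NULL. */
uint32_t
port_get_pid_sketch(const struct channel_sketch *channels, size_t n_channels,
                    size_t port_idx)
{
    if (port_idx >= n_channels) {
        return 0;
    }

    /* Recheck the pointer right before use instead of assuming the port
     * still exists. */
    const struct sock_sketch *sock = channels[port_idx].sock;
    return sock ? sock->pid : 0;
}
```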
Joe Stringer [Fri, 4 Jul 2014 06:58:28 +0000 (06:58 +0000)]
tests: Fix race in 'balance-tcp bonding' test.
Running the test in a tight loop could cause this test to fail after
about 5 runs, with some of the ports reporting "may_enable: false" in
the "ovs-appctl bond/show" output. This commit fixes the race condition
by waiting for may_enable to be true for all bond ports.
I suspect that LACP negotiation finishes, but the main thread doesn't
have a chance to enable the ports before we send the test packets.
Signed-off-by: Joe Stringer <joestringer@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>
This patch fixes two compile warnings introduced by commit 64b73291 ("util: create a copy of program_name"):
1. ../lib/util.c:457:5: error: passing argument 1 of 'free'
discards 'const' qualifier from pointer target type; And
2. ../lib/util.c:463:5: error: ISO C90 forbids mixed declarations
and code [-Werror=declaration-after-statement] (affected only
branch-2.3 that is C90 compliant and not the master)
Reported-By: Joe Stringer <jstringer@nicira.com> Reported-By: Lorand Jakab <lojakab@cisco.com> Signed-Off-By: Ansis Atteka <aatteka@nicira.com> Acked-by: Joe Stringer <joestringer@nicira.com>
This is a prerequisite step in making the classifier lookups lockless.
If taking a reference fails, we do the lookup again, as a new (lower
priority) rule may now match instead.
Also remove unwildcarding dl_type and nw_frag, as these are already
taken care of by xlate_actions().
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>
After a quick analysis, in most cases the access to refcounted objects
is clearly protected either with an explicit lock/mutex, or RCU. There
are only a few places where I left a call to ovs_refcount_unref().
Upon closer analysis it may well be that those could also use the
relaxed form.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>
When a reference counted object is also RCU protected the deletion of
the object's memory is always postponed. This allows
memory_order_relaxed to be used also for unreferencing, as RCU
quiescing provides a full memory barrier (it has to, or otherwise
there could be lingering accesses to objects after they are recycled).
Also, when access to the reference counted object is protected via a
mutex or a lock, the locking primitives provide the required memory
barrier functionality.
Also, add ovs_refcount_try_ref_rcu(), which takes a reference only if
the refcount is non-zero and returns true if a reference was taken,
false otherwise. This can be used in combined RCU/refcount scenarios
where we have an RCU protected reference to a refcounted object, but
which may be unref'ed at any time. If ovs_refcount_try_ref_rcu()
fails, the object may still be safely used until the current thread
quiesces.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>
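A "take a reference only if non-zero" operation of the kind described above could look roughly like this with C11 atomics. It is a sketch, not the OVS implementation: the sketch_ names are hypothetical, and it assumes the caller is inside an RCU read-side section so the object's memory stays valid while the loop runs.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a reference count embedded in an object. */
struct sketch_refcount {
    atomic_uint count;
};

void
sketch_refcount_init(struct sketch_refcount *rc)
{
    atomic_init(&rc->count, 1);
}

/* Takes a reference only if the count is still non-zero.  Returns true if a
 * reference was taken, false if the object is already on its way out. */
bool
sketch_refcount_try_ref_rcu(struct sketch_refcount *rc)
{
    unsigned int old = atomic_load_explicit(&rc->count, memory_order_relaxed);

    do {
        if (old == 0) {
            return false;
        }
        /* On failure the compare-exchange reloads 'old' for the next try. */
    } while (!atomic_compare_exchange_weak_explicit(&rc->count, &old, old + 1,
                                                    memory_order_relaxed,
                                                    memory_order_relaxed));
    return true;
}

/* Relaxed unreference, usable when freeing is postponed via RCU.
 * Returns the count as it was before the decrement. */
unsigned int
sketch_refcount_unref_relaxed(struct sketch_refcount *rc)
{
    return atomic_fetch_sub_explicit(&rc->count, 1, memory_order_relaxed);
}
```

A caller whose try-ref fails would typically redo its lookup, since the object it found is being removed.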
ovs-atomic: Use explicit memory order for ovs_refcount.
Use explicit variants of atomic operations for the ovs_refcount to
avoid the overhead of the default memory_order_seq_cst.
Adding a reference requires no memory ordering, as the calling thread
is already assumed to have protected access to the object being
reference counted. Hence, memory_order_relaxed is used for
ovs_refcount_ref(). ovs_refcount_read() does not change the reference
count, so it can also use memory_order_relaxed.
Unreferencing an object needs a release barrier, so that none of the
accesses to the protected object are reordered after the atomic
decrement operation. Additionally, an explicit acquire barrier is
needed before the object is recycled, to keep the subsequent accesses
to the object's memory from being reordered before the atomic
decrement operation.
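The ordering argument above can be sketched with C11 atomics as follows. The demo_ names are hypothetical and this is an illustration of the general release/acquire pattern, not the code from this patch.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical reference count used only for this sketch. */
struct demo_refcount {
    atomic_uint count;
};

void
demo_refcount_init(struct demo_refcount *rc)
{
    atomic_init(&rc->count, 1);
}

/* The caller already has protected access to the object, so taking another
 * reference needs no ordering. */
void
demo_refcount_ref(struct demo_refcount *rc)
{
    atomic_fetch_add_explicit(&rc->count, 1, memory_order_relaxed);
}

/* Reading does not change the count, so relaxed is enough. */
unsigned int
demo_refcount_read(struct demo_refcount *rc)
{
    return atomic_load_explicit(&rc->count, memory_order_relaxed);
}

/* Drops a reference.  Returns true if this was the last reference, in which
 * case the caller may recycle the object. */
bool
demo_refcount_unref(struct demo_refcount *rc)
{
    /* Release: earlier accesses to the object cannot move past the
     * decrement. */
    unsigned int old = atomic_fetch_sub_explicit(&rc->count, 1,
                                                 memory_order_release);
    if (old == 1) {
        /* Acquire: the accesses made while recycling cannot move before
         * the decrement. */
        atomic_thread_fence(memory_order_acquire);
        return true;
    }
    return false;
}
```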
This patch follows the memory ordering and argumentation discussed
here:
Suppress a gcc warning which was introduced by
commit e0b48482c16b6eaa7f14d8c7e7c6275528881b9e.
("util: create a copy of program_name")
I guess MSVC doesn't have a corresponding warning.
Signed-off-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp> Acked-by: Lorand Jakab <lojakab@cisco.com>
Commit 8a9562 ("dpif-netdev: Add DPDK netdev.") reversed the sequence
in which the set_program_name() and proctitle_init() functions are
called. This introduced a regression where program_name and argv_start
would point to exactly the same memory (previously both of these
pointers were pointing to different memory locations because
proctitle_init() would have beforehand created a copy of argv[0]
for the succeeding set_program_name() call).
On my system this regression caused the ovs-vswitchd monitoring process
to show up without a process name:
... 00:00:00 : monitoring pid 26308 (healthy)
The ps output lacked the process name because the following code
used overlapping memory for the source and target buffers:
Joe Stringer [Tue, 1 Jul 2014 09:54:18 +0000 (09:54 +0000)]
revalidator: Simplify push_dump_ops__().
Commit acaa8dac49 (revalidator: Eliminate duplicate flow handling.)
ensured that a ukey will always exist for a given flow, even if it is
about to be deleted. This means that push_dump_ops__() no longer needs
to handle the case where there is no ukey. This commit removes the
redundant code.
Signed-off-by: Joe Stringer <joestringer@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>
datapath: Additional logging for -EINVAL on flow setups.
There are many possible ways that a flow can be invalid so we've
added logging for most of them. This adds logs for the remaining
possible cases so there isn't any ambiguity while debugging.
Ben Pfaff [Mon, 30 Jun 2014 21:57:42 +0000 (14:57 -0700)]
netlink-socket: Work around upstream kernel Netlink bug.
The upstream kernel net/netlink/af_netlink.c netlink_recvmsg() contains the
following code to refill the Netlink socket buffer with more dump skbs
while a dump is in progress:
    if (nlk->cb && atomic_read(&sk->sk_rmem_alloc) <= sk->sk_rcvbuf / 2) {
        ret = netlink_dump(sk);
        if (ret) {
            sk->sk_err = ret;
            sk->sk_error_report(sk);
        }
    }
The netlink_dump() function that this calls returns a negative number on
error, the convention used throughout the kernel, and thus sk->sk_err
receives a negative value on error.
However, sk->sk_err is supposed to contain either 0 or a positive errno
value, as one can see from a quick "grep" through net for 'sk_err =', e.g.:
The result is that the next attempt to receive from the socket will return
the error to userspace with the wrong sign.
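The sign problem can be illustrated with a small userspace model. Everything here is hypothetical and heavily simplified (fake_sock, fake_recv and the two report functions are not kernel APIs); it only assumes, as described above, that a pending error is stored as a positive errno and handed back negated on the next receive.

```c
#include <errno.h>

/* Hypothetical, simplified model of a socket's pending-error field. */
struct fake_sock {
    int sk_err;     /* Supposed to be 0 or a positive errno value. */
};

/* Models the next receive: a pending error is consumed and returned
 * negated, so a positive sk_err yields a negative return value. */
int
fake_recv(struct fake_sock *sk)
{
    int err = sk->sk_err;

    sk->sk_err = 0;
    return -err;
}

/* Buggy variant: stores the negative return value of the dump unchanged. */
void
report_dump_error_buggy(struct fake_sock *sk, int ret)
{
    sk->sk_err = ret;
}

/* Variant following the convention: flips the sign before storing. */
void
report_dump_error_fixed(struct fake_sock *sk, int ret)
{
    sk->sk_err = -ret;
}
```

With the buggy variant, a dump failure of -EINVAL makes fake_recv() return a positive number, which a caller reads as a byte count rather than as an error.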
(The root of the error in this case is that multiple threads are attempting
to read a single flow dump from a shared fd. That should work, but the
kernel has an internal race that can result in one or more of those threads
hitting the EINVAL case at the start of netlink_dump(). The EINVAL is
harmless in this case and userspace should be able to ignore it, but
reporting the EINVAL as if it were a 22-byte message received in userspace
throws a real wrench in the works.)
This bug makes me think that there are probably not many programs doing
multithreaded Netlink dumps. Maybe it is good that we are considering
other approaches.
VMware-BZ: #1255704 Reported-by: Mihir Gangar <gangarm@vmware.com> Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Alex Wang <alexw@nicira.com>
Joe Stringer [Wed, 2 Jul 2014 07:41:33 +0000 (07:41 +0000)]
revalidator: Improve optimization to skip revalidation.
The should_revalidate() optimisation introduced with commit 698ffe3623
(revalidator: Only revalidate high-throughput flows.) was a little
aggressive, occasionally deleting flows even when OVS is quite capable
of performing full revalidation.
This commit modifies the logic to:
* Firstly, check if we are likely to handle full revalidation, and
attempt that instead.
* Secondly, fall back to the existing flow throughput estimations to
determine whether to revalidate the flow or just delete it.
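The two-step decision above might be sketched like this. The function name, both thresholds and the handling of a zero packet count are assumptions made for illustration; the commit message does not state the actual values.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed thresholds, for illustration only. */
#define SKETCH_FAST_DUMP_MS      200  /* Dumps faster than this are "cheap". */
#define SKETCH_MAX_MS_PER_PACKET 200  /* Slower flows are not worth keeping. */

/* Returns true if the flow should be fully revalidated, false if it should
 * simply be deleted. */
bool
should_revalidate_sketch(long long dump_duration_ms, uint64_t packets,
                         long long now_ms, long long used_ms)
{
    /* Firstly: if the last dump was quick, full revalidation is likely
     * affordable, so attempt it. */
    if (dump_duration_ms < SKETCH_FAST_DUMP_MS) {
        return true;
    }

    /* Secondly: fall back to a throughput estimate, here the mean time
     * between packets since the flow was last used. */
    if (packets == 0) {
        return false;   /* Assumption: idle flows are not worth the cost. */
    }
    long long ms_per_packet = (now_ms - used_ms) / (long long) packets;
    return ms_per_packet < SKETCH_MAX_MS_PER_PACKET;
}
```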
Simon Horman [Mon, 30 Jun 2014 04:20:14 +0000 (13:20 +0900)]
datapath: Allow pop and push MPLS actions after pop VLAN
This patch loosens the restrictions surrounding push and pop MPLS actions
such that they will be allowed after a pop VLAN action if the inner
ethernet type is acceptable for pop and push MPLS actions. This implies
that there is only one VLAN tag present.
Some analysis of the logic of this change follows:
The purpose of tracking vlan_tci is to allow prohibition of push
and pop MPLS actions in the presence of a VLAN. In this scenario
the VLAN_TAG_PRESENT bit of vlan_tci is set and eth_type is that of
the packet with the outermost VLAN tag removed.
A pop VLAN action may clear vlan_tci as it removes the outermost
VLAN tag and the push and pop MPLS logic may rely on eth_type for
their prohibition logic.
This will not allow push and pop MPLS on packets with multiple VLAN
tags, regardless of whether they are all removed using POP VLAN, as there
is no mechanism to expose the inner ethernet type beyond that of
the outermost VLAN tag.
Alex Wang [Mon, 30 Jun 2014 21:51:02 +0000 (14:51 -0700)]
datapath: Use exact lookup for flow_get and flow_del.
Due to a race condition in userspace, there is a chance that two
overlapping megaflows could be installed in the datapath. This
leaves userspace unable to delete the less inclusive megaflow
even after it times out, since the flow_del logic stops at the
first match of the masked flow.
This commit fixes the bug by making the kernel flow_del and flow_get
logic check all masks in that case.
Signed-off-by: Alex Wang <alexw@nicira.com> Acked-by: Andy Zhou <azhou@nicira.com> Acked-by: Pravin B Shelar <pshelar@nicira.com>
Jesse Gross [Mon, 30 Jun 2014 20:43:25 +0000 (13:43 -0700)]
datapath: Change u64_stats_* to use _irq instead of _bh().
The upstream u64_stats API has been changed to remove the _bh()
versions and switch all consumers to use IRQ safe variants instead.
This was done to be safe for netpoll generated packets, which can
occur in hard IRQ context. From a safety perspective, this doesn't
directly affect OVS since it doesn't support netpoll. However, this
change has been backported to older kernels so OVS needs to use the
new API to compile.
test-vconn: Change the expected error for Windows.
On Windows ECONNRESET is WSAECONNRESET.
Also, "unix" connections are done through TCP sockets.
For the 'refuse-connection' test, the error message for Windows
is WSAECONNRESET instead of EPIPE.
This just makes ovs-benchmark compile on windows.
This lets us go ahead with just a 'make' instead of
picking and choosing executables that are tested to work on
windows as arguments for make.
This commit does not make ovs-benchmark a supported utility
on windows.
Signed-off-by: Gurucharan Shetty <gshetty@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>
Ryan Wilson [Fri, 27 Jun 2014 01:16:39 +0000 (18:16 -0700)]
netdev-dpdk: Fix memory leak in dpdk_do_tx_copy().
This patch fixes a bug where, when rte_pktmbuf_alloc() failed, the
packets that had already been allocated successfully with
rte_pktmbuf_alloc() were not sent and leaked memory.
Also, as a byproduct of using a local variable to record dropped
packets, this reduces the locking of the netdev's mutex when
multiple packets are dropped in dpdk_do_tx_copy().
Signed-off-by: Ryan Wilson <wryan@nicira.com> Acked-by: Daniele Di Proietto <ddiproietto@vmware.com> Acked-by: Pravin B Shelar <pshelar@nicira.com>
Ryan Wilson [Fri, 27 Jun 2014 00:41:46 +0000 (17:41 -0700)]
netdev-dpdk: Set current timestamp when flushing TX queue.
The current timestamp should be set every time the queue is flushed.
Thus, if DRAIN_TSC timer cycles have passed since the last timestamp,
the send queue should be flushed again.
Signed-off-by: Ryan Wilson <wryan@nicira.com> Acked-by: Pravin B Shelar <pshelar@nicira.com>
poll-loop: Create Windows event handles for sockets automatically.
We currently have a poll_fd_wait_event(fd, wevent, events) function that
is used at places common to Windows and Linux where we have to wait on
sockets. On Linux, 'wevent' is always set as zero. On Windows, for sockets,
when we send both 'fd' and 'wevent', we associate them with each other for
'events' and then wait on 'wevent'. Also on Windows, when we only send 'wevent'
to this function, we would simply wait for all events for that 'wevent'.
There is a disadvantage with this approach.
* Windows clients need to create a 'wevent' and then pass it along. This
means that at a lot of places where we create sockets, we also are forced
to create a 'wevent'.
With this commit, we pass the responsibility of creating a 'wevent' to
poll_fd_wait() in case of sockets. That way, a client using poll_fd_wait()
is only concerned about sockets and not about 'wevents'. There is a potential
disadvantage with this change in that we create events more often and that
may have a performance penalty. If that turns out to be the case, we will
eventually need to create a pool of wevents that can be re-used.
In Windows, there are cases where we want to wait on an event (not
associated with any sockets) and then control it using functions
like SetEvent() etc. For that purpose, introduce a new function
poll_wevent_wait(). For this function, the client needs to create an event
and then pass it along as an argument.
Signed-off-by: Gurucharan Shetty <gshetty@nicira.com> Acked-By: Ben Pfaff <blp@nicira.com>
Pravin B Shelar [Thu, 26 Jun 2014 22:15:00 +0000 (15:15 -0700)]
datapath: Initialize OVS_CB in ovs_vport_receive()
On packet receive, OVS_CB is initialized in multiple functions.
This patch moves all of these initializations to
ovs_vport_receive().
This patch also saves a check in execute actions.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Jesse Gross <jesse@nicira.com>
Polehn, Mike A [Thu, 19 Jun 2014 22:58:26 +0000 (22:58 +0000)]
dpdk: High speed PMD physical NIC queue size
Large TX and RX queues are needed for high speed 10 GbE physical NICs.
Observed a 250% zero-loss improvement over the small NIC queue test in
the port-to-port flow test.
Signed-off-by: Mike A. Polehn <mike.a.polehn@intel.com> Acked-by: Pravin B Shelar <pshelar@nicira.com>
Thomas Graf [Fri, 27 Jun 2014 07:31:57 +0000 (09:31 +0200)]
build: Allow building with autoconf 2.63
Reduces the dependency on autoconf from 2.64 to 2.63 to ease building
on older platforms. Only a few macros are missing and they can
be provided easily.
A handful of tests needed modification. The difference in quoting
behaviour between 2.63 and later requires the m4_define() to be
manually unfolded.
The Debian control file is left untouched on purpose. The decision
whether to adjust the dependency is left to the respective maintainers.
Tested with autoconf 2.63 and 2.69.
Cc: Scott Mann <smann@noironetworks.com> Cc: Don Kehn <dkehn@noironetworks.com> Signed-off-by: Thomas Graf <tgraf@noironetworks.com> Signed-off-by: Ben Pfaff <blp@nicira.com>
ovs-ofctl: Ability to read a hex string from file.
The unit test, "OFPST_TABLE reply - OF1.2" in ofp-print.at
sends a very large hex string as an argument to 'ovs-ofctl ofp-print'.
The length of the hex string exceeds the maximum command line length
in Windows. With this commit, we can pass the same hex string by
placing it inside a file.
Signed-off-by: Gurucharan Shetty <gshetty@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>
ovsdb-tool: Workaround inability to replace existing file on Windows.
rename() on an existing destination file fails on Windows. This commit
works around that problem.
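One common shape for such a workaround is sketched below. It is not necessarily what this commit does: replace_file_sketch() is a hypothetical helper, and removing the destination first makes the replacement non-atomic on Windows.

```c
#include <stdio.h>

/* Replaces 'dst' with 'src'.  Returns 0 on success, nonzero on failure. */
int
replace_file_sketch(const char *src, const char *dst)
{
#ifdef _WIN32
    /* On Windows, rename() fails if 'dst' exists, so remove it first.
     * A failure here is ignored because 'dst' may simply not exist. */
    remove(dst);
#endif
    return rename(src, dst);
}
```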
There are two tests that cover it. But both of them use ovsdb-server's
--run option for the test and it does not exist on Windows. So change
the tests to work around the lack of that feature.
Signed-off-by: Gurucharan Shetty <gshetty@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>
ovsdb-server.at: Handle different error message for already opened database.
Commit ebed9f78 (ovsdb-server: Improve message for "add-db" of
database already open.) improved the error message seen when
opening an already opened database on Linux. For Windows,
we still need to look for the lockfile error message.
Signed-off-by: Gurucharan Shetty <gshetty@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>
sflow feature needs to be investigated for Windows. Right now
test-sflow related tests do not pass because of LOOPBACK_INTERFACE
constraints for 'agent'. Add a TODO item and skip the tests.
Signed-off-by: Gurucharan Shetty <gshetty@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>
file_name.at: Skip a symlink related test for Windows.
There is no one-to-one mapping of symlinks between Linux and
Windows. This test currently fails on Windows and we do not
really need this functionality on Windows. So skip it.
Signed-off-by: Gurucharan Shetty <gshetty@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>
ovs-ofctl.at: Prevent msys from getting confused with ipv6 address.
msys has a set of rules which triggers an automatic conversion of
arguments into something else to suit Windows requirements. Sometimes
this also causes unwanted conversions. Details of the rules are here:
http://www.mingw.org/wiki/Posix_path_conversion
msys converts ::1/::1 into ;1\;1. To prevent this, use fullform
ipv6 address of the form 0:0:0:0:0:0:0:1 instead.
Signed-off-by: Gurucharan Shetty <gshetty@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>
rconn: Don't warn when peer abruptly closes connection.
On Windows, when a peer terminates without calling a close
on socket fd, the server ends up printing "connection dropped"
warning messages. We probably don't want those warning messages
when the error is WSAECONNRESET.
(In OVS unit tests on Windows, anytime a client like ovs-ofctl
calls ovs_fatal() without a clean close(fd) on the socket, the
server like ovs-vswitchd prints warnings that cause unit tests
to fail.)
Signed-off-by: Gurucharan Shetty <gshetty@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>
For this particular test, we pass the PKIDIR through an
ovsdb-tool transact and msys does not convert the path style.
(On Windows, we have to pass the directory in the form C:/foo/bar.pem.)
So get the Windows style path through 'pwd -W' (which is called through
the function pwd ()).
Signed-off-by: Gurucharan Shetty <gshetty@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>
stream-tcp: Cleanup files created for Windows "unix" sockets.
On Windows, we create "unix sockets" by creating TCP sockets
and hiding the TCP port number in files. When we close the
pstream session, we need to delete the file.
Signed-off-by: Gurucharan Shetty <gshetty@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>
Jarno Rajahalme [Fri, 13 Jun 2014 17:38:05 +0000 (10:38 -0700)]
lib/classifier: Optimize megaflows for single rule case.
When, during a classifier lookup, we narrow down to a single potential
rule, it is enough to match on ("unwildcard") one bit that differs
between the packet and the rule.
This is a special case of the more general algorithm, where it is
sufficient to match on enough bits to separate the packet from all
rules of higher priority than the matched rule. For a miss that would be
all the rules. Implementing this is expensive for more than a few
rules. This patch starts by doing this for a single rule when we
already have it, also reducing the lookup cost by finishing the lookup
earlier than before.
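The single-rule case can be sketched as below. This is an illustration under assumptions, not the classifier code: flows are modelled as small arrays of 32-bit words, and SKETCH_N_WORDS and the function name are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_N_WORDS 4    /* Hypothetical flow size in 32-bit words. */

/* If 'packet' differs from the rule in any bit the rule matches on, sets
 * exactly one such bit in the wildcard mask 'wc' and returns true.
 * Returns false, leaving 'wc' untouched, if the packet matches the rule. */
bool
unwildcard_one_diff_bit(const uint32_t packet[SKETCH_N_WORDS],
                        const uint32_t rule_value[SKETCH_N_WORDS],
                        const uint32_t rule_mask[SKETCH_N_WORDS],
                        uint32_t wc[SKETCH_N_WORDS])
{
    for (size_t i = 0; i < SKETCH_N_WORDS; i++) {
        uint32_t diff = (packet[i] ^ rule_value[i]) & rule_mask[i];

        if (diff) {
            /* Keep only the lowest set bit: one differing bit is enough
             * to prove that the packet does not match this rule. */
            wc[i] |= diff & (~diff + 1);
            return true;
        }
    }
    return false;
}
```

The resulting megaflow then matches on one bit instead of the whole rule mask, so it covers more packets.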
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>
Jarno Rajahalme [Thu, 26 Jun 2014 14:41:25 +0000 (07:41 -0700)]
lib/pvector: Non-intrusive RCU priority vector.
Factor out the priority vector code from the classifier.
Making the classifier use RCU instead of locking requires parallel
access to the priority vector, pointing to subtables in descending
priority order. When a new subtable is added, a new copy of the
priority vector is allocated, while the current readers can keep on
using the old copy they started with. Adding and removing subtables
is usually less frequent than adding and removing rules, so this
should not have a visible performance implication. As an optimization
for the userspace datapath use, where all the subtables have the same
priority, new subtables can be added to the end of the vector without
reallocation and without disturbing readers.
cls_subtables_reset() is now removed, as it served its purpose in bug
hunting. Checks on the new pvector are now incorporated into
tests/test-classifier.c.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>
Simon Horman [Mon, 23 Jun 2014 23:46:31 +0000 (08:46 +0900)]
ofproto-dpif: MPLS recirculation
In some cases a pop MPLS action changes a packet to be a non-MPLS packet.
In this case any subsequent L3+ actions require access to portions
of the packet which were not decoded as they were opaque when the
packet was MPLS. Allow such actions to be translated by
first recirculating the packet.
Co-authored-by: Ben Pfaff <blp@nicira.com> Signed-off-by: Ben Pfaff <blp@nicira.com> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Ben Pfaff <blp@nicira.com>
Ryan Wilson [Mon, 23 Jun 2014 19:36:11 +0000 (12:36 -0700)]
dpif-netdev: Implement batched flow dumping.
Previously, flows were retrieved one by one when dumping flows for
datapaths of type 'netdev'. This increased contention for the dump's
mutex, negatively affecting revalidator performance.
This patch retrieves batches of flows when dumping flows for datapaths
of type 'netdev'.
Signed-off-by: Ryan Wilson <wryan@nicira.com>
[blp@nicira.com relaxed max_flows restriction] Signed-off-by: Ben Pfaff <blp@nicira.com>
dpif-netdev: Delete packet if not able to do upcall
In dp_netdev_input() we never fully covered the case where handler queues are
not there.
With this change we increment the stat counter and free the packet.
Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com> Acked-by: Pravin B Shelar <pshelar@nicira.com>
netdev-dpdk: Disable NIC offloading and multiseg mbufs
We do not use any offloading (now) or multiple segments per packet, so
we might as well disable those features while configuring the NIC.
This could give performance improvements. For ixgbe, for example, this change
allows the driver to use a simpler tx routine, resulting in throughput
improvements (~7.5%).
Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com> Acked-by: Thomas Graf <tgraf@noironetworks.com> Acked-by: Pravin B Shelar <pshelar@nicira.com>
netdev-dpdk: Fix coding style in TX/RX conf structs
Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com> Acked-by: Thomas Graf <tgraf@noironetworks.com> Acked-by: Pravin B Shelar <pshelar@nicira.com>
netdev-dpdk: Count and delete every dropped packet
Commit f4fd623c4c25 introduced a bug in netdev_dpdk_send(): if multiple
consecutive packets exceed MTU, only the first one is deleted and
counted.
This commit should fix the bug.
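The intended behaviour, where every oversized packet is both freed and counted, can be sketched as follows. The struct and function are hypothetical and use malloc'd packets rather than DPDK mbufs; the real send path is structured differently.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical packet with only the field this sketch needs. */
struct pkt_sketch {
    size_t size;
};

/* Frees every packet in 'pkts' larger than 'max_size', adds one to
 * '*dropped' per freed packet, compacts the survivors to the front of the
 * array and returns how many survive. */
size_t
drop_oversized_sketch(struct pkt_sketch **pkts, size_t n, size_t max_size,
                      unsigned int *dropped)
{
    size_t kept = 0;

    for (size_t i = 0; i < n; i++) {
        if (pkts[i]->size > max_size) {
            /* Handle each oversized packet on its own, so consecutive
             * oversized packets are all deleted and counted. */
            free(pkts[i]);
            (*dropped)++;
        } else {
            pkts[kept++] = pkts[i];
        }
    }
    return kept;
}
```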
Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com> Acked-by: Thomas Graf <tgraf@noironetworks.com> Acked-by: Pravin B Shelar <pshelar@nicira.com>
Pravin B Shelar [Tue, 24 Jun 2014 20:00:52 +0000 (13:00 -0700)]
lib: Rename ofp to buf.
dpif-packet contains an ofpbuf which points to packet data. Here buf
is a better name than ofp.
This patch renames all remaining instances of the ofp variable.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Daniele Di Proietto <ddiproietto@vmware.com>
Jesse Gross [Wed, 25 Jun 2014 01:28:08 +0000 (18:28 -0700)]
datapath: Rehash 16-bit skbuff hashes into 32 bits.
Currently, if the network stack provides skb->rxhash then we use it,
otherwise we compute our own. However, on at least some versions of
RHEL/CentOS, the stack provides a hash that is 16 bits rather than
32 bits. In cases where we use the uppermost bits of the hash this
is particularly bad because we detect that a hash is present and we
use it rather than computing our own but the result is always zero.
This is particularly noticeable with tunnel ports that use the hash
to generate a source port, such as VXLAN. On these kernels the tunnel
source port is always the minimum value. To solve this problem while
still taking advantage of the precomputed hash, this rehashes the
hash so that the entropy is spread throughout 32 bits.
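The effect can be sketched in plain C. Both functions are illustrative assumptions: mix32_sketch() uses a generic 32-bit integer mixer as a stand-in for whatever hash function the datapath actually applies, and src_port_sketch() is a hypothetical way of scaling a 32-bit hash into a port range.

```c
#include <stdint.h>

/* Stand-in mixer: spreads the entropy of 'h' across all 32 bits.  Not the
 * function used by the datapath. */
uint32_t
mix32_sketch(uint32_t h)
{
    h ^= h >> 16;
    h *= 0x85ebca6bu;
    h ^= h >> 13;
    h *= 0xc2b2ae35u;
    h ^= h >> 16;
    return h;
}

/* Hypothetical port selection: scales the full 32-bit hash into the range
 * [min, max].  A hash that only uses its low 16 bits always maps to 'min'
 * because the scaled value never reaches the upper 32 bits. */
uint16_t
src_port_sketch(uint32_t hash, uint16_t min, uint16_t max)
{
    uint32_t range = (uint32_t) max - min + 1;

    return (uint16_t) (min + (((uint64_t) hash * range) >> 32));
}
```

Passing the 16-bit hash through the mixer first lets the same port selection spread flows across the range.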
Signed-off-by: Jesse Gross <jesse@nicira.com> Acked-by: Thomas Graf <tgraf@noironetworks.com>
Ben Pfaff [Tue, 24 Jun 2014 23:39:33 +0000 (16:39 -0700)]
dpif: When executing actions needs help, use "set" action to set tunnel.
Open vSwitch userspace is able to implement some actions that the kernel
doesn't support, such as modifying ARP fields. When it does this for a
tunneled packet, it needs to supply the tunnel information with a "set"
action, because the Linux kernel datapath throws away tunnel information
supplied in the OVS_PACKET_CMD_EXECUTE metadata argument.
Simon Horman [Tue, 24 Jun 2014 11:56:57 +0000 (20:56 +0900)]
datapath: Add basic MPLS support to kernel
Allow datapath to recognize and extract MPLS labels into flow keys
and execute actions which push, pop, and set labels on packets.
Based heavily on work by Leo Alterman, Ravi K, Isaku Yamahata and Joe Stringer.
Cc: Ravi K <rkerur@gmail.com> Cc: Leo Alterman <lalterman@nicira.com> Cc: Isaku Yamahata <yamahata@valinux.co.jp> Cc: Joe Stringer <joe@wand.net.nz> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Jesse Gross <jesse@nicira.com>