Jarno Rajahalme [Thu, 9 Mar 2017 01:18:22 +0000 (17:18 -0800)]
datapath: Refactor labels initialization.
Upstream commit:
Refactoring conntrack labels initialization makes changes in later
patches easier to review.
Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Joe Stringer <joe@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Joe Stringer <joe@ovn.org>
Since 23014011ba42 ("netfilter: conntrack: support a fixed size of 128
distinct labels"), the size of conntrack labels extension has fixed to
128 bits, so we do not need to check for labels sizes shorter than 128
at run-time. This patch simplifies labels length logic accordingly,
but allows the conntrack labels size to be increased in the future
without breaking the build. In the event of conntrack labels
increasing in size OVS would still be able to deal with the 128 first
label bits.
Suggested-by: Joe Stringer <joe@ovn.org> Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Joe Stringer <joe@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Joe Stringer <joe@ovn.org>
openvswitch: Unionize ovs_key_ct_label with a u32 array.
Make the array of labels in struct ovs_key_ct_label an union, adding a
u32 array of the same byte size as the existing u8 array. It is
faster to loop through the labels 32 bits at the time, which is also
the alignment of netlink attributes.
Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Joe Stringer <joe@ovn.org> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Joe Stringer <joe@ovn.org>
openvswitch: Do not trigger events for unconfirmed connections.
Receiving change events before the 'new' event for the connection has
been received can be confusing. Avoid triggering change events for
setting conntrack mark or labels before the conntrack entry has been
confirmed.
Fixes: 182e3042e15d ("openvswitch: Allow matching on conntrack mark") Fixes: c2ac66735870 ("openvswitch: Allow matching on conntrack label") Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Joe Stringer <joe@ovn.org> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Upstream commit:
openvswitch: Set event bit after initializing labels.
Connlabels are included in conntrack netlink event messages only if
the IPCT_LABEL bit is set in the event cache (see
ctnetlink_conntrack_event()). Set it after initializing labels for a
new connection.
Found upon further system testing, where it was noticed that labels
were missing from the conntrack events.
Fixes: 193e30967897 ("openvswitch: Do not trigger events for unconfirmed con
nections.") Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net> Fixes: 372ce9737d2b ("datapath: Allow matching on conntrack mark") Fixes: 038e34abaa31 ("datapath: Allow matching on conntrack label") Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Joe Stringer <joe@ovn.org>
openvswitch: Use inverted tuple in ovs_ct_find_existing() if NATted.
The conntrack lookup for existing connections fails to invert the
packet 5-tuple for NATted packets, and therefore fails to find the
existing conntrack entry. Conntrack only stores 5-tuples for incoming
packets, and there are various situations where a lookup on a packet
that has already been transformed by NAT needs to be made. Looking up
an existing conntrack entry upon executing packet received from the
userspace is one of them.
This patch fixes ovs_ct_find_existing() to invert the packet 5-tuple
for the conntrack lookup whenever the packet has already been
transformed by conntrack from its input form as evidenced by one of
the NAT flags being set in the conntrack state metadata.
Fixes: 05752523e565 ("openvswitch: Interface with NAT.") Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Joe Stringer <joe@ovn.org> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
This patch also adds a test case to OVS system tests to verify the
behavior.
The following is a more thorough explanation of what is going on:
When we have evidence that an existing conntrack entry could exist, we
must invert the tuple if NAT has already been applied, as the current
packet headers do not match any tuple stored in conntrack. For
example, if a packet from private address X to a public address B is
source-NATted to A, the conntrack entry will have the following tuples
(ignoring the protocol and port numbers) after the conntrack entry is
committed:
Original direction tuple: (X,B)
Reply direction tuple: (B,A)
Now, if a reply packet is already transformed back to the private
address space (e.g., with a CT(nat) action), the tuple corresponding
to the current packet headers is:
Current packet tuple: (B,X)
This does not match either of the conntrack tuples above. Normally
this does not matter, as the conntrack lookup was already done using
the tuple (B,A), but if the current packet does not match any flow in
the OVS datapath, the packet is sent to userspace via an upcall,
during which the packet's skb is freed, and the conntrack entry
pointer in the skb is lost. When the packet is reintroduced to the
datapath, any further conntrack action will need to perform a new
conntrack lookup to find the entry again. Prior to this patch this
second lookup failed. The datapath flow setup corresponding to the
upcall can succeed, however, allowing all further packets in the reply
direction to re-use the conntrack entry pointer in the skb, so
typically the lookup failure only causes a packet drop.
The solution is to invert the tuple derived from the current packet
headers in case the conntrack state stored in the packet metadata
indicates that the packet has been transformed by NAT:
Inverted tuple: (X,B)
With this the conntrack entry can be found, matching the original
direction tuple.
This same logic also works for the original direction packets:
Current packet tuple (after reverse NAT): (A,B)
Inverted tuple: (B,A)
While the current packet tuple (A,B) does not match either of the
conntrack tuples, the inverted one (B,A) does match the reply
direction tuple.
Since the inverted tuple matches the reverse direction tuple the
direction of the packet must be reversed as well.
Fixes: c5f6c06b58d6 ("datapath: Interface with NAT.") Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Joe Stringer <joe@ovn.org>
Fix comments referring to skb 'nfct' and 'nfctinfo' fields now that
they are combined into '_nfct'.
Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Joe Stringer <joe@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Joe Stringer <joe@ovn.org>
Add a helper to assign a nf_conn entry and the ctinfo bits to an sk_buff.
This avoids changing code in followup patch that merges skb->nfct and
skb->nfctinfo into skb->_nfct.
Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Joe Stringer <joe@ovn.org>
Ilya Maximets [Tue, 21 Feb 2017 14:49:25 +0000 (17:49 +0300)]
id-pool: Allocate the lowest available ids.
This simple change makes id-pool to always allocate the
lowest possible id from the pool. No any other code affected
because, actually, there is no users of 'id_pool_free_id' in
OVS.
This behaviour of id-pool will be used in the next patch.
Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Eric Garver [Tue, 21 Feb 2017 19:22:53 +0000 (14:22 -0500)]
ofp-actions: Fix translation of set_field for nw_ecn
When using set_field for nw_ecn with OF1.0 or OF1.1, you get an error
instead of a proper translation. This use to work before 4b684612d900
("ofp-actions: Translate mod_nw_ecn action to OF1.1 properly.") because
it would fallback to using NXM.
$ ovs-ofctl -O OpenFlow11 add-flow br0 'ip actions=set_field:2->nw_ecn'
ovs-ofctl: none of the usable flow formats (NXM,OXM) is among the
allowed flow formats (OpenFlow11)
Fixes: 4b684612d900 ("ofp-actions: Translate mod_nw_ecn action to OF1.1 properly.") Signed-off-by: Eric Garver <e@erig.me> Signed-off-by: Ben Pfaff <blp@ovn.org>
Aaron Conole [Wed, 22 Feb 2017 19:59:41 +0000 (14:59 -0500)]
ovs-tcpdump: Set mirror port mtu
When using ovs-tcpdump to mirror interfaces with MTU larger than the default,
Open vSwitch will lower the interfaces we are interested in monitoring.
Instead, probe the MTU and set the mirrored port's MTU value correctly.
Fixes: 314ce6479a83 ("ovs-tcpdump: Add a tcpdump wrapper utility") Reported-by: Dan Williams <dcbw@redhat.com> Signed-off-by: Aaron Conole <aconole@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Darrell Ball [Thu, 16 Feb 2017 08:47:32 +0000 (00:47 -0800)]
dpdk: Export packet_set_ipv6_addr() for DPDK.
The NAT changes in this series need both packet_set_ipv4_addr()
and packet_set_ipv6_addr() exporting, however, the ipv4 api was
exported with an unrelated patch.
Ben Pfaff [Thu, 26 Jan 2017 18:26:30 +0000 (10:26 -0800)]
ovs-fields.7: Use a more general approach to groff encodings.
It turns out that, since groff 1.20 around 2009, groff comes with a
preprocessor named "preconv" that can fix encoding issues. Use it instead
of the existing hack.
Jarno Rajahalme [Thu, 23 Feb 2017 19:27:57 +0000 (11:27 -0800)]
dpif-netdev: Simple DROP meter implementation.
Meters may be used by any flow, so some kind of locking must be used.
In this version we have an adaptive mutex for each meter, which may
not be optimal for DPDK. However, this should serve as a basis for
further improvement.
A batch of packets is first tried as a whole, and only if some of the
meter bands are hit, we need to process the packets individually.
Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Signed-off-by: Andy Zhou <azhou@ovn.org>
Ben Pfaff [Wed, 8 Mar 2017 04:48:08 +0000 (20:48 -0800)]
Makefile: Drop vestiges of support for non-GNU Make.
Open vSwitch has documented a requirement for GNU Make for a long time, yet
it had vestiges catering to other make implementations. This removes
those.
Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Russell Bryant <russell@ovn.org>
Leif Madsen [Mon, 6 Mar 2017 20:46:43 +0000 (15:46 -0500)]
packaging: Make Fedora spec file CentOS compatible
On CentOS, the package names aren't prefixed with python2, but rather
are prefixed with simply python. This change addresses that and fixes
up some documentation that was outdated, and updates the Vagrantfile
to use the proper spec file and package names.
doc: Add info on distributions shipping openvswitch package.
List details of various popular distributions shipping Open vSwitch
packages. Also include the information of the distros supporting DPDK
accelerated datapath.
Signed-off-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Markos Chandras [Sat, 4 Feb 2017 17:11:11 +0000 (17:11 +0000)]
windows: automake.mk: Remove the .gitignore file from distributed files
Commit d183efc22b2b ("This commit adds the windows installer to the
OVS tree.) added the .gitignore file to the distributed files but this
file shouldn't be part of the distributed archive.
CC: Alin Gabriel Serdean <aserdean@cloudbasesolutions.com> Fixes: d183efc22b2b ("This commit adds the windows installer to the OVS tree.") Signed-off-by: Markos Chandras <mchandras@suse.de> Signed-off-by: Ben Pfaff <blp@ovn.org>
Mickey Spiegel [Fri, 3 Feb 2017 04:48:24 +0000 (20:48 -0800)]
ovn: specify options:nat-addresses as "router"
Currently in OVN, the "nat-addresses" in the "options" column of a
logical switch port of type "router" must be specified manually.
Typically the user would specify as "nat-addresses" all of the NAT
external IP addresses and load balancer IP addresses that have
already been specified separately on the router.
This patch allows the logical switch port's "nat-addresses" to be
specified as the string "router". When ovn-northd sees this string,
it automatically copies the following into the southbound
Port_Binding's "nat-addresses" in the "options" column:
The options:router-port's MAC address.
Each NAT external IP address (of any NAT type) specified on the
logical router of options:router-port.
Each load balancer IP address specified on the logical router of
options:router-port.
This will cause the controller where the gateway router resides to
issue gratuitous ARPs for each NAT external IP address and for each
load balancer IP address specified on the gateway router.
datapath-windows: Trigger conntrack event after setting mark and label
New Conntrack Entry event should be triggered after setting the mark and
label fields. The current RW lock implementation prevents Event Handler
from reading the entry until mark/label is set.
Fixing the workflow to trigger the event after setting mark/label.
Russell Bryant [Tue, 7 Mar 2017 16:14:30 +0000 (11:14 -0500)]
flake8: Fix build with flake8-import-order installed.
OpenStack CI is currently failing due to some flake8 warnings
emitted from the flake8-import-order plugin. Just ignore all of
those warnings since they're just style things that aren't important.
Signed-off-by: Russell Bryant <russell@ovn.org> Acked-by: Ben Pfaff <blp@ovn.org>
Andy Zhou [Tue, 14 Feb 2017 22:40:04 +0000 (14:40 -0800)]
xlate: Translate openflow clone into odp sample action.
When datapath does not support the 'clone' action directly, generate
sample action (with 100% probability) instead.
Specifically, currently, there is no plan to support the 'clone'
action on the Linux kernel datapath directly, so the sample action
will be used to translate the openflow clone action for this datapath.
Signed-off-by: Andy Zhou <azhou@ovn.org> Acked-by: Jarno Rajahalme <jarno@ovn.org>
Allow execute_controller_action() to accept actions encoded with
nested netlink attributes.
execute_controller_action() can be called during 'xlate_actions'. It
tries executes all actions translated so far to get the current packet
that needs to be sent to the controller. This works fine until when
the action is enclosed within a nested netlink message, and the
action translation has not finished yet.
For example;
A, clone(B, controller, C)
In this case, we can not execute 'clone' since its translation has not
be finished (missing C), However, A still needs to be executed before
the packet can be sent to the controller.
This solution is to make a copy of the odp actions translated so far,
and 'fix up' the copy so that it can be executed. The original odp
actions are left intact so that xlate can continue.
Signed-off-by: Andy Zhou <azhou@ovn.org> Acked-by: Jarno Rajahalme <jarno@ovn.org>
Ben Pfaff [Sat, 4 Mar 2017 05:16:17 +0000 (21:16 -0800)]
conntrack: Fix checks for TCP, UDP, and IPv6 header sizes.
Otherwise a malformed packet could cause a read up to about 40 bytes past
the end of the packet. The packet would still likely be dropped because
of checksum verification.
Reported-by: Bhargava Shastry <bshastry@sec.t-labs.tu-berlin.de> Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
do_execute_actions() implements a worthwhile optimization: in case
an output action is the last action in an action list, skb_clone()
can be avoided by outputing the current skb. However, the
implementation is more complicated than necessary. This patch
simplify this logic.
Signed-off-by: Andy Zhou <azhou@ovn.org> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Upstream: 5b8784aaf29b ("openvswitch: Simplify do_execute_actions().") Signed-off-by: Joe Stringer <joe@ovn.org> Acked-by: Jarno Rajahalme <jarno@ovn.org>
openvswitch: maintain correct checksum state in conntrack actions
When executing conntrack actions on skbuffs with checksum mode
CHECKSUM_COMPLETE, the checksum must be updated to account for
header pushes and pulls. Otherwise we get "hw csum failure"
logs similar to this (ICMP packet received on geneve tunnel
via ixgbe NIC):
The network device operation for reading statistics is only called
in one place, and it ignores the return value. Having a structure
return value is potentially confusing because some future driver could
incorrectly assume that the return value was used.
Fix all drivers with ndo_get_stats64 to have a void function.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
This seems to be fine for all prior Linux versions as well.
Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Signed-off-by: Joe Stringer <joe@ovn.org>
The ports which are attached mrouters or hosts, were destroyed
by users via ovs-vsctl commands. Currently the vswitch will
segfault if users use "ovs-appctl mdb/show" to show mdb info.
This patch avoids a segfault.
Signed-off-by: nickcooper-zhangtonghao <nic@opencloud.tech> Signed-off-by: Ben Pfaff <blp@ovn.org>
mcast-snooping: Flush ports mdb when VLAN configuration changed.
If VLAN configuration(e.g. id, mode) change occurs, the IGMP
snooping-learned multicast groups from this port on the VLAN are
deleted. This avoids a MCAST_ENTRY_DEFAULT_IDLE_TIME delay before
mdb is updated again. Hardware switches (e.g. cisco) also do that.
Signed-off-by: nickcooper-zhangtonghao <nic@opencloud.tech> Signed-off-by: Ben Pfaff <blp@ovn.org>
netns: make struct pernet_operations::id unsigned int
Make struct pernet_operations::id unsigned.
There are 2 reasons to do so:
1)
This field is really an index into an zero based array and
thus is unsigned entity. Using negative value is out-of-bound
access by definition.
2)
On x86_64 unsigned 32-bit data which are mixed with pointers
via array indexing or offsets added or subtracted to pointers
are preffered to signed 32-bit data.
"int" being used as an array index needs to be sign-extended
to 64-bit before being used.
void f(long *p, int i)
{
g(p[i]);
}
roughly translates to
movsx rsi, esi
mov rdi, [rsi+...]
call g
MOVSX is 3 byte instruction which isn't necessary if the variable is
unsigned because x86_64 is zero extending by default.
Now, there is net_generic() function which, you guessed it right, uses
"int" as an array index:
static inline void *net_generic(const struct net *net, int id)
{
...
ptr = ng->ptr[id - 1];
...
}
And this function is used a lot, so those sign extensions add up.
Patch snipes ~1730 bytes on allyesconfig kernel (without all junk
messing with code generation):
Unfortunately some functions actually grow bigger.
This is a semmingly random artefact of code generation with register
allocator being used differently. gcc decides that some variable
needs to live in new r8+ registers and every access now requires REX
prefix. Or it is shifted into r12, so [r12+0] addressing mode has to be
used which is longer than [r8]
However, overall balance is in negative direction:
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
[Committer notes]
It looks like changing the type of this doesn't affect the build on older
kernels, so we can just make the change. I didn't go through all of the
compat code to update the net_id variables there as none of that code should
be enabled on kernels with this patch.
Upstream: c7d03a00b56f ("netns: make struct pernet_operations::id unsigned int") Signed-off-by: Joe Stringer <joe@ovn.org> Acked-by: Jarno Rajahalme <jarno@ovn.org>
Allow ARPHRD_NONE interfaces to be added to ovs bridge.
Based on previous versions by Lorand Jakab and Simon Horman.
Signed-off-by: Lorand Jakab <lojakab@cisco.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: Jiri Benc <jbenc@redhat.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Yi Yang <yi.y.yang@intel.com> Signed-off-by: Joe Stringer <joe@ovn.org>
It's not allowed to push Ethernet header in front of another Ethernet
header.
It's not allowed to pop Ethernet header if there's a vlan tag. This
preserves the invariant that L3 packet never has a vlan tag.
Based on previous versions by Lorand Jakab and Simon Horman.
Signed-off-by: Lorand Jakab <lojakab@cisco.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: Jiri Benc <jbenc@redhat.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
[Committer notes]
Fix build with the upstream commit by folding in the required switch
case enum handlers.
Signed-off-by: Yi Yang <yi.y.yang@intel.com> Signed-off-by: Joe Stringer <joe@ovn.org>
Extend the ovs flow netlink protocol to support L3 packets. Packets without
OVS_KEY_ATTR_ETHERNET attribute specify L3 packets; for those, the
OVS_KEY_ATTR_ETHERTYPE attribute is mandatory.
Push/pop vlan actions are only supported for Ethernet packets.
Based on previous versions by Lorand Jakab and Simon Horman.
Signed-off-by: Lorand Jakab <lojakab@cisco.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: Jiri Benc <jbenc@redhat.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Upstream commit:
commit 87e159c59d9f325d571689d4027115617adb32e6
Author: Jarno Rajahalme <jarno@ovn.org>
Date: Mon Dec 19 17:06:33 2016 -0800
openvswitch: Add a missing break statement.
Add a break statement to prevent fall-through from
OVS_KEY_ATTR_ETHERNET to OVS_KEY_ATTR_TUNNEL. Without the break
actions setting ethernet addresses fail to validate with log messages
complaining about invalid tunnel attributes.
Fixes: 0a6410fbde ("openvswitch: netlink: support L3 packets") Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Upstream commit:
commit df30f7408b187929dbde72661c7f7c615268f1d0
Author: pravin shelar <pshelar@ovn.org>
Date: Mon Dec 26 08:31:27 2016 -0800
openvswitch: upcall: Fix vlan handling.
Networking stack accelerate vlan tag handling by
keeping topmost vlan header in skb. This works as
long as packet remains in OVS datapath. But during
OVS upcall vlan header is pushed on to the packet.
When such packet is sent back to OVS datapath, core
networking stack might not handle it correctly. Following
patch avoids this issue by accelerating the vlan tag
during flow key extract. This simplifies datapath by
bringing uniform packet processing for packets from
all code paths.
Fixes: 5108bbaddc ("openvswitch: add processing of L3 packets"). CC: Jarno Rajahalme <jarno@ovn.org> CC: Jiri Benc <jbenc@redhat.com> Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
[Committer Notes]
Squashed in the following upstream commits to retain bisectability: 87e159c59d9f ("openvswitch: Add a missing break statement.") df30f7408b18 ("openvswitch: upcall: Fix vlan handling.")
Signed-off-by: Yi Yang <yi.y.yang@intel.com> Signed-off-by: Joe Stringer <joe@ovn.org>
Support receiving, extracting flow key and sending of L3 packets (packets
without an Ethernet header).
Note that even after this patch, non-Ethernet interfaces are still not
allowed to be added to bridges. Similarly, netlink interface for sending and
receiving L3 packets to/from user space is not in place yet.
Based on previous versions by Lorand Jakab and Simon Horman.
Signed-off-by: Lorand Jakab <lojakab@cisco.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: Jiri Benc <jbenc@redhat.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Yi Yang <yi.y.yang@intel.com> Signed-off-by: Joe Stringer <joe@ovn.org>
openvswitch: support MPLS push and pop for L3 packets
Update Ethernet header only if there is one.
Signed-off-by: Jiri Benc <jbenc@redhat.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Yi Yang <yi.y.yang@intel.com> Signed-off-by: Joe Stringer <joe@ovn.org>
We'll need it to alter packets sent to ARPHRD_NONE interfaces.
Change do_output() to use the actual L2 header size of the packet when
deciding on the minimum cutlen. The assumption here is that what matters is
not the output interface hard_header_len but rather the L2 header of the
particular packet. For example, ARPHRD_NONE tunnels that encapsulate
Ethernet should get at least the Ethernet header.
Signed-off-by: Jiri Benc <jbenc@redhat.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
[Committer notes]
This is not identical to upstream, because the OVS tree is missing
upstream commit c66549ffd666 ("openvswitch: correctly fragment packet
with mpls headers")
Signed-off-by: Yi Yang <yi.y.yang@intel.com> Signed-off-by: Joe Stringer <joe@ovn.org>
Use a hole in the structure. We support only Ethernet so far and will add
a support for L2-less packets shortly. We could use a bool to indicate
whether the Ethernet header is present or not but the approach with the
mac_proto field is more generic and occupies the same number of bytes in the
struct, while allowing later extensibility. It also makes the code in the
next patches more self explaining.
It would be nice to use ARPHRD_ constants but those are u16 which would be
waste. Thus define our own constants.
Another upside of this is that we can overload this new field to also denote
whether the flow key is valid. This has the advantage that on
refragmentation, we don't have to reparse the packet but can rely on the
stored eth.type. This is especially important for the next patches in this
series - instead of adding another branch for L2-less packets before calling
ovs_fragment, we can just remove all those branches completely.
Signed-off-by: Jiri Benc <jbenc@redhat.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Yi Yang <yi.y.yang@intel.com> Signed-off-by: Joe Stringer <joe@ovn.org>
openvswitch: use hard_header_len instead of hardcoded ETH_HLEN
On tx, use hard_header_len while deciding whether to refragment or drop the
packet. That way, all combinations are calculated correctly:
* L2 packet going to L2 interface (the L2 header len is subtracted),
* L2 packet going to L3 interface (the L2 header is included in the packet
lenght),
* L3 packet going to L3 interface.
Signed-off-by: Jiri Benc <jbenc@redhat.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Yi Yang <yi.y.yang@intel.com> Signed-off-by: Joe Stringer <joe@ovn.org>
netfilter: handle NF_REPEAT from nf_conntrack_in()
NF_REPEAT is only needed from nf_conntrack_in() under a very specific
case required by the TCP protocol tracker, we can handle this case
without returning to the core hook path. Handling of NF_REPEAT from the
nf_reinject() is left untouched.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
[Committer notes]
Shift the functionality into the compat code, protected by v4.10
version check. This allows the datapath/conntrack.c to match
upstream.
Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Signed-off-by: Joe Stringer <joe@ovn.org>
While looking into an MTU issue with sfc, I started noticing that almost
every NIC driver with an ndo_change_mtu function implemented almost
exactly the same range checks, and in many cases, that was the only
practical thing their ndo_change_mtu function was doing. Quite a few
drivers have either 68, 64, 60 or 46 as their minimum MTU value checked,
and then various sizes from 1500 to 65535 for their maximum MTU value. We
can remove a whole lot of redundant code here if we simple store min_mtu
and max_mtu in net_device, and check against those in net/core/dev.c's
dev_set_mtu().
In theory, there should be zero functional change with this patch, it just
puts the infrastructure in place. Subsequent patches will attempt to start
using said infrastructure, with theoretically zero change in
functionality.
CC: netdev@vger.kernel.org Signed-off-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Upstream commit:
commit 91572088e3fdbf4fe31cf397926d8b890fdb3237
Author: Jarod Wilson <jarod@redhat.com>
Date: Thu Oct 20 13:55:20 2016 -0400
net: use core MTU range checking in core net infra
...
openvswitch:
- set min/max_mtu, remove internal_dev_change_mtu
- note: max_mtu wasn't checked previously, it's been set to 65535, which
is the largest possible size supported
...
Signed-off-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Upstream commit:
commit 425df17ce3a26d98f76e2b6b0af2acf4aeb0b026
Author: Jarno Rajahalme <jarno@ovn.org>
Date: Tue Feb 14 21:16:28 2017 -0800
openvswitch: Set internal device max mtu to ETH_MAX_MTU.
Commit 91572088e3fd ("net: use core MTU range checking in core net
infra") changed the openvswitch internal device to use the core net
infra for controlling the MTU range, but failed to actually set the
max_mtu as described in the commit message, which now defaults to
ETH_DATA_LEN.
This patch fixes this by setting max_mtu to ETH_MAX_MTU after
ether_setup() call.
Fixes: 91572088e3fd ("net: use core MTU range checking in core net infra") Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
This backport detects the new max_mtu field in the struct netdevice
and uses the upstream code if it exists, and local backport code if
not. The latter case is amended with bounds checks with new upstream
macros ETH_MIN_MTU and ETH_MAX_MTU and the corresponding error
messages from the upstream commit.
Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Signed-off-by: Joe Stringer <joe@ovn.org>
Some symbols exported to other modules are really used only by
openvswitch.ko. Remove the exports.
Tested by loading all 4 openvswitch modules, nothing breaks.
Signed-off-by: Jiri Benc <jbenc@redhat.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Signed-off-by: Joe Stringer <joe@ovn.org>
openvswitch: add NETIF_F_HW_VLAN_STAG_TX to internal dev
The internal device does support 802.1AD offloading since 018c1dda5ff1
("openvswitch: 802.1AD Flow handling, actions, vlan parsing, netlink
attributes").
Signed-off-by: Jiri Benc <jbenc@redhat.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Eric Garver <e@erig.me> Signed-off-by: David S. Miller <davem@davemloft.net>
Upstream: 3145c037e749 ("openvswitch: add NETIF_F_HW_VLAN_STAG_TX to internal dev") Signed-off-by: Joe Stringer <joe@ovn.org> Acked-by: Jarno Rajahalme <jarno@ovn.org>
openvswitch: avoid resetting flow key while installing new flow.
since commit commit db74a3335e0f6 ("openvswitch: use percpu
flow stats") flow alloc resets flow-key. So there is no need
to reset the flow-key again if OVS is using newly allocated
flow-key.
Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Signed-off-by: Joe Stringer <joe@ovn.org>
openvswitch: Fix Frame-size larger than 1024 bytes warning.
There is no need to declare separate key on stack,
we can just use sw_flow->key to store the key directly.
This commit fixes following warning:
net/openvswitch/datapath.c: In function ‘ovs_flow_cmd_new’:
net/openvswitch/datapath.c:1080:1: warning: the frame size of 1040 bytes
is larger than 1024 bytes [-Wframe-larger-than=]
Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Signed-off-by: Joe Stringer <joe@ovn.org>
Upstream commit:
commit db74a3335e0f645e3139c80bcfc90feb01d8e304
Author: Thadeu Lima de Souza Cascardo <cascardo@redhat.com>
Date: Thu Sep 15 19:11:53 2016 -0300
openvswitch: use percpu flow stats
Instead of using flow stats per NUMA node, use it per CPU. When using
megaflows, the stats lock can be a bottleneck in scalability.
On a E5-2690 12-core system, usual throughput went from ~4Mpps to
~15Mpps when forwarding between two 40GbE ports with a single flow
configured on the datapath.
This has been tested on a system with possible CPUs 0-7,16-23. After
module removal, there were no corruption on the slab cache.
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com> Cc: pravin shelar <pshelar@ovn.org> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Signed-off-by: Joe Stringer <joe@ovn.org>
datapath: fix flow stats accounting when node 0 is not possible
Upstream commit:
commit 40773966ccf1985a1b2bb570a03cbeaf1cbd4e00
Author: Thadeu Lima de Souza Cascardo <cascardo@redhat.com>
Date: Thu Sep 15 19:11:52 2016 -0300
openvswitch: fix flow stats accounting when node 0 is not possible
On a system with only node 1 as possible, all statistics is going to be
accounted on node 0 as it will have a single writer.
However, when getting and clearing the statistics, node 0 is not going
to be considered, as it's not a possible node.
Tested that statistics are not zero on a system with only node 1
possible. Also compile-tested with CONFIG_NUMA off.
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
This patch contained a memory leak that is fixed in this backport.
The next patch silently fixed that in upstream, too.
Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Signed-off-by: Joe Stringer <joe@ovn.org>
Add support for 802.1ad including the ability to push and pop double
tagged vlans. Add support for 802.1ad to netlink parsing and flow
conversion. Uses double nested encap attributes to represent double
tagged vlan. Inner TPID encoded along with ctci in nested attributes.
This is based on Thomas F Herbert's original v20 patch. I made some
small clean ups and bug fixes.
Signed-off-by: Thomas F Herbert <thomasfherbert@gmail.com> Signed-off-by: Eric Garver <e@erig.me> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Upstream commit:
commit 20ecf1e4e30005ad50f561a92c888b6477f99341
Author: Jiri Benc <jbenc@redhat.com>
Date: Mon Oct 10 17:02:42 2016 +0200
openvswitch: vlan: remove wrong likely statement
This code is called whenever flow key is being extracted from the packet.
The packet may be as likely vlan tagged as not.
Fixes: 018c1dda5ff1 ("openvswitch: 802.1AD Flow handling, actions, vlan parsing, netlink attributes") Signed-off-by: Jiri Benc <jbenc@redhat.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Eric Garver <e@erig.me> Signed-off-by: David S. Miller <davem@davemloft.net>
Upstream commit:
commit 72ec108d701506fa6cd2f66ec5b15ea71df3c464
Author: Jiri Benc <jbenc@redhat.com>
Date: Mon Oct 10 17:02:43 2016 +0200
openvswitch: fix vlan subtraction from packet length
When the packet has its vlan tag in skb->vlan_tci, the length of the VLAN
header is not counted in skb->len. It doesn't make sense to subtract it.
Fixes: 018c1dda5ff1 ("openvswitch: 802.1AD Flow handling, actions, vlan parsing, netlink attributes") Signed-off-by: Jiri Benc <jbenc@redhat.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Eric Garver <e@erig.me> Signed-off-by: David S. Miller <davem@davemloft.net>
[Committer notes]
The following commits upstream fix bugs in this patch, so to retain
bisectability of the OVS tree they were rolled into this commit:
This is to simplify using double tagged vlans. This function allows all
valid vlan ethertypes to be checked in a single function call.
Also replace some instances that check for both ETH_P_8021Q and
ETH_P_8021AD.
Patch based on one originally by Thomas F Herbert.
Signed-off-by: Thomas F Herbert <thomasfherbert@gmail.com> Signed-off-by: Eric Garver <e@erig.me> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Yi Yang <yi.y.yang@intel.com> Acked-by: Eric Garver <e@erig.me> Signed-off-by: Joe Stringer <joe@ovn.org>
Signed-off-by: Thomas F Herbert <thomasfherbert@gmail.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Yi Yang <yi.y.yang@intel.com> Acked-by: Eric Garver <e@erig.me> Signed-off-by: Joe Stringer <joe@ovn.org>
vlan: Introduce helper functions to check if skb is tagged
Separate the two checks for single vlan and multiple vlans in
netif_skb_features(). This allows us to move the check for multiple
vlans to another function later.
Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Yi Yang <yi.y.yang@intel.com> Acked-by: Eric Garver <e@erig.me> Signed-off-by: Joe Stringer <joe@ovn.org>
Fix cvlan test failure on old kernel versions with 802.1ad. The root
cause is the upcall re-inserts the VLAN back into the raw packet data,
but the TPID is hard coded to 0x8100. This affects kernels for which
HAVE_VLAN_INSERT_TAG_SET_PROTO is not set.
The below patch allows the cvlan and 802.ad tests to pass on debian
with 3.16 kernel.
Signed-off-by: Eric Garver <e@erig.me> Signed-off-by: Yi Yang <yi.y.yang@intel.com> Acked-by: Eric Garver <e@erig.me> Signed-off-by: Joe Stringer <joe@ovn.org>
Eelco Chaudron [Wed, 8 Feb 2017 16:28:22 +0000 (17:28 +0100)]
rhel-systemd: Document systemd behavior
This is a follow up patch to document the systemd behavior including
the change introduced by the "rhel-systemd: Restart openvswitch
service if a daemon crashes", still under review.
Eelco Chaudron [Mon, 27 Feb 2017 20:56:41 +0000 (15:56 -0500)]
rhel-systemd: Restart openvswitch service if a daemon crashes
Currently if either ovsdb-server or ovs-vswitchd is crashing the
daemon is not restarting leaving the system in faulty state.
This patch will detect the daemon crash and will restart the
openvswitch service.
Here is a (bit to wide) table showing the behavior before and after
the patch. Note that only the Crash behavior has changed:
The above command is a service trigger available since Windows 7.
More on the topic:
https://msdn.microsoft.com/en-us/library/windows/desktop/dd405513%28v=vs.85%29.aspx
In out case we will wait until Microsoft-Windows-Hyper-V-VMMS has triggered
that the WMI provider: VmmsWmiEventProvider has started.
The change is needed because the network service inside VMMS starts slower than
ovs-vswitchd, which will cause a race condition because we check if the OVS
extension is enabled on a single switch.
Aaron Conole [Tue, 21 Feb 2017 22:31:05 +0000 (17:31 -0500)]
ovs-ctl: allow passing user:group to daemons
The Open vSwitch daemons allow passing --user user[:group] to allow
spawning under different user privileges. ovs-ctl now accepts --ovs-user
in the same form to pass this argument on, as well as create databases and
data directories with the appropriate privileges.
Andy Zhou [Thu, 23 Feb 2017 08:38:16 +0000 (00:38 -0800)]
ofproto/bond: Fix bond post recirc rule leak.
When bond is removed or when its configuration changes,
the post recirculation rules that are installed by current
bond configuration, if any, should be also be removed.
Reported-by: Huanle Han <hanxueluo@gmail.com>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2017-February/328969.html CC: Huanle Han <hanxueluo@gmail.com> Signed-off-by: Andy Zhou <azhou@ovn.org> Acked-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Huanle Han <hanxueluo@gmail.com>
Andy Zhou [Thu, 23 Feb 2017 07:31:31 +0000 (23:31 -0800)]
ofproto/bond: Fix bond reconfiguration race condition.
During the upcall thread bond output translation, bond_may_recirc()
is currently called outside the lock. In case the main thread executes
bond_reconfigure() at the same time, the upcall thread may find bond
state to be inconsistent when calling bond_update_post_recirc_rules().
This patch fixes the race condition by acquiring the write lock
before calling bond_may_recirc(). The APIs are refactored slightly.
The race condition can result in the following stack trace. Copied
from 'Reported-at':
Numan Siddique [Wed, 22 Feb 2017 14:58:36 +0000 (20:28 +0530)]
ovn pacemaker: Pass --db-(n/s)b-addr option when starting ovsdb-servers
When pacemaker script, starts the ovsdb-servers in all the nodes,
it doesn't pass the --db-(n/s)b-addr=MASTER_IP option.
When pacemaker promotes a master, it won't be listening on the
master ip address unless "ovn-nbctl set-connection" is used.
In this patch this option, along with --db-(n/s)b-create-insecure-remote=yes
for "tcp" connection types is passed when starting the OVN ovsdb-servers
to overcome this issue.
Signed-off-by: Numan Siddique <nusiddiq@redhat.com> Signed-off-by: Andy Zhou <azhou@ovn.org>
Joe Stringer [Fri, 10 Feb 2017 23:01:11 +0000 (15:01 -0800)]
doc: Describe backporting process.
This patch documents the backporting process, and provides a walkthrough
for developers who would like to backport upstream Linux patches into
the Open vSwitch tree. Nothing in this documentation should be
surprising or new; it merely puts the existing process into words.
Signed-off-by: Joe Stringer <joe@ovn.org> Acked-by: Ben Pfaff <blp@ovn.org> Acked-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Stephen Finucane <stephen@that.guru>
Yi-Hung Wei [Sat, 18 Feb 2017 01:47:44 +0000 (17:47 -0800)]
meta-flow: Remove cmap dependency.
Previous patch 04f48a68 ("ofp-actions: Fix variable length meta-flow OXMs.")
introduced dependency of an internal library (cmap.h) to ovs public
interface (meta-flow.h) that may cause potential building problem. In this
patch, we remove cmap from struct mf_field, and provide a wrapper struct
vl_mff_map that resolve the dependency problem.
Fixes: 04f48a68c428 ("ofp-actions: Fix variable length meta-flow OXMs.") Suggested-by: Joe Stringer <joe@ovn.org> Suggested-by: Daniele Di Proietto <diproiettod@vmware.com> Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: Joe Stringer <joe@ovn.org>