Anupam Chanda [Mon, 21 Dec 2015 20:20:06 +0000 (12:20 -0800)]
ovs-vtep: Clean up local mac entries on startup.
This change handles a corner case where local mac entries are not cleared if a
vlan binding is deleted while the emulator is not running. The fix is to clean
up the local mac entries once on restart.
Pravin B Shelar [Sun, 20 Dec 2015 03:19:22 +0000 (19:19 -0800)]
datapath: stt: Do not access stt_dev socket in lookup.
The STT device is added to the device list at device create time, but
the dev socket is initialized only when the dev is UP. So avoid accessing
the STT socket while searching for a device.
Joe Stringer [Tue, 15 Dec 2015 19:24:34 +0000 (11:24 -0800)]
compat: Backport conntrack strictly to v3.10+.
The conntrack/ipfrag backport was previously not entirely consistent in
its include for versions 3.9 and 3.10. The intention was to build it for
all kernels 3.10 and newer, so fix the version checks.
Reported-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: Joe Stringer <joe@ovn.org> Acked-by: Pravin B Shelar <pshelar@nicira.com> Tested-by: Simon Horman <simon.horman@netronome.com>
Joe Stringer [Tue, 15 Dec 2015 19:24:33 +0000 (11:24 -0800)]
compat: Always use own __ipv6_select_ident().
If the ip fragmentation backport is enabled, we should always use our
own {,__}ipv6_select_ident(). This fixes the following issue on some
v3.19 kernels:
datapath/linux/ip6_output.c:93:12: error: conflicting types for
‘__ipv6_select_ident’
static u32 __ipv6_select_ident(struct net *net, u32 hashrnd,
Reported-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: Joe Stringer <joe@ovn.org> Acked-by: Pravin B Shelar <pshelar@nicira.com> Tested-by: Simon Horman <simon.horman@netronome.com>
Han Zhou [Fri, 18 Dec 2015 06:23:22 +0000 (22:23 -0800)]
ovsdb: separate json cache for different monitor versions
Cached json objects were reused when sending notifications to
clients. This created a problem when different versions
of monitors coexisted. E.g. clients expecting version2 notifications
would receive messages with method == "update2" but payload in
version1 format, which caused processing of the updates to fail.
This patch fixes the issue by including the version in the cache node.
Signed-off-by: Han Zhou <zhouhan@gmail.com> Acked-by: Andy Zhou <azhou@ovn.org> Signed-off-by: Andy Zhou <azhou@ovn.org>
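The idea of including the monitor version in the cache node can be sketched as
follows. This is a minimal Python illustration, not the ovsdb-server C code:
the JsonCache class, the from_txn key and the render() callback are all
hypothetical names chosen for the example.

```python
class JsonCache:
    """Cache of rendered notifications keyed by (version, from_txn)."""

    def __init__(self):
        self._nodes = {}

    def get(self, version, from_txn, render):
        # Including `version` in the key keeps version1 and version2
        # payloads for the same transaction from colliding.
        key = (version, from_txn)
        if key not in self._nodes:
            self._nodes[key] = render(version, from_txn)
        return self._nodes[key]


def render(version, from_txn):
    # Hypothetical payload builder that only records which format was used.
    return {"format": "version%d" % version, "from_txn": from_txn}


cache = JsonCache()
v1 = cache.get(1, 42, render)
v2 = cache.get(2, 42, render)
```

With the version left out of the key, the second lookup would have returned
the version1 payload to a version2 client.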
Ben Pfaff [Wed, 16 Dec 2015 02:04:20 +0000 (18:04 -0800)]
Use ip_parse() and ipv6_parse() and variants in more places.
This saves some code and improves clarity, in my opinion.
Some of these changes just change an inet_pton() call into a similar
ip_parse() or ipv6_parse() call. In those cases the benefit is better
type safety, since inet_pton()'s output parameter is type "void *".
Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Justin Pettit <jpettit@ovn.org>
Mengke Liu [Tue, 15 Dec 2015 18:47:50 +0000 (02:47 +0800)]
geneve-map-rename: rename geneve-map to tlv-map.
This patch renames the commands related to geneve-map to more
generic names, as follows:
add-geneve-map -> add-tlv-map
del-geneve-map -> del-tlv-map
dump-geneve-map -> dump-tlv-map
It also renames the Geneve_table to tlv_table.
With this renaming, the NSH variable context header (the same TLV
format as Geneve) or other protocols can reuse the field tun_metadata<N>
in the future.
Signed-off-by: Mengke Liu <mengke.liu@intel.com> Signed-off-by: Ricky Li <ricky.li@intel.com> Signed-off-by: Jesse Gross <jesse@kernel.org>
Russell Bryant [Mon, 14 Dec 2015 17:54:45 +0000 (12:54 -0500)]
ovn: Use constants for conntrack state bits.
A previous commit fixed this code to match changes to the conntrack
state bit assignments. This patch further updates the code to use
the defined constants to ensure this code adapts automatically to any
possible future changes.
Signed-off-by: Russell Bryant <russell@ovn.org> Requested-by: Joe Stringer <joe@ovn.org> Acked-by: Joe Stringer <joe@ovn.org>
Andy Zhou [Mon, 14 Dec 2015 23:03:23 +0000 (15:03 -0800)]
lib: fix sparse warnings
Fixes the following sparse warning messages:
lib/ovsdb-idl.c:146:12: error: symbol 'table_updates_names' was not
declared. Should it be static?
lib/ovsdb-idl.c:147:12: error: symbol 'table_update_names' was not
declared. Should it be static?
lib/ovsdb-idl.c:148:12: error: symbol 'row_update_names' was not
declared. Should it be static?
Reported-by: Joe Stringer <joe@ovn.org> Signed-off-by: Andy Zhou <azhou@ovn.org> Acked-by: Ben Pfaff <blp@ovn.org> Acked-by: Joe Stringer <joe@ovn.org>
Alin Serdean [Thu, 10 Dec 2015 22:18:51 +0000 (22:18 +0000)]
configure: Fix broken sed calls in shell code.
Commit 43000bc (openvswitch.m4: Portability improvement), which introduced
a portability improvement, also introduced two bugs. This commit fixes
both bugs by adding the 's' command to the $SED calls and by using x86
instead of x64 for 32-bit builds.
Signed-off-by: Alin Gabriel Serdean <aserdean@cloudbasesolutions.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Andy Zhou [Tue, 20 Oct 2015 19:50:23 +0000 (12:50 -0700)]
ovsdb: test ovs-vswitchd for backward compatibility
Add a test to make sure ovs-vswitchd falls back to using the
"monitor" method when connecting to an older ovsdb-server that
does not support "monitor2".
For testing backward compatibility, add an ovs-appctl command:
"ovsdb-server/disable-monitor2". This command restarts
all currently open jsonrpc connections, but without support for the
'monitor2' JSON-RPC method on the new connections.
There is no corresponding enable command, since this feature is only
useful for testing. 'monitor2' will be available again when ovsdb-server
restarts.
Signed-off-by: Andy Zhou <azhou@nicira.com> Acked-by: Ben Pfaff <blp@ovn.org>
Andy Zhou [Thu, 15 Oct 2015 21:09:37 +0000 (14:09 -0700)]
lib: add monitor2 support in ovsdb-idl.
Add support for monitor2. When the IDL starts to run, monitor2 will be
attempted first. If the server is an older version that does
not recognize monitor2, the IDL will then fall back to using the "monitor"
method.
Signed-off-by: Andy Zhou <azhou@nicira.com> Acked-by: Ben Pfaff <blp@ovn.org>
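The fallback order described above can be sketched in a few lines. This is
only an illustration in Python, not the ovsdb-idl implementation: OldServer,
UnknownMethod and start_monitoring() are hypothetical stand-ins.

```python
class UnknownMethod(Exception):
    """Raised by the stand-in server for an unsupported JSON-RPC method."""


class OldServer:
    # Stand-in for an older ovsdb-server that only knows "monitor".
    def call(self, method):
        if method != "monitor":
            raise UnknownMethod(method)
        return method


def start_monitoring(server):
    # Try the newer method first, then fall back to the older one.
    for method in ("monitor2", "monitor"):
        try:
            return server.call(method)
        except UnknownMethod:
            continue
    raise RuntimeError("server supports neither monitor2 nor monitor")


chosen = start_monitoring(OldServer())
```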
Andy Zhou [Thu, 15 Oct 2015 21:07:43 +0000 (14:07 -0700)]
ovsdb: generate update2 notification for a monitor2 session
Add functions that can generate an "update2" notification for a
"monitor2" session. "monitor2" and "update2" are RFC 7047 extensions
described in the ovsdb-server(1) manpage. See the manpage changes
for more details.
Signed-off-by: Andy Zhou <azhou@nicira.com> Acked-by: Ben Pfaff <blp@ovn.org>
Andy Zhou [Fri, 25 Sep 2015 23:19:48 +0000 (16:19 -0700)]
lib: add diff and apply diff APIs for ovsdb_datum
When an OVSDB column changes its value, it is more efficient to send only
what has changed, rather than sending the entire new copy.
This is analogous to software programmers sending patches rather than
the entire source file.
For columns that store a single element, the "diff" datum is the same
as the "new" datum.
For columns that store a set or map, it is only necessary to send the
information about the elements changed (including addition or removal).
The "diff" for those types consists of all elements that are changed.
These APIs are mainly used for implementing a new OVSDB server
"update2" JSON-RPC notification, which encodes modifications
of a column with the contents of those "diff"s. A later patch implements
the "update2" notification.
Signed-off-by: Andy Zhou <azhou@nicira.com> Acked-by: Ben Pfaff <blp@ovn.org>
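For set-valued columns, the diff described above can be modeled as a symmetric
difference. The sketch below is a Python illustration under that assumption;
set_diff() and apply_set_diff() are hypothetical names, not the ovsdb_datum
API, and maps are not covered.

```python
def set_diff(old, new):
    # Elements that were added or removed: the symmetric difference.
    return old ^ new


def apply_set_diff(old, diff):
    # Toggling each diff element's membership in `old` yields `new`.
    return old ^ diff


old = {"A", "B", "C"}
new = {"B", "C", "D"}
diff = set_diff(old, new)
restored = apply_set_diff(old, diff)
```

Only the two changed elements travel in the diff, while the unchanged ones
("B" and "C") are never sent.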
Andy Zhou [Wed, 14 Oct 2015 23:57:52 +0000 (16:57 -0700)]
lib: avoid set size check when generating diff datum from json
Add ovsdb_transient_datum_from_json() to avoid the size check for
a diff datum, which is transient in nature.
Suppose a datum contains a set, and the max number of elements is 2.
If we are changing from a set that contains [A, B] to a set that contains
[C, D], the diff datum will contain 4 elements [A, B, C, D].
Thus the diff datum should not be constrained by the size limit. However,
the datum after the diff is applied should not violate the size limit.
Signed-off-by: Andy Zhou <azhou@nicira.com> Acked-by: Ben Pfaff <blp@ovn.org>
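The [A, B] to [C, D] example above can be replayed directly. This is a small
Python sketch assuming a set column limited to 2 elements; MAX_ELEMENTS and
check_size() are hypothetical and only model the constraint check.

```python
MAX_ELEMENTS = 2  # size limit of the hypothetical set column


def check_size(datum):
    # Constraint check applied to ordinary (non-transient) datums.
    if len(datum) > MAX_ELEMENTS:
        raise ValueError("set exceeds %d elements" % MAX_ELEMENTS)
    return datum


old = {"A", "B"}
new = {"C", "D"}
diff = old ^ new                 # transient diff: 4 elements, not checked
result = check_size(old ^ diff)  # the applied result must respect the limit
```

Running check_size() on the diff itself would raise, which is why the
transient datum has to skip the check.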
Since the upstream and compat ip_tunnel structures are not the same, we
cannot use the exported upstream functions.
This patch blocks definitions that use the internal ip_tunnel
structure. Functions that do not depend on these structures are
allowed by explicitly defining them in the header files, e.g.
iptunnel_handle_offloads(), iptunnel_pull_header(), etc.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Jesse Gross <jesse@kernel.org>
Pravin B Shelar [Fri, 11 Dec 2015 04:03:00 +0000 (20:03 -0800)]
datapath: define compat ip_tunnel_get_link_net()
Like ip_tunnel_get_iflink(), the function ip_tunnel_get_link_net()
also depends on the ip_tunnel structure. So this patch defines
a compat implementation for it.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Jesse Gross <jesse@kernel.org>
Pravin B Shelar [Fri, 11 Dec 2015 04:02:59 +0000 (20:02 -0800)]
datapath: define compat ip_tunnel_get_iflink()
ip_tunnel_get_iflink() depends on the ip_tunnel structure. But the OVS
compat layer defines its own ip_tunnel structure, which is not
compatible with all upstream kernel versions. Therefore we
cannot use this function.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Jesse Gross <jesse@kernel.org>
Alin Serdean [Fri, 11 Dec 2015 19:18:25 +0000 (19:18 +0000)]
datapath-windows: Add GRE TEB support for windows datapath
This patch introduces support for GRE TEB (transparent ethernet bridging)
for the windows datapath.
The GRE support is based on http://tools.ietf.org/html/rfc2890, without
taking into account the GRE sequence, and it supports only the GRE protocol
type 0x6558 (transparent ethernet bridging), like its linux counterpart.
Util.h:
- define the GRE pool tag
Vport.c/h:
- sort the includes alphabetically
- add the function OvsFindTunnelVportByPortType which searches the
  tunnelVportsArray for a given port type
Actions.c:
- sort the includes alphabetically
- call the GRE encapsulation / decapsulation functions when needed
Gre.c/h:
- add GRE type defines
- add initialization/cleanup functions
- add encapsulation / decapsulation functions with software offloads
  support (hardware offloads will be added in a separate patch)
Tested using:
- PSPING (https://technet.microsoft.com/en-us/sysinternals/psping.aspx)
  (ICMP, TCP, UDP) with various packet lengths
- IPERF3 (https://iperf.fr/iperf-download.php)
  (TCP, UDP) with various options
Russell Bryant [Thu, 10 Dec 2015 19:08:44 +0000 (14:08 -0500)]
xml2nroff: Fix issues pointed out by flake8.
This patch includes a few minor fixes pointed out by the flake8 tool.
It drops an unused variable and the related imports, adds some blank
lines where the PEP8 formatting standard indicates they should be, and
does a comparison with None as "is None" instead of "== None".
Signed-off-by: Russell Bryant <russell@ovn.org> Acked-by: Justin Pettit <jpettit@ovn.org>
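The reason flake8 prefers "is None" can be shown with a contrived class. This
is a generic Python illustration, not code from xml2nroff; AlwaysEqual is a
hypothetical name.

```python
class AlwaysEqual:
    # Contrived class whose __eq__ claims equality with everything.
    def __eq__(self, other):
        return True


obj = AlwaysEqual()
equals_none = (obj == None)  # noqa: E711 (deliberately the discouraged form)
is_none = (obj is None)      # identity check cannot be overridden
```

"== None" goes through a user-defined __eq__ and can give a misleading answer,
while "is None" always tests identity.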
Russell Bryant [Tue, 17 Nov 2015 22:00:06 +0000 (14:00 -0800)]
ovn: Fix ACLs for child logical ports.
The physical input flows for child logical ports (for the
container-in-a-VM use case, for example) did not set a conntrack zone
ID. The previous code only allocated a zone ID for local VIFs and
missed doing it for child ports.
Signed-off-by: Russell Bryant <russell@ovn.org> Acked-by: Justin Pettit <jpettit@ovn.org>
This bug fix is not required for OVS use cases, but it is
nice to keep the function consistent with the upstream implementation.
Upstream commit:
Earlier patch 6ae459bda tried to detect void checksum partial
skb by comparing pull length to checksum offset. But it does
not work for all cases since checksum-offset depends on
updates to skb->data.
Following patch fixes it by validating checksum start offset
after the skb->data pointer is updated. A negative value of the checksum
start offset means there is no need to checksum.
Fixes: 6ae459bda ("skbuff: Fix skb checksum flag on skb pull") Reported-by: Andrew Vagin <avagin@odin.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Upstream: 31b33dfb0a1 ("skbuff: Fix skb checksum partial check"); Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Jesse Gross <jesse@kernel.org>
Pravin B Shelar [Thu, 10 Dec 2015 22:19:56 +0000 (14:19 -0800)]
datapath: Fix STT packet receive handling.
STT reassembly can generate a list of packets, but it was
handled as a single skb. This patch fixes it.
Fixes: e23775f20 ("datapath: Add support for lwtunnel"). Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Jesse Gross <jesse@kernel.org> Acked-by: Joe Stringer <joe@ovn.org>
odp-util: Correctly [de]serialize mask for ND attributes.
When converting between ODP attributes and struct flow_wildcards, we
check that all the prerequisites are exact matched on the mask.
For ND (ICMPv6) attributes, an exact match on tp_src and tp_dst
(which in this context are the icmp type and code) should look like
htons(0xff), not htons(0xffff). Fix this in two places.
The consequences were that the ODP mask wouldn't include the ND
attributes and the flow would be deleted by the revalidation.
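The difference between the two masks can be made concrete by looking at the
bytes. This is a Python sketch only: htons_bytes() and is_exact_icmp_mask()
are hypothetical helpers, and the 16-bit fields are modeled as big-endian
byte strings rather than as struct flow members.

```python
import struct


def htons_bytes(value):
    # Network-byte-order (big-endian) encoding of a 16-bit value.
    return struct.pack("!H", value)


# An 8-bit ICMP type or code stored in a 16-bit slot uses only the low byte.
ICMP_FIELD_EXACT = htons_bytes(0xff)
# A TCP/UDP port uses all 16 bits.
PORT_FIELD_EXACT = htons_bytes(0xffff)


def is_exact_icmp_mask(mask_bytes):
    # Exact match for ICMP type/code means "all 8 meaningful bits set".
    return mask_bytes == ICMP_FIELD_EXACT
```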
odp-util: Return exact mask if netlink mask attribute is missing.
In the ODP context an empty mask netlink attribute usually means that
the flow should be an exact match.
odp_flow_key_to_mask{,_udpif}() instead return a struct flow_wildcards
with matches only on recirc_id and vlan_tci.
A more appropriate behavior is to handle a missing (zero length) netlink
mask specially (like we do in userspace and Linux datapath) and create
an exact match flow_wildcards from the original flow.
This fixes a bug in revalidate_ukey(): every flow created with
megaflows disabled would be revalidated away, because the mask would
seem too generic. (Another possible fix would be to handle the special
case of a missing mask in revalidate_ukey(), but this seems a more
generic solution).
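The special case for a missing mask can be sketched as below. This is a
simplified Python model, not odp-util code: flows and masks are plain dicts,
mask_from_netlink() is a hypothetical name, and every field is treated as 32
bits wide for brevity.

```python
EXACT = 0xffffffff  # simplified all-ones mask; real fields vary in width


def mask_from_netlink(mask_attrs, flow):
    # A missing (zero-length) mask attribute means "exact match":
    # every field present in the flow gets an all-ones mask.
    if not mask_attrs:
        return {field: EXACT for field in flow}
    return dict(mask_attrs)


flow = {"in_port": 1, "recirc_id": 0, "nw_src": 0x0a000001}
exact = mask_from_netlink({}, flow)
partial = mask_from_netlink({"in_port": EXACT}, flow)
```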
commit_set_icmp_action() should do its job only if the packet is ICMP,
otherwise there will be two problems:
* A set ICMP action will be inserted in the ODP actions and the flow
will be slow pathed.
* The tp_src and tp_dst field will be unwildcarded.
Normal TCP or UDP packets won't be impacted, because
commit_set_icmp_action() is called after commit_set_port_action() and it
will see the fields as already committed (TCP/UDP transport ports and ICMP
code/type are stored in the same members in struct flow).
MPLS packets though will hit the bug, causing a nonsensical set action
(which will end up zeroing the transport source port) and an invalid
mask to be generated.
The commit also alters an MPLS testcase to trigger the bug.
tnl-ports: Generate mask with correct prerequisites.
We should match on the transport ports only if the tunnel has a UDP
header. It doesn't make sense to match on transport port for GRE
tunnels.
Also, to match on fragment bits we should use FLOW_NW_FRAG_MASK instead
of 0xFF. FLOW_NW_FRAG_MASK is what we get if we convert to the ODP
netlink format and back.
Adding the correct masks in the tunnel router classifier helps in making
sure that the translation generates masks that respect prerequisites.
If the mask has some fields that do not respect prerequisites, the flow
will get deleted by revalidation, because translating to ODP format and
back will generate a more generic mask, which will be perceived as too
generic (compared with the one generated by the translation).
ofproto-dpif-xlate: Fix revalidation in execute_controller_action().
If there's no actual packet (e.g. during revalidation),
execute_controller_action() exits right away, without calling
xlate_commit_actions().
xlate_commit_actions() might have an influence on the slow_path reason
(which is included in the generated ODP actions), meaning that the
revalidation will not generate the same actions as the original
translation.
Fix the problem by making execute_controller_action() call
xlate_commit_actions() even without a packet.
Joe Stringer [Wed, 9 Dec 2015 00:14:06 +0000 (16:14 -0800)]
datapath: Respect conntrack zone even if invalid.
If userspace executes ct(zone=1), and the connection tracker determines
that the packet is invalid, then the ct_zone flow key field is populated
with the default zone rather than the zone that was specified. Even
though connection tracking failed, this field should be updated with the
value that userspace specified. Fix the issue.
Fixes: a94ebc39996b ("datapath: Add conntrack action") Signed-off-by: Joe Stringer <joe@ovn.org> Acked-by: Pravin B Shelar <pshelar@nicira.com>
Russell Bryant [Tue, 8 Dec 2015 22:32:47 +0000 (17:32 -0500)]
ovn: Fix ct_state bit mappings in OVN symtab.
The OVN symbol table contained outdated mappings between connection
states and the corresponding bit in the ct_state field. This patch
updates the symbol table with the proper values as defined in
lib/packets.h.
Signed-off-by: Russell Bryant <russell@ovn.org> Fixes: 63bc9fb1c69f ("packets: Reorder CS_* flags to remove gap.") Acked-by: Joe Stringer <joe@ovn.org>
Nithin Raju [Wed, 25 Nov 2015 20:32:33 +0000 (12:32 -0800)]
datapath-windows: Don't assert for unknown actions
On Hyper-V, we currently don't validate a flow to see if datapath can
indeed execute all the actions specified or not. While support for it
gets implemented, an ASSERT seems too strong. I'm working on the support
for actions validation. Here's a workaround in the meantime to help
debugging.
Pravin B Shelar [Tue, 8 Dec 2015 02:23:21 +0000 (18:23 -0800)]
datapath: Backport: vxlan: interpret IP headers for ECN correctly
Upstream commit:
When looking for outer IP header, use the actual socket address family, not
the address family of the default destination which is not set for metadata
based interfaces (and doesn't have to match the address family of the
received packet even if it was set).
Fix also the misleading comment.
Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Upstream: ce212d0f6f5 ("vxlan: interpret IP headers for ECN correctly") Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Jesse Gross <jesse@kernel.org>
Pravin B Shelar [Tue, 8 Dec 2015 02:23:20 +0000 (18:23 -0800)]
datapath: Backport: vxlan: fix incorrect RCO bit in VXLAN header
Upstream commit:
Commit 3511494ce2f3d ("vxlan: Group Policy extension") changed the definition
of VXLAN_HF_RCO from 0x00200000 to BIT(24). This is obviously incorrect. It's
also in violation of the RFC draft.
Fixes: 3511494ce2f3d ("vxlan: Group Policy extension") Cc: Thomas Graf <tgraf@suug.ch> Cc: Tom Herbert <therbert@google.com> Signed-off-by: Jiri Benc <jbenc@redhat.com> Acked-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Upstream: c5fb8caaf91 ("vxlan: fix incorrect RCO bit in VXLAN header") Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Jesse Gross <jesse@kernel.org>
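The mismatch between the two definitions is plain arithmetic, checked here in
Python. BIT() mirrors the kernel macro of the same name; the two constant
names are hypothetical labels for the values quoted in the commit message.

```python
def BIT(n):
    # Python equivalent of the kernel's BIT() macro.
    return 1 << n


VXLAN_HF_RCO_ORIGINAL = 0x00200000  # value before the Group Policy commit
VXLAN_HF_RCO_CHANGED = BIT(24)      # value that commit introduced

# Position of the single set bit in the original definition.
original_bit = VXLAN_HF_RCO_ORIGINAL.bit_length() - 1
```

0x00200000 is bit 21, while BIT(24) is 0x01000000, so the two definitions
select different header bits.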
Upstream commit:
After 614732eaa12d, no refcount is maintained for the vport-vxlan module.
This allows the userspace to remove such module while vport-vxlan
devices still exist, which leads to later oops.
v1 -> v2:
- move vport 'owner' initialization in ovs_vport_ops_register()
and make such function a macro
Fixes: 614732eaa12d ("openvswitch: Use regular VXLAN net_device device") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Upstream: 83e4bf7a74 ("openvswitch: properly refcount vport-vxlan
module"). Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Jesse Gross <jesse@kernel.org>
Pravin B Shelar [Tue, 8 Dec 2015 02:23:18 +0000 (18:23 -0800)]
datapath: Backport: openvswitch: fix hangup on vxlan/gre/geneve device deletion
Upstream commit:
Each openvswitch tunnel vport (vxlan,gre,geneve) holds a reference
to the underlying tunnel device, but never released it when such
device is deleted.
Deleting the underlying device via the ip tool causes the kernel to
hang in the netdev_wait_allrefs() loop.
This commit ensures that on device unregistration dp_detach_port_notify()
is called for all vports that hold the device reference, properly
releasing it.
Fixes: 614732eaa12d ("openvswitch: Use regular VXLAN net_device device") Fixes: b2acd1dc3949 ("openvswitch: Use regular GRE net_device instead of vport") Fixes: 6b001e682e90 ("openvswitch: Use Geneve device.") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Upstream: 131753030 ("openvswitch: fix hangup on vxlan/gre/geneve device
deletion"). Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Jesse Gross <jesse@kernel.org>
Ilya Maximets [Mon, 7 Dec 2015 10:02:41 +0000 (13:02 +0300)]
ofproto-dpif: add reply on error in ofproto/tnl-push-pop
Fixes a hang of 'ovs-appctl ofproto/tnl-push-pop' when an invalid
argument is passed.
Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
The NAT validation is similar to (and based on) the existing conntrack
validation: when a dpif backer is created, we try to install a flow with
the ct_state NAT bits set. If the flow setup fails we assume that the
backer doesn't support NAT and we reject OpenFlow flows with a NAT
action or a match on the ct_state NAT bits.
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Jarno Rajahalme [Fri, 4 Dec 2015 18:19:07 +0000 (10:19 -0800)]
bond: Use correct type for slave's change_seq.
seq values are 64-bit, and storing them in a 32-bit variable causes
the stored value to never match the actual seq value once the seq value
gets big enough.
This is a likely cause of the OVS main thread using 100% CPU in a system
using bonds after some runtime.
VMware-BZ: #1564993 Reported-by: Hiram Bayless <hbayless@vmware.com> Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Joe Stringer <joe@ovn.org> Acked-by: Ben Pfaff <blp@ovn.org>
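The truncation can be reproduced with integer masking. This is a Python model
of the C behavior, assuming an unsigned 32-bit variable; store_u32() is a
hypothetical helper and the seq values are made up for the example.

```python
def store_u32(value):
    # Models assigning a 64-bit value to a 32-bit unsigned variable.
    return value & 0xFFFFFFFF


small_seq = 1000
large_seq = (1 << 32) + 5

# The comparison a caller would make to decide whether anything changed.
small_matches = store_u32(small_seq) == small_seq
large_matches = store_u32(large_seq) == large_seq
```

Once the real seq value passes 2**32, the stored copy never compares equal
again, so the caller sees a change on every iteration.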
Ben Pfaff [Fri, 4 Dec 2015 07:00:32 +0000 (23:00 -0800)]
ovs-ofctl: Fix manpage formatting typo.
Only the names of the fields were supposed to be bold here, but omitting
the "fR" from "\fR" made everything between the field names bold too,
which looked funny.
Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Justin Pettit <jpettit@ovn.org>
Based on IPv4 tests, test tunnels over IPv6. In order to do that, add
netdev-dummy/ip6addr command for dummy bridges, and get_in6 support for
netdev-dummy as well.
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
tnl_arp_lookup is not used anymore. All users have been converted to
IPv4-mapped addresses. New users need to use IPv4-mapped addresses and use
tnl_neigh_lookup.
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Joe Stringer [Fri, 4 Dec 2015 01:11:49 +0000 (17:11 -0800)]
ovn-northd: Only run idl loop if something changed.
Before refactoring the main loop to reuse ovsdb_idl_loop_* functions, we
would use a sequence to see if anything changed in NB database to
compute and notify the SB database, and vice versa. This logic got
dropped with the refactor, causing a testsuite failure in the ovn-sbctl
test. Reintroduce the IDL sequence number checking.
Fixes: 331e7aefe1c6 ("ovn-northd: Refactor main loop to use ovsdb_idl_loop_*
functions") Suggested-by: Numan Siddique <nusiddiq@redhat.com> Signed-off-by: Joe Stringer <joe@ovn.org> Signed-off-by: Justin Pettit <jpettit@ovn.org> Tested-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com> Acked-by: Ben Pfaff <blp@ovn.org>
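The sequence number check amounts to skipping work when nothing changed. The
sketch below is a generic Python illustration, not ovn-northd code; Recomputer
and maybe_run() are hypothetical names.

```python
class Recomputer:
    def __init__(self):
        self.last_seqno = None
        self.runs = 0

    def maybe_run(self, seqno):
        # Skip the expensive recompute unless the database changed.
        if seqno == self.last_seqno:
            return False
        self.last_seqno = seqno
        self.runs += 1
        return True


r = Recomputer()
results = [r.maybe_run(s) for s in (1, 1, 2, 2, 2, 3)]
```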
Joe Stringer [Thu, 3 Dec 2015 07:53:56 +0000 (23:53 -0800)]
FAQ: Document kernel feature support.
Some recent features have more stringent requirements for kernel
versions than the FAQ describes. Add an entry to be more explicit on
which features work with which versions of the upstream kernel.
Signed-off-by: Joe Stringer <joestringer@nicira.com> Acked-by: Pravin B Shelar <pshelar@nicira.com>
Joe Stringer [Thu, 3 Dec 2015 07:53:55 +0000 (23:53 -0800)]
datapath: Scrub skb between namespaces
If OVS receives a packet from another namespace, then the packet should
be scrubbed. However, people have already begun to rely on the behaviour
that skb->mark is preserved across namespaces, so retain this one field.
This is mainly to address information leakage between namespaces when
using OVS internal ports, but by placing it in ovs_vport_receive() it is
more generally applicable, meaning it should not be overlooked if other
port types are allowed to be moved into namespaces in future.
Upstream: 740dbc289155 ("openvswitch: Scrub skb between namespaces") Signed-off-by: Joe Stringer <joestringer@nicira.com> Acked-by: Pravin B Shelar <pshelar@nicira.com>
Joe Stringer [Thu, 3 Dec 2015 07:53:53 +0000 (23:53 -0800)]
datapath: Allow attaching helpers to ct action
Add support for using conntrack helpers to assist protocol detection.
The new OVS_CT_ATTR_HELPER attribute of the CT action specifies a helper
to be used for this connection. If no helper is specified, then helpers
will be automatically applied as per the sysctl configuration of
net.netfilter.nf_conntrack_helper.
The helper may be specified as part of the conntrack action, eg:
ct(helper=ftp). Initial packets for related connections should be
committed to allow later packets for the flow to be considered
established.
Example ovs-ofctl flows allowing FTP connections from ports 1->2:
in_port=1,tcp,action=ct(helper=ftp,commit),2
in_port=2,tcp,ct_state=-trk,action=ct(recirc)
in_port=2,tcp,ct_state=+trk-new+est,action=1
in_port=2,tcp,ct_state=+trk+rel,action=1
Upstream: cae3a26 "openvswitch: Allow attaching helpers to ct action" Signed-off-by: Joe Stringer <joestringer@nicira.com> Acked-by: Pravin B Shelar <pshelar@nicira.com>
Joe Stringer [Thu, 3 Dec 2015 07:53:52 +0000 (23:53 -0800)]
datapath: Allow matching on conntrack label
Allow matching and setting the ct_label field. As with ct_mark, this is
populated by executing the CT action. The label field may be modified by
specifying a label and mask nested under the CT action. It is stored as
metadata attached to the connection. Label modification occurs after
lookup, and will only persist when the conntrack entry is committed by
providing the COMMIT flag to the CT action. Labels are currently fixed
to 128 bits in size.
Upstream: c2ac667 "openvswitch: Allow matching on conntrack label" Signed-off-by: Joe Stringer <joestringer@nicira.com> Acked-by: Pravin B Shelar <pshelar@nicira.com>
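Modifying a label through a value and mask can be sketched as below. This
Python example assumes the usual value/mask semantics, where only the bits
selected by the mask are replaced; set_label() is a hypothetical helper, not
a kernel function.

```python
LABEL_BITS = 128
LABEL_MAX = (1 << LABEL_BITS) - 1


def set_label(current, value, mask):
    # Only bits selected by `mask` are replaced; the rest are kept.
    return (current & ~mask & LABEL_MAX) | (value & mask)


low_nibble = set_label(0xF0, 0x0A, 0x0F)
top_bit = set_label(0, LABEL_MAX, 1 << 127)
```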
Joe Stringer [Thu, 3 Dec 2015 07:53:51 +0000 (23:53 -0800)]
datapath: Allow matching on conntrack mark
Allow matching and setting the ct_mark field. As with ct_state and
ct_zone, these fields are populated when the CT action is executed. To
write to this field, a value and mask can be specified as a nested
attribute under the CT action. This data is stored with the conntrack
entry, and is executed after the lookup occurs for the CT action. The
conntrack entry itself must be committed using the COMMIT flag in the CT
action flags for this change to persist.
Upstream: 182e304 "openvswitch: Allow matching on conntrack mark" Signed-off-by: Justin Pettit <jpettit@nicira.com> Signed-off-by: Joe Stringer <joestringer@nicira.com> Acked-by: Pravin B Shelar <pshelar@nicira.com>
Joe Stringer [Thu, 3 Dec 2015 07:53:50 +0000 (23:53 -0800)]
datapath: Add conntrack action
Expose the kernel connection tracker via OVS. Userspace components can
make use of the CT action to populate the connection state (ct_state)
field for a flow. This state can be subsequently matched.
Exposed connection states are OVS_CS_F_*:
- NEW (0x01) - Beginning of a new connection.
- ESTABLISHED (0x02) - Part of an existing connection.
- RELATED (0x04) - Related to an established connection.
- INVALID (0x20) - Could not track the connection for this packet.
- REPLY_DIR (0x40) - This packet is in the reply direction for the flow.
- TRACKED (0x80) - This packet has been sent through conntrack.
When the CT action is executed by itself, it will send the packet
through the connection tracker and populate the ct_state field with one
or more of the connection state flags above. The CT action will always
set the TRACKED bit.
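The flag values listed above can be modeled as a bit field. This Python
sketch uses the values exactly as given in this commit message; other entries
in this log note that the bit assignments changed later, so real code should
rely on the defined constants. CtState and matches() are hypothetical names.

```python
from enum import IntFlag


class CtState(IntFlag):
    NEW = 0x01
    ESTABLISHED = 0x02
    RELATED = 0x04
    INVALID = 0x20
    REPLY_DIR = 0x40
    TRACKED = 0x80


def matches(ct_state, required, forbidden=CtState(0)):
    # True if all `required` bits are set and no `forbidden` bit is set,
    # in the spirit of a match such as ct_state=+trk-new+est.
    return (ct_state & required) == required and not (ct_state & forbidden)


state = CtState.TRACKED | CtState.ESTABLISHED | CtState.REPLY_DIR
want = CtState.TRACKED | CtState.ESTABLISHED
```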
When the COMMIT flag is passed to the conntrack action, this specifies
that information about the connection should be stored. This allows
subsequent packets for the same (or related) connections to be
correlated with this connection. Sending subsequent packets for the
connection through conntrack allows the connection tracker to consider
the packets as ESTABLISHED, RELATED, and/or REPLY_DIR.
The CT action may optionally take a zone to track the flow within. This
allows connections with the same 5-tuple to be kept logically separate
from connections in other zones. If the zone is specified, then the
"ct_zone" match field will be subsequently populated with the zone id.
IP fragments are handled by transparently assembling them as part of the
CT action. The maximum received unit (MRU) size is tracked so that
refragmentation can occur during output.
IP frag handling contributed by Andy Zhou.
Based on original design by Justin Pettit.
Upstream: 7f8a436 "openvswitch: Add conntrack action" Signed-off-by: Joe Stringer <joestringer@nicira.com> Signed-off-by: Justin Pettit <jpettit@nicira.com> Signed-off-by: Andy Zhou <azhou@nicira.com> Acked-by: Pravin B Shelar <pshelar@nicira.com>
Joe Stringer [Thu, 3 Dec 2015 07:53:49 +0000 (23:53 -0800)]
datapath: Serialize acts with original netlink len
Previously, we used the kernel-internal netlink actions length to
calculate the size of messages to serialize back to userspace.
However, the sw_flow_actions may not be formatted exactly the same as the
actions on the wire, so store the original actions length when
de-serializing and re-use the original length when serializing.
Upstream: 8e2fed1 "openvswitch: Serialize acts with original netlink len" Signed-off-by: Joe Stringer <joestringer@nicira.com> Acked-by: Pravin B Shelar <pshelar@nicira.com>