Joe Stringer [Mon, 20 Mar 2017 21:08:19 +0000 (14:08 -0700)]
ofproto-dpif-upcall: Fix flow setup/delete race.
If a handler thread takes a long time to set up a set of flows, it is
possible for one of the installed flows to be dumped and scheduled
for deletion by a revalidator thread before the handler is able to
transition the ukey into an operational state, that is, between the
dpif_operate() above this function and the ukey lock / transition logic
modified by this patch.
Only transition the ukey for the flow if it wasn't already transitioned
to a later state by a revalidator thread.
Fixes: 54ebeff4c03d ("upcall: Track ukey states.") Reported-by: Paul Blakey <paulb@mellanox.com> Signed-off-by: Joe Stringer <joe@ovn.org> Tested-by: Paul Blakey <paulb@mellanox.com>
Lance Richardson [Thu, 23 Mar 2017 16:23:33 +0000 (12:23 -0400)]
sandbox: use ssl for ovn-controller to sb db connection
When SSL support is available, use SSL for the ovn-controller
to southbound database connection. When configured without
SSL, unix socket connections are used.
Signed-off-by: Lance Richardson <lrichard@redhat.com> Signed-off-by: Russell Bryant <russell@ovn.org>
The difference between machines may cause the test to fail.
More importantly, when topology is changed or the root bridge
receives the TCN BPDU, the root bridge will start the topology
change timer. We should wait for the topology change timer to stop
after 35s (max age 20 + forward delay 15). After 35s, the root
bridge will stop sending CONF BPDUs with the STP_CONFIG_TOPOLOGY_CHANGE
flag and the topology will be stable. During this time, we should
warp time one second at a time because the hold timer of stp ports
will stop after 1s. Then the root bridge can quickly send the topology
change ack (other bridges may send TCN BPDUs to the root bridge), which
avoids the root bridge flushing fdb and mdb frequently.
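The timing above can be sketched as a small Python model. This is only an illustration of the arithmetic and of warping in one-second steps; the helper names are hypothetical and the real test drives ovs-vswitchd from the test suite, not from Python.

```python
# Illustrative model only; values are taken from the commit message.
MAX_AGE_S = 20        # STP max age
FORWARD_DELAY_S = 15  # STP forward delay
HOLD_TIMER_S = 1      # hold timer of an STP port


def topology_change_time(max_age=MAX_AGE_S, forward_delay=FORWARD_DELAY_S):
    """Seconds the root bridge keeps sending the topology change flag."""
    return max_age + forward_delay


def warp_commands(total_s, step_s=HOLD_TIMER_S):
    """Build one time/warp command per step, so that the hold timer can
    expire between warps (time/warp takes milliseconds)."""
    steps = total_s // step_s
    return ["ovs-appctl time/warp %d" % (step_s * 1000)] * steps


commands = warp_commands(topology_change_time())
print(len(commands), commands[0])
```

Warping in many small steps rather than one 35s jump is what lets the per-port hold timer fire between BPDUs.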
This patch has been tested on centos 7.2 (kernel 3.10.0, python
2.7.5 and gcc 4.8.5), ubuntu 16.04 (kernel 4.4.0, python 3.5.2
and gcc 5.4.0) and ubuntu 16.04 (kernel 4.10.4, python 3.5.2 and
gcc 5.4.0). This patch has been tested for 3 hours. This patch
may make the stp tests more stable.
[Committer notes]
Folded time/warp execution into a for loop.
Fixes: 427e9751f300 ("tests: Add and improve stp tests.")
Reported-at: http://paste.ubuntu.com/24215426
Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2017-March/330032.html Signed-off-by: nickcooper-zhangtonghao <nic@opencloud.tech> Acked-by: Alin Gabriel Serdean <aserdean@cloudbasesolutions.com> Signed-off-by: Joe Stringer <joe@ovn.org>
Jarno Rajahalme [Fri, 24 Mar 2017 18:47:15 +0000 (11:47 -0700)]
meta-flow: Remove metadata prerequisite on ether type.
Conntrack original direction tuple fields depend on the conntrack
state and the type of the packet that was tracked. These dependencies
were encoded as OpenFlow prerequisites in commit daf4d3c18da4 ("odp:
Support conntrack orig tuple key."). However, having a prerequisite
from a metadata field to a packet header turned out to be problematic,
since sometimes we are decoding metadata fields alone, so that the
packet type field is not available.
The reason for the packet type dependency is that the IP addresses in
the original direction tuple can be either IPv4 or IPv6 addresses, and
it would be invalid to match on IPv4 original direction tuple
addresses for an IPv6 packet and vice versa. Upon closer look,
however, allowing this kind of mismatched match only causes the flow
to never match anything, rather than causing more severe problems.
This patch removes the formal prerequisite on the packet type, but
replaces that with an explicit check for the mismatch on flow install.
This way we can still return an error to the controller if it tries to
install a mismatched flow.
Justin Pettit [Wed, 22 Mar 2017 23:45:41 +0000 (16:45 -0700)]
ofproto-dpif-xlate: Include controller traffic for NetFlow.
The code previously did not include packets forwarded to the controller
in NetFlow, as it considered this control traffic. That is debatable for
deployments where the first packet of every flow is sent to the
controller for a forwarding decision that may eventually be executed on
the switch.
However, we are starting to send more traffic to local controllers for
non-forwarding purposes such as logging. These packets are already
being forwarded (and only copies are being sent to the controller), so
not accounting for them will incorrectly under-report NetFlow
statistics.
Jarno Rajahalme [Wed, 22 Feb 2017 02:17:04 +0000 (18:17 -0800)]
mirror: Allow concurrent lookups.
Handler threads use a selection of mirror functions with the
assumption that the data referred to is RCU protected, while the
implementation has not provided for this, which can lead to an OVS
crash.
This patch fixes this by making the mbundle lookup RCU-safe by using
cmap instead of hmap and postponing mbundle memory free, as well as
postponing the frees of the mirrors and the vlan bitmaps of each
mirror.
Note that mirror stats update is still not accurate if multiple
threads do it simultaneously.
A less complete version of this patch (using cmap and RCU postpone
just for the mbridge itself) was tested by Yunjian Wang and was found
to fix the observed crash when running a script that adds and deletes
a port repeatedly.
Reported-by: Yunjian Wang <wangyunjian@huawei.com> Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Sairam Venugopal [Tue, 21 Mar 2017 07:02:02 +0000 (00:02 -0700)]
datapath-windows: Add support for OVS_CT_ATTR_FORCE_COMMIT
Add support for handling OVS_CT_ATTR_FORCE_COMMIT in Conntrack action.
When this flag is specified, it implicitly means commit and deletes
entries in the reverse direction.
Joe Stringer [Fri, 17 Mar 2017 18:38:34 +0000 (11:38 -0700)]
ofproto: Log when learn limit reached.
This commit provides more visibility into conditions where learn limits
are reached when the functionality from patch 4c71600d2256
("ofp-actions: Add limit to learn action.") is used.
VMWare-BZ: #1832142 Signed-off-by: Joe Stringer <joe@ovn.org> Acked-by: Ben Pfaff <blp@ovn.org>
The stp/show command will help users and developers to
get more details about stp. This patch works together with
the previous patch "stp: Change the api for next patch."
Signed-off-by: nickcooper-zhangtonghao <nic@opencloud.tech> Co-authored-by: Ben Pfaff <blp@ovn.org> Signed-off-by: Ben Pfaff <blp@ovn.org>
This patch changes the stp_port_get_role and removes
the stp_port_get_id, because stp/show has locked the
mutex before calling the stp_port_get_role, and
stp_port_get_id will not be used.
Signed-off-by: nickcooper-zhangtonghao <nic@opencloud.tech> Acked-by: Ben Pfaff <blp@ovn.org> Signed-off-by: Ben Pfaff <blp@ovn.org>
Russell Bryant [Thu, 16 Mar 2017 17:00:30 +0000 (13:00 -0400)]
build: Only re-gen HTML docs when needed.
When sphinx-build is installed, the docs were being re-generated during
every invocation of "make". This patch sets up dependencies such that
sphinx-build will only be executed if one of the documentation files has
changed.
Signed-off-by: Russell Bryant <russell@ovn.org> Acked-by: Ben Pfaff <blp@ovn.org>
Ben Pfaff [Fri, 17 Mar 2017 20:38:55 +0000 (13:38 -0700)]
Fix format specifier technicalities.
Various printf() format specifiers in the tree had minor technical issues
which the Mac OS build reported, e.g. here:
https://s3.amazonaws.com/archive.travis-ci.org/jobs/208718342/log.txt
These tend to fall into two categories of harmless warnings:
1. Wrong width for types that are all promoted to 'int'. For example,
   uint8_t and uint16_t are both promoted to 'int' as part of a call
   to printf(), but using PRIu8 for a uint16_t causes a warning.
2. Wrong format specifier for type promoted to 'int' due to arithmetic.
   For example, if 'x' is a uint8_t, then x >> 1 has type 'int' due to
   C's promotion rules, so the correct format specifier is %d and using
   PRIu8 will cause a warning.
This commit fixes the warnings. I didn't see anything that rose to the
level of a bug.
These warnings only showed up on Mac OS X because of differences in the
format specifiers that Mac OS uses for PRI*.
Reported-by: Shu Shen <shu.shen@gmail.com> Acked-by: Daniele Di Proietto <diproiettod@vmware.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Ilya Maximets [Fri, 10 Mar 2017 09:25:52 +0000 (12:25 +0300)]
Documentation: Remove external dependence on pygments.
Current documentation uses syntax highlighting in 'sphinx'
via 'pygments' library. This leads to build failures on the
systems with old version of this library.
Since only 'windows.rst' uses highlighting, this is a
very simple change. It helps us avoid build issues
on different systems and allows us to remove a painful external
dependency.
Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Guoshuai Li [Thu, 9 Mar 2017 02:53:37 +0000 (10:53 +0800)]
ovn: Modify the DHCPv4 router option to optional
Co-authored-by: Dong Jun <dongj@dtdream.com> Signed-off-by: Dong Jun <dongj@dtdream.com> Signed-off-by: Guoshuai Li <ligs@dtdream.com> Acked-by: Numan Siddique <nusiddiq@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Eric Garver [Wed, 1 Mar 2017 22:48:00 +0000 (17:48 -0500)]
Add new port VLAN mode "dot1q-tunnel"
- Example:
    ovs-vsctl set Port p1 vlan_mode=dot1q-tunnel tag=100
  Pushes another VLAN 100 header on packets (tagged and untagged) on
  ingress, and pops it on egress.
- Customer VLAN check:
    ovs-vsctl set Port p1 vlan_mode=dot1q-tunnel tag=100 cvlans=10,20
  Only customer VLANs 10 and 20 are allowed.
Co-authored-by: Xiao Liang <shaw.leon@gmail.com> Signed-off-by: Xiao Liang <shaw.leon@gmail.com> Signed-off-by: Eric Garver <e@erig.me> Signed-off-by: Ben Pfaff <blp@ovn.org>
Ben Pfaff [Thu, 16 Mar 2017 21:04:41 +0000 (14:04 -0700)]
compiler: Use C11 build assertions with new enough GCC or Clang.
Until now, the BUILD_ASSERT and BUILD_ASSERT_DECL macros have used OVS's
home-grown build assertion strategy. This commit switches them to using
C11 build assertions with compilers that support them. The semantics are
the same, but C11 build assertions yield clearer error messages when they
fail.
This commit also reorders the definitions a bit to make it easier to
follow.
Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Jarno Rajahalme <jarno@ovn.org>
Eric Garver [Wed, 1 Mar 2017 22:47:59 +0000 (17:47 -0500)]
Add support for 802.1ad (QinQ tunneling)
Flow key handling changes:
- Add VLAN header array in struct flow, to record multiple 802.1q VLAN
headers.
- Add dpif multi-VLAN capability probing. If datapath supports
multi-VLAN, increase the maximum depth of nested OVS_KEY_ATTR_ENCAP.
Refactor VLAN handling in dpif-xlate:
- Introduce 'xvlan' to track VLAN stack during flow processing.
- Input and output VLAN translation according to the xbundle type.
Push VLAN action support:
- Allow ethertype 0x88a8 in VLAN headers and push_vlan action.
- Support push_vlan on dot1q packets.
Use other_config:vlan-limit in table Open_vSwitch to limit maximum VLANs
that can be matched. This allows us to preserve backwards compatibility.
Add test cases for VLAN depth limit, Multi-VLAN actions and QinQ VLAN
handling
Co-authored-by: Thomas F Herbert <thomasfherbert@gmail.com> Signed-off-by: Thomas F Herbert <thomasfherbert@gmail.com> Co-authored-by: Xiao Liang <shaw.leon@gmail.com> Signed-off-by: Xiao Liang <shaw.leon@gmail.com> Signed-off-by: Eric Garver <e@erig.me> Signed-off-by: Ben Pfaff <blp@ovn.org>
This commit adds a new feature to the learn actions: the possibility to
limit the number of learned flows.
To be compatible with users of the old learn action, a new structure is
introduced as well as a new OpenFlow raw action number.
There's a small corner case when we have to delete the ukey. This
happens when:
* The learned rule has expired (or has been deleted).
* The ukey that learned the rule is still in the datapath.
* No packets hit the datapath flow recently.
In this case we cannot relearn the rule (because there are no new
packets), and the actions might depend on the learn execution, so the
only option is to delete the ukey. I don't think this has big
performance implications since it's done only for ukey with no traffic.
We could also slowpath it, but that will cause an action upcall and the
correct datapath actions will be installed later by a revalidator. If
we delete the ukey, the next upcall will be a miss upcall and that will
immediately install the correct datapath flow.
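The limiting behaviour can be modelled in a few lines of Python. This is a sketch, not the OVS implementation: the LearnTable class is hypothetical, and treating a limit of 0 as "no limit" is an assumption made for the example.

```python
class LearnTable:
    """Hypothetical stand-in for the flow table a learn action writes to."""

    def __init__(self, limit=0):
        self.limit = limit  # assumption: 0 means "no limit" in this sketch
        self.flows = set()

    def learn(self, match):
        """Return True if the flow is present in the table after the call."""
        if match in self.flows:
            return True     # refreshing an already learned flow is allowed
        if self.limit and len(self.flows) >= self.limit:
            return False    # limit reached: do not learn a new flow
        self.flows.add(match)
        return True


table = LearnTable(limit=2)
print([table.learn(m) for m in ("a", "b", "c", "a")])
```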
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Eric Garver [Thu, 16 Mar 2017 14:22:32 +0000 (10:22 -0400)]
checkpatch.py: Fix false positive on if/when/for
We need to use == instead of the is operator. If you're unlucky it may
fail because they're not exactly the same object, but hold the same
value.
Example false positive:
E(120): Inappropriate bracing around statement
+ if (0 != nl_attr_get_u8(vxlan[IFLA_VXLAN_LEARNING])
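A minimal Python sketch of the pitfall follows. The keyword list and the function name are hypothetical and are not taken from checkpatch.py; the point is only that a token produced by split() is a new string object, so it has to be compared by value.

```python
# Keywords that introduce a braced statement in C (illustrative list).
KEYWORDS = ["if", "while", "for"]


def starts_with_keyword(line):
    """Return True if the first token of 'line' is one of KEYWORDS."""
    tokens = line.split()
    if not tokens:
        return False
    # 'tokens[0]' is a freshly built string.  Comparing it with 'is' would
    # test object identity, which is not guaranteed for equal strings, so
    # compare values with '==' instead.
    return any(tokens[0] == kw for kw in KEYWORDS)


print(starts_with_keyword("if (0 != x) {"))
print(starts_with_keyword("return x;"))
```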
Fixes: 30c7ffd5ac46 ("utilities/checkpatch.py: Check for appropriate bracing") Signed-off-by: Eric Garver <e@erig.me> Signed-off-by: Russell Bryant <russell@ovn.org>
Yi-Hung Wei [Mon, 13 Mar 2017 18:28:22 +0000 (11:28 -0700)]
ofproto: Move tun_table and vl_mff_map deletion.
In this patch, we move the tun_table and vl_mff_map deletion in
ofproto_destroy__() to be in the following order.
1. Delete all the flows.
2. Delete vl_mff_map.
3. Delete tun_table.
The rationale behind this order is that a flow may use a variable length
mf_field, and a variable length mf_field is defined by a TLV mapping
in tun_table.
Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: Joe Stringer <joe@ovn.org>
Yi-Hung Wei [Mon, 13 Mar 2017 18:28:21 +0000 (11:28 -0700)]
ofproto: Add ref counting for variable length mf_fields.
Currently, a controller may potentially trigger a segmentation fault if it
accidentally removes a TLV mapping that is still used by an active flow.
To resolve this issue, in this patch, we maintain reference counting for each
dynamically allocated variable length mf_field, so that vswitchd can use this
information to properly remove a TLV mapping, and to return an error if the
controller tries to remove a TLV mapping that is still used by any active flow.
To keep track of the usage of tun_metadata for each flow, two 'uint64_t'
bitmaps are introduced for the flow match and flow action respectively. We use
'uint64_t' as a bitmap since the 64 geneve TLV tunnel metadata are the only
available variable length mf_fields for now. We shall adopt a general bitmap when
more variable length mf_fields are introduced. The bitmaps are configured
during the flow decoding process, and vswitchd uses these bitmaps to increase or
decrease the ref counting when the flow is created or deleted.
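The bookkeeping described above can be sketched in Python as follows. The class and method names are hypothetical and only model the idea: one reference count per tunnel metadata slot, adjusted from the two per-flow bitmaps, with removal refused while a count is non-zero.

```python
class TlvMappingTable:
    """Hypothetical model: 64 tunnel metadata slots, one ref count each."""

    N_SLOTS = 64

    def __init__(self):
        self.mapped = set()
        self.ref_cnt = [0] * self.N_SLOTS

    def add_mapping(self, index):
        self.mapped.add(index)

    def _adjust(self, bitmap, delta):
        # Walk the 64-bit bitmap and adjust the count of every set slot.
        for index in range(self.N_SLOTS):
            if bitmap & (1 << index):
                self.ref_cnt[index] += delta

    def flow_added(self, match_bitmap, action_bitmap):
        self._adjust(match_bitmap, +1)
        self._adjust(action_bitmap, +1)

    def flow_deleted(self, match_bitmap, action_bitmap):
        self._adjust(match_bitmap, -1)
        self._adjust(action_bitmap, -1)

    def del_mapping(self, index):
        """Return False (an error to the controller) if still referenced."""
        if self.ref_cnt[index] > 0:
            return False
        self.mapped.discard(index)
        return True


table = TlvMappingTable()
table.add_mapping(3)
table.flow_added(match_bitmap=1 << 3, action_bitmap=0)
print(table.del_mapping(3))   # refused while the flow exists
table.flow_deleted(match_bitmap=1 << 3, action_bitmap=0)
print(table.del_mapping(3))   # allowed once the flow is gone
```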
Yi-Hung Wei [Mon, 13 Mar 2017 18:28:20 +0000 (11:28 -0700)]
nx-match: Use vl_mff_map to parse match field.
vl_mff_map is introduced in commit 04f48a68c428 ("ofp-actions: Fix variable
length meta-flow OXMs") to account for variable length mf_field, and it is used
to decode variable length mf_field in ofp_action. In this patch, vl_mff_map
is further used to decode the variable length match field as well.
Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: Joe Stringer <joe@ovn.org>
Yi-Hung Wei [Mon, 13 Mar 2017 18:27:49 +0000 (11:27 -0700)]
nx-match: Fix oxm decode.
decode_nx_packet_in2() may be used by the switch to parse NXT_RESUME messages,
where we need exact match on the oxm header. Therefore, change
oxm_decode_loose() to oxm_decode() that takes an extra argument to indicate whether
we want strict or loose match.
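The strict/loose distinction can be illustrated with a toy decoder in Python. This is an assumption-laden sketch: the field names, the exception type and the function are hypothetical, and "loose" is taken here to mean "skip fields the decoder does not know", as in the commit named in the Fixes tag.

```python
KNOWN_FIELDS = {"in_port", "metadata"}  # hypothetical set of known fields


class UnknownField(Exception):
    """Raised by the strict decoder when it meets an unknown field."""


def decode_fields(fields, loose):
    """Decode (name, value) pairs; 'loose' selects how unknowns are handled."""
    decoded = {}
    for name, value in fields:
        if name not in KNOWN_FIELDS:
            if loose:
                continue            # loose: silently ignore the field
            raise UnknownField(name)  # strict: reject the whole message
        decoded[name] = value
    return decoded


message = [("in_port", 1), ("future_field", 2)]
print(decode_fields(message, loose=True))
```

A single function with a flag keeps both callers on one code path, which is the shape the commit message describes for oxm_decode().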
Fixes: 7befb20d0f70 ("ofp-util: Ignore unknown fields in ofputil_decode_packet_in2()") Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: Joe Stringer <joe@ovn.org>
Russell Bryant [Mon, 13 Mar 2017 20:26:00 +0000 (16:26 -0400)]
Document OVN support in ovs-sandbox.
A previous commit removed the original ovs-sandbox based OVN tutorial
because it became too outdated and difficult to maintain. However,
the use of ovs-sandbox for basic OVN development and testing is incredibly
useful, so we should provide at least basic documentation on how to use it.
This commit introduces a new and shorter document that shows how to use OVN
in ovs-sandbox. It provides a single sample configuration, as well as a
sample ovn-trace command.
Signed-off-by: Russell Bryant <russell@ovn.org> Acked-by: Numan Siddique <nusiddiq@redhat.com>
Andy Zhou [Thu, 9 Mar 2017 22:00:34 +0000 (14:00 -0800)]
ofproto-dpif-xlate: Avoid using sample action when nesting level is low
When the datapath sample action only allows a small number of nested
actions (i.e. less than 3), do not translate the OpenFlow 'clone' action
into the datapath 'sample' action, since such translation would cause
the datapath to reject the flow, with 'EOVERFLOW', when OVS is used to
implement the OVN pipeline, or more generally, when more deeply nested
clones are expected.
Reported-by: Numan Siddique <nusiddiq@redhat.com>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2017-March/329586.html Signed-off-by: Andy Zhou <azhou@ovn.org> Tested-by: Numan Siddique <nusiddiq@redhat.com> Acked-by: Joe Stringer <joe@ovn.org>
Ian Stokes [Fri, 10 Mar 2017 11:47:09 +0000 (11:47 +0000)]
docs: Use DPDK 16.11.1 stable release.
DPDK now provides a stable release branch. Modify dpdk docs and travis linux
build script to use the DPDK 16.11.1 stable branch to benefit from most
recent bug fixes.
Signed-off-by: Ian Stokes <ian.stokes@intel.com> Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
There are multiple reasons why an interface can exist
in the Open vSwitch database but not exist in the system.
For example, a restart of a host after a system crash. Ideally,
whoever added the interface in the Open vSwitch database
should remove those interfaces. But that usually does not
happen in practice. Based on experience, I have observed
that on any long lasting OVS installation there are always
a couple of stale interfaces.
When a stale interface remains in the Open vSwitch database
and the container/VM initially backing that stale interface
is moved to a different machine, the two ovn-controllers
start over-writing the OVN-SB's port_binding table in a loop.
This situation can be avoided, if ovn-controller only binds
the interfaces that actually have a valid 'ofport'.
Signed-off-by: Gurucharan Shetty <guru@ovn.org> Acked-by: Russell Bryant <russell@ovn.org>
Alin Serdean [Wed, 8 Mar 2017 14:31:56 +0000 (14:31 +0000)]
tests: Fix mcast test on slow systems
On slow systems (or systems that start processes slowly) the test:
`testing mcast - delete the port mdb when port destroyed`
is influenced by the running time.
i.e.: http://64.119.130.115/ovs/911b7e9b08b9f4f890eeecd228d5124f4ce94d4e/testsuite.dir/2326/testsuite.log.gz
This patch adds a time stop on vswitchd.
Signed-off-by: Alin Gabriel Serdean <aserdean@cloudbasesolutions.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Ilya Maximets [Mon, 6 Mar 2017 06:49:11 +0000 (09:49 +0300)]
dpdk: Redirect DPDK log to OVS logging subsystem.
This should be helpful for having all the logs in one place.
'ovs-appctl vlog' commands for 'dpdk' module can be used
to configure the log level. Lower bound for DPDK logging
(--log-level) still can be passed through 'dpdk-extra' field.
Ian Stokes [Thu, 9 Mar 2017 13:57:37 +0000 (13:57 +0000)]
netdev-dpdk: Fix mempool segfault.
The dpdk_mp_get() function can return a NULL pointer which leads to a
segfault when a mempool cannot be created. The lack of a return value
check for the function netdev_dpdk_mempool_configure() when called in
netdev_dpdk_reconfigure() can result in a segfault also as
a NULL pointer for the mempool will be passed to rte_eth_rx_queue_setup().
Fix this by adding appropriate NULL pointer and return value checks to
dpdk_mp_get(), netdev_dpdk_reconfigure() and dpdk_vhost_reconfigure_helper().
Signed-off-by: Ian Stokes <ian.stokes@intel.com> Fixes: 2ae3d542 ("netdev-dpdk: Refactor dpdk_mp_get().") Fixes: 0072e931 ("netdev-dpdk: add support for jumbo frames") CC: Daniele Di Proietto <diproiettod@vmware.com> CC: Mark Kavanagh <mark.b.kavanagh@intel.com> Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Jarno Rajahalme [Thu, 9 Mar 2017 22:09:08 +0000 (14:09 -0800)]
lib: Indicate if netlink message had labels.
Conntrack update events include labels only if they have changed.
Record the presence of labels in the netlink message to OVS internal
representation, so that the user may keep the old labels when an
update does not modify them.
Fixes: 6830a0c0e6bf ("netlink-conntrack: New module.") Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Joe Stringer <joe@ovn.org>
Mika Vaisanen [Tue, 7 Mar 2017 18:15:55 +0000 (10:15 -0800)]
ofproto-dpif-xlate: Allow sending BFD messages when STP port is not forwarding.
Interworking of BFD and RSTP does not work, as currently BFD messages
are dropped if RSTP port is not in forwarding mode. To correct this
problem, an extra check is added to allow BFD messages to be sent even
when rstp_forward_state is false.
[Committer notes]
Shifted logic checks out into a separate else if {} condition, extended
to CFM and added CFM test case.
Signed-off-by: Mika Vaisanen <mika.vaisanen@gmail.com> Signed-off-by: Joe Stringer <joe@ovn.org> Acked-by: Jarno Rajahalme <jarno@ovn.org>
Expose existing netdev stats via sFlow.
Export sFlow ETHERNET structure with available counters.
Map existing stats to counters in the GENERIC INTERFACE
sFlow structure.
Adjust unit test to accommodate these new counters.
Signed-off-by: Robert Wojciechowicz <robertx.wojciechowicz@intel.com> Acked-by: Neil McKee <neil.mckee@inmon.com> Acked-by: Ian Stokes <ian.stokes@intel.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
xurong00037997 [Fri, 24 Feb 2017 02:03:26 +0000 (10:03 +0800)]
Adapt to flake8-import-order
https://review.openstack.org/#/c/432906/
flake8-import-order adds 3 new flake8 warnings:
I100: Your import statements are in the wrong order.
I101: The names in your from import are in the wrong order.
I201: Missing newline between sections or imports.
Stateful network admission policy may allow connections to one
direction and reject connections initiated in the other direction.
After policy change it is possible that for a new connection an
overlapping conntrack entry already exists, where the original
direction of the existing connection is opposed to the new
connection's initial packet.
Most importantly, conntrack state relating to the current packet gets
the "reply" designation based on whether the original direction tuple
or the reply direction tuple matched. If this "directionality" is
wrong w.r.t. the stateful network admission policy it may happen
that packets in neither direction are correctly admitted.
This patch adds a new "force commit" option to the OVS conntrack
action that checks the original direction of an existing conntrack
entry. If that direction is opposed to the current packet, the
existing conntrack entry is deleted and a new one is subsequently
created in the correct direction.
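The effect of the option can be modelled in Python as below. This is a toy model, not the datapath code: connections are reduced to a (src, dst) pair, the table is a plain dict, and the function name is hypothetical.

```python
def commit(entries, src, dst, force=False):
    """Commit a packet and return the direction it is seen in.

    'entries' maps the unordered endpoint pair to the original direction
    tuple (src, dst) of the tracked connection.
    """
    key = frozenset((src, dst))
    orig = entries.get(key)
    if orig is not None and orig != (src, dst) and force:
        # Existing entry was created in the opposite direction: drop it so
        # that a new entry is created with this packet as the original.
        del entries[key]
        orig = None
    if orig is None:
        entries[key] = (src, dst)
        return "original"
    return "original" if orig == (src, dst) else "reply"


entries = {}
print(commit(entries, "A", "B"))              # new entry, A -> B
print(commit(entries, "B", "A"))              # seen as reply traffic
print(commit(entries, "B", "A", force=True))  # entry recreated as B -> A
```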
Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Joe Stringer <joe@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Joe Stringer <joe@ovn.org>
With stats enabled this eats 80 bytes on x86_64 per nf_conn entry, as
Eric Dumazet pointed out during netfilter workshop 2016.
Eric also says: "Another reason was the fact that Thomas was about to
change max timer range [..]" (500462a9de657f8, 'timers: Switch to
a non-cascading wheel').
Remove the timer and use a 32bit jiffies value containing timestamp until
entry is valid.
During conntrack lookup, even before doing tuple comparison, check
the timeout value and evict the entry in case it is too old.
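The expiry check on a 32-bit tick value can be sketched in Python. The helper names are hypothetical; the sketch only shows why the difference is interpreted as a signed 32-bit number, which keeps the comparison correct when the tick counter wraps around.

```python
U32_MASK = 0xFFFFFFFF


def to_s32(value):
    """Interpret the low 32 bits of 'value' as a signed integer."""
    value &= U32_MASK
    return value - (1 << 32) if value & 0x80000000 else value


def is_expired(timeout, now):
    """True once 'now' has reached 'timeout'; both are 32-bit tick counts.

    Subtracting first and then looking at the sign keeps the result
    correct across a wrap of the tick counter.
    """
    return to_s32(timeout - now) <= 0


print(is_expired(100, 99))          # still valid
print(is_expired(100, 100))         # expired
print(is_expired(5, 0xFFFFFFF0))    # timeout lies after the wrap: valid
```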
The dying bit is used as a synchronization point to avoid races where
multiple cpus try to evict the same entry.
Because lookup is always lockless, we need to bump the refcnt once
when we evict, else we could try to evict already-dead entry that
is being recycled.
This is the standard/expected way when conntrack entries are destroyed.
Followup patches will introduce garbage collection via work queue
and further places where we can reap obsoleted entries (e.g. during
netlink dumps), this is needed to avoid expired conntracks from hanging
around for too long when lookup rate is low after a busy period.
Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Upstream commit f330a7fdbe16 ("netfilter: conntrack: get rid of
conntrack timer") changes the way nf_ct_delete() is called. Prior to
that commit the call pattern was like this:
    if (del_timer(&ct->timeout))
        nf_ct_delete(ct, ...);
After this change nf_ct_delete() is called directly:
    nf_ct_delete(ct, ...);
This patch provides a replacement implementation for nf_ct_delete()
that first calls the del_timer(). This replacement is only used if
the struct nf_conn has member 'timeout' of type 'struct timer_list'.
The following patch introduces the first caller to nf_ct_delete() in
the OVS kernel module.
Linux <3.12 does not have nf_ct_delete() at all, so we inline it if it
does not exist. The inlined code is from 3.11 death_by_timeout(),
which in later versions simply calls nf_ct_delete().
Jarno Rajahalme [Thu, 9 Mar 2017 01:18:23 +0000 (17:18 -0800)]
actions: Add resubmit with conntrack tuple.
Add resubmit option to use the conntrack original direction tuple
swapped with the corresponding packet header fields during the lookup.
This could allow the same ACL table to be used for admitting return
and/or related traffic as is used for admitting the original direction
traffic.
Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Joe Stringer <joe@ovn.org>
Jarno Rajahalme [Thu, 9 Mar 2017 01:18:23 +0000 (17:18 -0800)]
ofp-util: Ignore unknown fields in ofputil_decode_packet_in2().
The decoder of packet_in messages should not fail on encountering
unknown metadata fields. This allows the switch to add new features
without breaking controllers. The controllers should, however, copy
the metadata fields from the packet_in to packet_out so that the
switch gets back the full metadata. OVN is already doing this.
Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Joe Stringer <joe@ovn.org>
openvswitch: Add original direction conntrack tuple to sw_flow_key.
Add the fields of the conntrack original direction 5-tuple to struct
sw_flow_key. The new fields are initially marked as non-existent, and
are populated whenever a conntrack action is executed and either finds
or generates a conntrack entry. This means that these fields exist
for all packets that were not rejected by conntrack as untrackable.
The original tuple fields in the sw_flow_key are filled from the
original direction tuple of the conntrack entry relating to the
current packet, or from the original direction tuple of the master
conntrack entry, if the current conntrack entry has a master.
Generally, expected connections of connections having an assigned
helper (e.g., FTP), have a master conntrack entry.
The main purpose of the new conntrack original tuple fields is to
allow matching on them for policy decision purposes, with the premise
that the admissibility of tracked connections' reply packets (as well
as original direction packets), and both direction packets of any
related connections may be based on ACL rules applying to the master
connection's original direction 5-tuple. This also makes it easier to
make policy decisions when the actual packet headers might have been
transformed by NAT, as the original direction 5-tuple represents the
packet headers before any such transformation.
When using the original direction 5-tuple the admissibility of return
and/or related packets need not be based on the mere existence of a
conntrack entry, allowing separation of admission policy from the
established conntrack state. While existence of a conntrack entry is
required for admission of the return or related packets, policy
changes can render connections that were initially admitted to be
rejected or dropped afterwards. If the admission of the return and
related packets was based on mere conntrack state (e.g., connection
being in an established state), a policy change that would make the
connection rejected or dropped would need to find and delete all
conntrack entries affected by such a change. When using the original
direction 5-tuple matching the affected conntrack entries can be
allowed to time out instead, as the established state of the
connection would not need to be the basis for packet admission any
more.
It should be noted that the directionality of related connections may
be the same or different than that of the master connection, and
neither the original direction 5-tuple nor the conntrack state bits
carry this information. If needed, the directionality of the master
connection can be stored in master's conntrack mark or labels, which
are automatically inherited by the expected related connections.
The fact that neither ARP nor ND packets are trackable by conntrack
allows mutual exclusion between ARP/ND and the new conntrack original
tuple fields. Hence, the IP addresses are overlaid in union with ARP
and ND fields. This allows the sw_flow_key to not grow much due to
this patch, but it also means that we must be careful to never use the
new key fields with ARP or ND packets. ARP is easy to distinguish and
keep mutually exclusive based on the ethernet type, but ND being an
ICMPv6 protocol requires a bit more attention.
Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Joe Stringer <joe@ovn.org> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
This patch squashes in minimal amount of OVS userspace code to not
break the build. Later patches contain the full userspace support.
Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Joe Stringer <joe@ovn.org>
Jarno Rajahalme [Thu, 9 Mar 2017 01:18:22 +0000 (17:18 -0800)]
lib: Check match and action prerequisites with 'match'.
Supply the match mask to prerequisites checking when available. This
allows checking for zero-valued matches. Non-zero valued matches
imply the presence of corresponding mask bits, but for zero valued
matches we must explicitly check the mask, too.
This is required now only for conntrack validity checking due to the
conntrack state having an 'invalid' bit, but not a 'valid' bit. One
way to match a valid conntrack state is to match on the 'tracked' bit
being one and 'invalid' bit being zero. The latter requires the
corresponding mask bit be verified.
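Why the mask matters for the zero-valued bit can be shown with a short Python sketch. The bit values and the function name are assumptions chosen for illustration; only the logic (a zero value bit says nothing unless its mask bit is set) is the point.

```python
CS_INVALID = 0x10  # assumed bit position, for illustration only
CS_TRACKED = 0x20  # assumed bit position, for illustration only


def matches_valid_ct_state(value, mask):
    """True if the match requires 'tracked' == 1 and 'invalid' == 0."""
    tracked_is_one = bool(mask & CS_TRACKED) and bool(value & CS_TRACKED)
    # A zero in 'value' only means "must be zero" when the mask bit is
    # set; otherwise the bit is wildcarded and may be either 0 or 1.
    invalid_is_zero = bool(mask & CS_INVALID) and not (value & CS_INVALID)
    return tracked_is_one and invalid_is_zero


print(matches_valid_ct_state(CS_TRACKED, CS_TRACKED | CS_INVALID))
print(matches_valid_ct_state(CS_TRACKED, CS_TRACKED))
```

In the second call the 'invalid' bit is wildcarded, so the match does not guarantee a valid conntrack state even though its value bit is zero.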
Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Joe Stringer <joe@ovn.org>
We avoid calling into nf_conntrack_in() for expected connections, as
that would remove the expectation that we want to stick around until
we are ready to commit the connection. Instead, we do a lookup in the
expectation table directly. However, after a successful expectation
lookup we have set the flow key label field from the master
connection, whereas nf_conntrack_in() does not do this. This leads to
master's labels being inherited after an expectation lookup, but those
labels not being inherited after the corresponding conntrack action
with a commit flag.
This patch resolves the problem by changing the commit code path to
also inherit the master's labels to the expected connection.
Resolving this conflict in favor of inheriting the labels allows more
information to be passed from the master connection to related
connections, which would otherwise be much harder if the 32 bits in
the connmark are not enough. Labels can still be set explicitly, so
this change only affects the default values of the labels in presence
of a master connection.
Fixes: 7f8a436eaa2c ("openvswitch: Add conntrack action") Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Joe Stringer <joe@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net> Fixes: a94ebc39996b ("datapath: Add conntrack action") Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Joe Stringer <joe@ovn.org>
Jarno Rajahalme [Thu, 9 Mar 2017 01:18:22 +0000 (17:18 -0800)]
datapath: Refactor labels initialization.
Upstream commit:
Refactoring conntrack labels initialization makes changes in later
patches easier to review.
Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Joe Stringer <joe@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Joe Stringer <joe@ovn.org>
Since 23014011ba42 ("netfilter: conntrack: support a fixed size of 128
distinct labels"), the size of the conntrack labels extension has been
fixed at 128 bits, so we do not need to check for label sizes shorter than 128
at run-time. This patch simplifies labels length logic accordingly,
but allows the conntrack labels size to be increased in the future
without breaking the build. In the event of conntrack labels
increasing in size, OVS would still be able to deal with the first 128
label bits.
Suggested-by: Joe Stringer <joe@ovn.org> Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Joe Stringer <joe@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Joe Stringer <joe@ovn.org>
openvswitch: Unionize ovs_key_ct_label with a u32 array.
Make the array of labels in struct ovs_key_ct_label a union, adding a
u32 array of the same byte size as the existing u8 array. It is
faster to loop through the labels 32 bits at a time, which is also
the alignment of netlink attributes.
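
The layout can be modeled with Python's `ctypes` as a rough illustration. The class name `CtLabels`, the field names, and the helper `apply_masked` are assumptions for this sketch; the point is only that the same 16 bytes can be walked as four 32-bit words instead of sixteen bytes.

```python
import ctypes


class CtLabels(ctypes.Union):
    # Two views of the same 16 bytes (field names assumed for illustration).
    _fields_ = [
        ("ct_labels", ctypes.c_uint8 * 16),
        ("ct_labels_32", ctypes.c_uint32 * 4),
    ]


def apply_masked(cur: CtLabels, value: CtLabels, mask: CtLabels) -> None:
    """Update 'cur' in place, 32 bits at a time: four iterations, not sixteen."""
    for i in range(4):
        keep = cur.ct_labels_32[i] & ~mask.ct_labels_32[i] & 0xFFFFFFFF
        cur.ct_labels_32[i] = keep | (value.ct_labels_32[i] & mask.ct_labels_32[i])
```

Because the masking is purely bitwise, the byte-level result does not depend on host endianness.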
Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Joe Stringer <joe@ovn.org> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Joe Stringer <joe@ovn.org>
openvswitch: Do not trigger events for unconfirmed connections.
Receiving change events before the 'new' event for the connection has
been received can be confusing. Avoid triggering change events for
setting conntrack mark or labels before the conntrack entry has been
confirmed.
Fixes: 182e3042e15d ("openvswitch: Allow matching on conntrack mark") Fixes: c2ac66735870 ("openvswitch: Allow matching on conntrack label") Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Joe Stringer <joe@ovn.org> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Upstream commit:
openvswitch: Set event bit after initializing labels.
Connlabels are included in conntrack netlink event messages only if
the IPCT_LABEL bit is set in the event cache (see
ctnetlink_conntrack_event()). Set it after initializing labels for a
new connection.
Found upon further system testing, where it was noticed that labels
were missing from the conntrack events.
Fixes: 193e30967897 ("openvswitch: Do not trigger events for unconfirmed connections.") Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net> Fixes: 372ce9737d2b ("datapath: Allow matching on conntrack mark") Fixes: 038e34abaa31 ("datapath: Allow matching on conntrack label") Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Joe Stringer <joe@ovn.org>
openvswitch: Use inverted tuple in ovs_ct_find_existing() if NATted.
The conntrack lookup for existing connections fails to invert the
packet 5-tuple for NATted packets, and therefore fails to find the
existing conntrack entry. Conntrack only stores 5-tuples for incoming
packets, and there are various situations where a lookup on a packet
that has already been transformed by NAT needs to be made. Looking up
an existing conntrack entry upon executing a packet received from
userspace is one of them.
This patch fixes ovs_ct_find_existing() to invert the packet 5-tuple
for the conntrack lookup whenever the packet has already been
transformed by conntrack from its input form as evidenced by one of
the NAT flags being set in the conntrack state metadata.
Fixes: 05752523e565 ("openvswitch: Interface with NAT.") Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Joe Stringer <joe@ovn.org> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
This patch also adds a test case to OVS system tests to verify the
behavior.
The following is a more thorough explanation of what is going on:
When we have evidence that an existing conntrack entry could exist, we
must invert the tuple if NAT has already been applied, as the current
packet headers do not match any tuple stored in conntrack. For
example, if a packet from private address X to a public address B is
source-NATted to A, the conntrack entry will have the following tuples
(ignoring the protocol and port numbers) after the conntrack entry is
committed:
Original direction tuple: (X,B)
Reply direction tuple: (B,A)
Now, if a reply packet is already transformed back to the private
address space (e.g., with a CT(nat) action), the tuple corresponding
to the current packet headers is:
Current packet tuple: (B,X)
This does not match either of the conntrack tuples above. Normally
this does not matter, as the conntrack lookup was already done using
the tuple (B,A), but if the current packet does not match any flow in
the OVS datapath, the packet is sent to userspace via an upcall,
during which the packet's skb is freed, and the conntrack entry
pointer in the skb is lost. When the packet is reintroduced to the
datapath, any further conntrack action will need to perform a new
conntrack lookup to find the entry again. Prior to this patch this
second lookup failed. The datapath flow setup corresponding to the
upcall can succeed, however, allowing all further packets in the reply
direction to re-use the conntrack entry pointer in the skb, so
typically the lookup failure only causes a packet drop.
The solution is to invert the tuple derived from the current packet
headers in case the conntrack state stored in the packet metadata
indicates that the packet has been transformed by NAT:
Inverted tuple: (X,B)
With this the conntrack entry can be found, matching the original
direction tuple.
This same logic also works for the original direction packets:
Current packet tuple (after reverse NAT): (A,B)
Inverted tuple: (B,A)
While the current packet tuple (A,B) does not match either of the
conntrack tuples, the inverted one (B,A) does match the reply
direction tuple.
Since the inverted tuple matches the reverse direction tuple, the
direction of the packet must be reversed as well.
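
The lookup logic above can be modeled in a few lines of Python. This is a simplified sketch, not the datapath code: tuples carry only source and destination addresses (no protocol or ports), and the names `AddrTuple`, `Entry` and `find_existing` are hypothetical.

```python
from typing import List, NamedTuple, Optional, Tuple


class AddrTuple(NamedTuple):
    src: str
    dst: str


class Entry(NamedTuple):
    original: AddrTuple  # original direction tuple, e.g. (X,B)
    reply: AddrTuple     # reply direction tuple, e.g. (B,A)


def invert(t: AddrTuple) -> AddrTuple:
    return AddrTuple(t.dst, t.src)


def find_existing(entries: List[Entry], packet: AddrTuple,
                  natted: bool) -> Optional[Tuple[Entry, str]]:
    """Find the entry for 'packet' and report the packet's direction.

    If NAT has already been applied, the packet headers match neither stored
    tuple, so the inverted tuple is looked up instead and the direction of
    the match is reversed.
    """
    key = invert(packet) if natted else packet
    for entry in entries:
        if key == entry.original:
            matched = "original"
        elif key == entry.reply:
            matched = "reply"
        else:
            continue
        if natted:
            matched = "reply" if matched == "original" else "original"
        return entry, matched
    return None
```

For the example entry with tuples (X,B) and (B,A), a reply packet already translated to (B,X) is not found without inversion, and is found as a reply-direction packet once the NAT flag is taken into account.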
Fixes: c5f6c06b58d6 ("datapath: Interface with NAT.") Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Joe Stringer <joe@ovn.org>
Fix comments referring to skb 'nfct' and 'nfctinfo' fields now that
they are combined into '_nfct'.
Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Joe Stringer <joe@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Joe Stringer <joe@ovn.org>
Add a helper to assign a nf_conn entry and the ctinfo bits to an sk_buff.
This avoids changing code in followup patch that merges skb->nfct and
skb->nfctinfo into skb->_nfct.
Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Joe Stringer <joe@ovn.org>
Ilya Maximets [Tue, 21 Feb 2017 14:49:25 +0000 (17:49 +0300)]
id-pool: Allocate the lowest available ids.
This simple change makes id-pool always allocate the
lowest possible id from the pool. No other code is affected
because there are currently no users of 'id_pool_free_id' in
OVS.
This behaviour of id-pool will be used in the next patch.
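
One way to get lowest-first allocation is sketched below in Python. It is not the OVS id-pool implementation; the class `IdPool` and its methods are hypothetical, and a min-heap of freed ids is just one possible technique.

```python
import heapq


class IdPool:
    """Pool of ids in [base, base + n_ids) that always hands out the lowest free id."""

    def __init__(self, base: int, n_ids: int):
        self.limit = base + n_ids
        self.next_unused = base   # lowest id that has never been allocated
        self.freed = []           # min-heap of ids returned to the pool
        self.in_use = set()

    def alloc(self):
        # Freed ids are always lower than next_unused, so prefer the heap.
        if self.freed:
            new_id = heapq.heappop(self.freed)
        elif self.next_unused < self.limit:
            new_id = self.next_unused
            self.next_unused += 1
        else:
            return None  # pool exhausted
        self.in_use.add(new_id)
        return new_id

    def free(self, old_id: int) -> None:
        # Ignore ids that are not currently allocated.
        if old_id in self.in_use:
            self.in_use.remove(old_id)
            heapq.heappush(self.freed, old_id)
```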
Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Eric Garver [Tue, 21 Feb 2017 19:22:53 +0000 (14:22 -0500)]
ofp-actions: Fix translation of set_field for nw_ecn
When using set_field for nw_ecn with OF1.0 or OF1.1, you get an error
instead of a proper translation. This used to work before 4b684612d900
("ofp-actions: Translate mod_nw_ecn action to OF1.1 properly.") because
it would fall back to using NXM.
$ ovs-ofctl -O OpenFlow11 add-flow br0 'ip actions=set_field:2->nw_ecn'
ovs-ofctl: none of the usable flow formats (NXM,OXM) is among the
allowed flow formats (OpenFlow11)
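
For context, the effect that setting nw_ecn is expected to have on a packet can be sketched as below. This assumes the standard placement of ECN in the two least-significant bits of the IPv4 TOS byte; the function name `set_nw_ecn` is hypothetical and the sketch says nothing about how ofp-actions encodes the action.

```python
ECN_MASK = 0x03  # ECN occupies the two low bits of the IPv4 TOS byte


def set_nw_ecn(tos: int, ecn: int) -> int:
    """Return 'tos' with its ECN bits replaced by 'ecn', leaving DSCP untouched."""
    if not 0 <= ecn <= ECN_MASK:
        raise ValueError("ECN value must fit in two bits")
    return (tos & ~ECN_MASK & 0xFF) | ecn
```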
Fixes: 4b684612d900 ("ofp-actions: Translate mod_nw_ecn action to OF1.1 properly.") Signed-off-by: Eric Garver <e@erig.me> Signed-off-by: Ben Pfaff <blp@ovn.org>
Aaron Conole [Wed, 22 Feb 2017 19:59:41 +0000 (14:59 -0500)]
ovs-tcpdump: Set mirror port mtu
When using ovs-tcpdump to mirror interfaces with MTU larger than the default,
Open vSwitch will lower the MTU of the interfaces we are interested in monitoring.
Instead, probe the MTU and set the mirrored port's MTU value correctly.
Fixes: 314ce6479a83 ("ovs-tcpdump: Add a tcpdump wrapper utility") Reported-by: Dan Williams <dcbw@redhat.com> Signed-off-by: Aaron Conole <aconole@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Darrell Ball [Thu, 16 Feb 2017 08:47:32 +0000 (00:47 -0800)]
dpdk: Export packet_set_ipv6_addr() for DPDK.
The NAT changes in this series need both packet_set_ipv4_addr()
and packet_set_ipv6_addr() to be exported; however, the IPv4 API was
exported by an unrelated patch.
Ben Pfaff [Thu, 26 Jan 2017 18:26:30 +0000 (10:26 -0800)]
ovs-fields.7: Use a more general approach to groff encodings.
It turns out that, since groff 1.20 around 2009, groff comes with a
preprocessor named "preconv" that can fix encoding issues. Use it instead
of the existing hack.