Ben Pfaff [Fri, 21 Sep 2018 18:25:55 +0000 (11:25 -0700)]
flow: Fix uninitialized flow fields in IPv6 error case.
When parse_ipv6_ext_hdrs__() returned false, half a 64-bit word had been
pushed into the miniflow and the second half was left uninitialized. This
commit fixes the problem.
Reported-at: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=10518 Signed-off-by: Ben Pfaff <blp@ovn.org> Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
Yifeng Sun [Tue, 18 Sep 2018 21:39:57 +0000 (14:39 -0700)]
tests: Fix broken test of 'truncate and output to gre tunnel'
The test 'truncate and output to gre tunnel' is broken on certain kernels
where OVS kernel module and upstream GRE module can't co-exist. This
patch creates a test that doesn't depend on upstream GRE module but
provides the same testing.
The replaced test is skipped on problematic kernel versions.
On centos, this test may fail due to the default rules of iptables.
Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org> Tested-by: Greg Rose <gvrose8192@gmail.com> Reviewed-by: Greg Rose <gvrose8192@gmail.com>
Currently, OVS does not update the flow stats after a packet is
restarted by NXT_RESUME message. This patch fixes the aforementioned
issue and adds an unit test to prevent regression.
Fixes: 77ab5fd2a95b ("Implement serializing the state of packet traversal in "continuations".")
VMware-BZ: #2198435 Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Ben Pfaff [Thu, 20 Sep 2018 18:01:35 +0000 (11:01 -0700)]
meta-flow: Make "nw_frag" a synonym for "ip_frag".
Since the time that OVS introduced support for IP fragments, the OVS
functions that format flows have used "nw_frag", but the ones that parse
flows have expected "ip_frag". Obviously this is a bug and it's a surprise
that it's gone so long without anyone reporting the problem. This fixes
it and adds a test.
Yifeng Sun [Tue, 18 Sep 2018 17:35:59 +0000 (10:35 -0700)]
gre: Rename fallback devices to avoid udev's interference
On certain kernel versions, when openvswitch kernel module creates
a gre0 interface, the kernel’s gre module will jump out and compete
to control the gre0 interface. This will cause the failure of
openvswitch kernel module loading.
This fix renames fallback devices by adding a prefix "ovs-".
Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com> Reviewed-by: Greg Rose <gvrose8192@gmail.com> Tested-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: Justin Pettit <jpettit@ovn.org>
VMware Issue: #2162866
ovsdb-server: Alleviate the possible data loss in an active/standby setup
The present code resets the database when it is in the state -
'RPL_S_SCHEMA_REQUESTED' and repopulates the database when it
receives the monitor reply when it is in the state -
'RPL_S_MONITOR_REQUESTED'. If however, it goes to active mode
before it processes the monitor reply, the whole data is lost.
This patch alleviates the problem by resetting the database when it
receives the monitor reply (before processing it). So that
reset and repopulation of the db happens in the same state.
This approach still has a window for data loss if the function
process_notification() when processing the monitor reply fails for
some reason or ovsdb-server crashes for some reason during
process_notification().
Reported-by: Han Zhou <zhouhan@gmail.com>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2018-August/047161.html Tested-by: aginwala <aginwala@ebay.com> Acked-by: Han Zhou <zhouhan@gmail.com> Signed-off-by: Numan Siddique <nusiddiq@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Ben Pfaff [Mon, 10 Sep 2018 20:00:59 +0000 (13:00 -0700)]
ovsdb-idlc: Use ALIGNED_CAST to avoid spurious warnings for index rows.
The *_index_init_row() function casts a generic ovsdb_idl_row pointer to
a specific type of row pointer. This can cause an increase in required
alignment with some kinds of data on some architectures. GCC complains,
e.g.:
lib/vswitch-idl.c: In function 'ovsrec_flow_sample_collector_set_index_init_row'
lib/vswitch-idl.c:9277:12: warning: cast increases required alignment of target
However, rows are always allocated with malloc(), which returns member
suitable for any type, so this is a false positive warning and this commit
suppresses it.
Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Han Zhou <zhouhan@gmail.com>
Martin Xu [Wed, 12 Sep 2018 19:43:51 +0000 (12:43 -0700)]
rhel: Ship ovs shared libraries, fedora
This patch extends 4886d4d2495b (debian, rhel: Ship ovs shared libraries
and header files) to fedora, by packaging the shared libraries in
openvswitch and openvswitch-dvel RPM. These files are always packaged in
the RPMs built with rhel6 spec file.
When we originally added commit 2bdd1f3d96 it was part of an
effort to work around gre module conflicts found while enabling
the ERSPAN feature. Testing at the time did not show any benefit
so in commit a94f9524db we reverted it. However, further
developments showed that in some corner cases it did have a
benefit and it did not do any harm so we reverted the original
revert to restore the code.
Signed-off-by: Greg Rose <roseg@vmware.com> Tested-by: Yifeng Sun <pkusunyifeng@gmail.com> Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com> Signed-off-by: Justin Pettit <jpettit@ovn.org>
Kevin Traynor [Fri, 31 Aug 2018 08:47:55 +0000 (09:47 +0100)]
dpif-netdev: Add round-robin based rxq to pmd assignment.
Prior to OVS 2.9 automatic assignment of Rxqs to PMDs
(i.e. CPUs) was done by round-robin.
That was changed in OVS 2.9 to ordering the Rxqs based on
their measured processing cycles. This was to assign the
busiest Rxqs to different PMDs, improving aggregate
throughput.
For the most part the new scheme should be better, but
there could be situations where a user prefers a simple
round-robin scheme because Rxqs from a single port are
more likely to be spread across multiple PMDs, and/or
traffic is very bursty/unpredictable.
Add 'pmd-rxq-assign' config to allow a user to select
round-robin based assignment.
Signed-off-by: Kevin Traynor <ktraynor@redhat.com> Acked-by: Eelco Chaudron <echaudro@redhat.com> Acked-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>
ovs-save: Don't always include the default flow during restore
Currently the default flow (actions=NORMAL) is present in the flow table after
the flow table is restored also when the default flow is removed.
This commit changes the behaviour of the "ovs-save save-flows" command to use
"replace-flows" instead of "add-flows" to restore the flows. This is needed in
order to always have the new flow table as it was before restoring it.
Gavi Teitz [Fri, 10 Aug 2018 08:30:08 +0000 (11:30 +0300)]
dpctl: Expand the flow dump type filter
Added new types to the flow dump filter, and allowed multiple filter
types to be passed at once, as a comma separated list. The new types
added are:
* tc - specifies flows handled by the tc dp
* non-offloaded - specifies flows not offloaded to the HW
* all - specifies flows of all types
The type list is now fully parsed by the dpctl, and a new struct was
added to dpif which enables dpctl to define which types of dumps to
provide, rather than passing the type string and having dpif parse it.
Signed-off-by: Gavi Teitz <gavi@mellanox.com> Acked-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Simon Horman <simon.horman@netronome.com>
Gavi Teitz [Fri, 10 Aug 2018 08:30:07 +0000 (11:30 +0300)]
dpif-netdev: Initialize dpif_flow attrs
In a previous commit, the dpif_flow struct was expanded, with the
'offloaded' field being moved into a new struct which also includes a
field for the dp layer the flow is handled on. The initialization of
these fields was only done in dpif-netlink.
This completes that commit, by initializing the fields in dpif-netdev
as well. As the 'offloaded' field was previously ignored by
dpif-netdev, the attrs are initialized to the default values of
'false' for the offloaded state, and 'ovs' for the dp layer.
Fixes: d63ca5329ff9 ("dpctl: Properly reflect a rule's offloaded to HW state") Signed-off-by: Gavi Teitz <gavi@mellanox.com> Acked-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Simon Horman <simon.horman@netronome.com>
Ben Pfaff [Thu, 6 Sep 2018 22:42:06 +0000 (15:42 -0700)]
odp-util: Don't attempt to write IPv6 flow label bits that don't exist.
The ipv6_label field member of struct ovs_key_ipv6 is 32 bits in size,
but an IPv6 label is only 20 bits, so the upper 12 bits are not writable
and must be 0 in the mask. The code wasn't careful about this so it could
try to write them anyway. This commit fixes the problem.
Reported-by: nm_r@directbox.com
Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2018-September/047357.html Signed-off-by: Ben Pfaff <blp@ovn.org> Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
lib/tc: treat vlan id and prio as independent fields
Previously the key was used to check the presence of vlan id and
prio fields instead of using the mask. Additionally the vlan id
field was considered to be present if only the prio field was set,
and vice versa. f.e. setting the following:
Resulted in (instead of wildcarding vlan_id, filter matches 0):
filter protocol 802.1Q pref 1 flower chain 0
filter protocol 802.1Q pref 1 flower chain 0 handle 0x1
vlan_id 0
vlan_prio 2
vlan_ethtype ip
eth_type ipv4
ip_flags nofrag
in_hw
action order 1: mirred (Egress Redirect to device eth1) stolen
index 2 ref 1 bind 1 installed 5 sec used 5 sec
Action statistics:
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
cookie 47040ae7a94fff6afd7ed8aa04b11ba4
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: Simon Horman <simon.horman@netronome.com>
Ben Pfaff [Thu, 30 Aug 2018 20:58:50 +0000 (13:58 -0700)]
tests: Add regression tests for all the bugs found by oss-fuzz so far.
This will make it harder for bugs found by oss-fuzz to reappear.
Signed-off-by: Ben Pfaff <blp@ovn.org> Tested-by: Yifeng Sun <pkusunyifeng@gmail.com> Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Ben Pfaff [Thu, 30 Aug 2018 20:58:49 +0000 (13:58 -0700)]
ofp-port: Further cleanups and fixes for ofputil_decode_port_stats().
This fixes leaks on the error path in parse_intel_port_custom_property().
ofp_print_ofpst_port_reply() failed to free the custom_stats in decoded
port stats. This fixes the problem.
parse_intel_port_custom_property() had a memory leak if there was more than
one custom stats property (which there shouldn't be, but still). This
fixes the problem.
There was a function netdev_free_custom_stats_counters() meant for freeing
custom_stats, but hardly anything used it. This adopts it consistently.
It wasn't safe to free the custom stats if ofputil_decode_port_stats()
returned an error. Using netdev_free_custom_stats_counters() avoids this
pitfall.
Reported-at: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=9972 Signed-off-by: Ben Pfaff <blp@ovn.org> Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
Louis Peens [Wed, 5 Sep 2018 13:26:24 +0000 (15:26 +0200)]
lib/tc: reject offloading of non-Ethernet packets
When a packet is marked with the special ethtype of OFP_DL_TYPE_NOT_ETH_TYPE
it got wrongly installed into tc datapath as a match on a packet with that
ethtype. This prevents that from happening.
Signed-off-by: Louis Peens <louis.peens@netronome.com> Reviewed-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: Simon Horman <simon.horman@netronome.com>
Ben Pfaff [Fri, 7 Sep 2018 17:39:18 +0000 (10:39 -0700)]
vswitch.xml: Better explain vlan-limit.
CC: Eric Garver <e@erig.me> Requested-by: Jerry Lilijun <jerry.lilijun@huawei.com> Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Eric Garver <e@erig.me>
This patch will make sure VXLAN tunnels with and without the group
based policy (GBP) option enabled can not coexist on the same
destination UDP port.
In theory, VXLAN tunnel with and without GBP enables can be
multiplexed on the same UDP port as long as different VNI's are
used. However currently OVS does not support this, hence this patch to
check for this condition.
Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
dhparams: Fix .c file generation with OpenSSL >= 1.1.1-pre9
Since OpenSSL upstream commit 201b305a2409
("apps/dsaparam.c generates code that is intended to be pasted or included into
an existing source file: the function is static, and the code doesn't include
dsa.h. Match the generated C source style of dsaparam.") "openssl dhparam -C"
generates the get_dh functions as static, but the functions are used inside
stream-ssl.c and so the static keyword cannot be used.
This commit removes the static keyword from the get_dh functions during
dhparams.c file generation by restoring the current behaviour.
Signed-off-by: Timothy Redaelli <tredaelli@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Add TC offload support for classifying single MPLS tagged traffic.
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Reviewed-by: John Hurley <john.hurley@netronome.com> Signed-off-by: Simon Horman <simon.horman@netronome.com>
Or Gerlitz [Thu, 6 Sep 2018 10:52:26 +0000 (13:52 +0300)]
lib/tc: Avoid matching on tunnel ttl or tos if not needed
The tunnel ttl key is not masked when provided to the tc lib, hence we
wrongly attempted to match on it, when we got non zero ttl key with a zero
mask. Fix it by applying the mask. Use the same practice for the tunnel tos.
Fixes: dd83253e117c ('lib/tc: Support matching on ip tunnel tos and ttl') Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reported-by: Eli Britstein <elibr@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Simon Horman <simon.horman@netronome.com>
ovs-ctl: Allow add-remote without vswitchd started.
'add_managers ()' is filtering add-remote if vswitchd is not started.
However, if we actually filter here we end up with a bricked system,
blackholing all traffic. Allowing add_manager() to proceed may mean
extra churn in controllers in some cases, but this is far better than
the alternative of a bricked system.
Signed-off-by: Darrell Ball <dlu998@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
system-traffic: Fix conntrack per zone limit test.
Commit 3f1087c added a per zone limit test which relied on the
CHECK_CT_DPIF_FLUSH_BY_CT_TUPLE m4 macro to skip the test when executing
in a userspace datapath (since the per zone limit feature is not yet
implemented in userspace). That macro, however, has been removed in
commit 271e48a ("conntrack: Support conntrack flush by ct 5-tuple")
which was causing the test to fail when executing in userspace.
Instead, a new m4 macro, CHECK_CT_DPIF_PER_ZONE_LIMIT, is introduced to
make the same differentiation, until userspace doesn't support the per
zone limit.
CC: Yi-Hung Wei <yihung.wei@gmail.com> Fixes: 3f1087c ("system-traffic: Add conntrack per zone limit test case") Signed-off-by: Tiago Lam <tiago.lam@intel.com> Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Yi-Hung Wei <yihung.wei@gmail.com>
Mark Michelson [Thu, 6 Sep 2018 20:01:38 +0000 (16:01 -0400)]
ovn: Detect and prevent duplicate address assignments.
This patch alters the 'ovn-nbctl lsp-set-addresses' command to check if
the IP addresses being added are duplicates of already-set IP addresses.
Test cases have been added for this detection.
This patch also adds a warning message to ovn-northd if duplicate IPv4
addresses are detected on a switch.
Signed-off-by: Mark Michelson <mmichels@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
erspan: set erspan_ver to 1 by default when adding an erspan dev
After erspan_ver is introudced, if erspan_ver is not set in iproute, its
value will be left 0 by default. Since Commit 02f99df1875c ("erspan: fix
invalid erspan version."), it has broken the traffic due to the version
check in erspan_xmit if users are not aware of 'erspan_ver' param, like
using an old version of iproute.
To fix this compatibility problem, it sets erspan_ver to 1 by default
when adding an erspan dev in erspan_setup. Note that we can't do it in
ipgre_netlink_parms, as this function is also used by ipgre_changelink.
Fixes: 02f99df1875c ("erspan: fix invalid erspan version.") Reported-by: Jianlin Shi <jishi@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Fixes: 5e720da59d ("erspan: fix invalid erspan version.") Cc: Xin Long <lucien.xin@gmail.com> Signed-off-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
ovn.at: Skip ACL rate-limiting test on slow/overloaded systems.
In ACL rate-limiting test, we send three sets of 100 packets. One of
the sets drops packets at a rate of 10 per second, one at a rate of 5
per second, and one not at all. On my setup, it takes roughly 0.67
seconds to send those 300 packets, but we have reports of it taking over
15 seconds on others. The test was intended to allow some flexibility
in run-time, but it's very difficult to design a mechanism that can all
possibilities.
To prevent false test failures, this patch changes the test to check
the duration count of the meter, and if it's greater than nine seconds,
just skip the test.
Signed-off-by: Justin Pettit <jpettit@ovn.org> Reported-by: Thomas Goirand <zigo@debian.org>
Ben Pfaff [Thu, 6 Sep 2018 17:44:47 +0000 (10:44 -0700)]
daemon-unix: Use same name for original or restarted children.
Linux has an idea of process name that is visible in /proc/$pid/comm. This
is "ovs-vswitchd" for a freshly started ovs-vswitchd process. When the
monitor code restarted a crash child, it changed it to the empty string.
This confused the daemon_is_running check in ovs-lib.in, which checks
comm. This commit fixes the problem by setting the program name as comm
in newly restarted children.
VMware-BZ: #2191724 Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Gurucharan Shetty <guru@ovn.org>
Martin Xu [Wed, 5 Sep 2018 16:25:37 +0000 (09:25 -0700)]
rhel: add 4.4 kernel in kmod build with mulitple versions, fedora
Extends 22c33c303932 (rhel: support kmod build against mulitple kernel
versions, fedora) to kernel version 4.4.x, x>=73 for SLES 12 SP3
Signed-off-by: Martin Xu <martinxu9.ovs@gmail.com> CC: Greg Rose <gvrose8192@gmail.com> CC: Markos Chandras <mchandras@suse.de> CC: Ben Pfaff <blp@ovn.org> Reviewed-by: Greg Rose <gvrose8192@gmail.com> Tested-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Ben Pfaff [Tue, 4 Sep 2018 20:59:06 +0000 (13:59 -0700)]
ovs-macros: Make tests log how long they waited when they succeed.
Many OVS tests wait up for 10 seconds for a condition to become true.
Usually these conditions are ones that should take only a second or so to
actually become true in practice, but on a busy and slow machine it's
possible that some tests might fail or come close to failing because 10
seconds is simply not enough there.
This commit adds logging for the case where a condition actually succeeds
to indicate the amount of time that was waited. This should make it easier
to identify whether we need to increase the maximum wait time from 10
seconds to something longer, by allowing us to see whether some of the
successful waits came close to timing out.
Reported-by: Thomas Goirand <zigo@debian.org>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2018-September/047340.html Signed-off-by: Ben Pfaff <blp@ovn.org> Tested-by: Thomas Goirand <zigo@debian.org>
Martin Xu [Fri, 31 Aug 2018 18:52:42 +0000 (11:52 -0700)]
rhel: allow passing more flags to configure, fedora
Define a variable _ovs_config_extra_flags to allow passing more flags to
configure when building OVS kmod RPM. For example, to build with a
non-standard openssl and add an RPATH, use the following command
make rpm-fedora-kmod RPMBUILD_OPT='-D "_ovs_config_extra_flags
--with-openssl=<path to your openssl header> LDFLAGS=\"\${LDFLAGS} -Xlinker
-rpath=<path to your openssl lib>\""'
Signed-off-by: Martin Xu <martinxu9.ovs@gmail.com> CC: Greg Rose <gvrose8192@gmail.com> CC: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Flavio Leitner <fbl@sysclose.org> Reviewed-by: Markos Chandras <mchandras@suse.de>
Patch 22c33c303932 used /usr/src/linux/<kernel version> as path of the
linux headers, which does not work for SLES. Use /lib/modules/<kernel
version>/build instead.
Fixes 22c33c303932 (rhel: support kmod build against mulitple kernel versions,
fedora)
Signed-off-by: Martin Xu <martinxu9.ovs@gmail.com> CC: Greg Rose <gvrose8192@gmail.com> CC: Markos Chandras <mchandras@suse.de> Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Flavio Leitner <fbl@sysclose.org> Reviewed-by: Markos Chandras <mchandras@suse.de>
This commit improves test coverage of the ossfuzz flow extract test harness
by extending the harness with additional API calls from lib/flow.c
An additional minor change is adding a config option to
flow_extract_target.options file in `tests/ossfuzz/config` to suppress
debug output while fuzzing.
A cursory evaluation shows that the patch covers 8 additional files and
improves line coverage of lib/flow.c from 23% to 37%.
Signed-off-by: Bhargava Shastry <bshastry at sect.tu-berlin.de> Signed-off-by: Ben Pfaff <blp@ovn.org>
Justin Pettit [Wed, 29 Aug 2018 00:38:25 +0000 (17:38 -0700)]
dpif-netdev: Prevent unsafe access when retrieving meter stats.
dpif_netdev_meter_get() retrieved a pointer to a meter entry without
holding a lock. It's possible that another thread could have deleted
that entry between retrieving the pointer and dereferencing the pointer.
This makes the function hold the lock the entire time the meter entry is
needed.
Markos Chandras [Tue, 28 Aug 2018 13:25:37 +0000 (14:25 +0100)]
utilities: Drop shebang from bash completion script
This fixes the following warning when building Open vSwitch on the
openSUSE Build Service:
W: non-executable-script /usr/share/bash-completion/completions/ovs-appctl-bashcomp.bash
This text file contains a shebang or is located in a path dedicated
for executables, but lacks the executable bits and cannot thus be
executed. If the file is meant to be an executable script, add the
executable bits, otherwise remove the shebang or move the file
elsewhere.
The file is meant to be sourced instead of executed, so we can simply
drop the shebang.
Signed-off-by: Markos Chandras <mchandras@suse.de> Signed-off-by: Ben Pfaff <blp@ovn.org>
Darrell Ball [Wed, 29 Aug 2018 02:54:01 +0000 (19:54 -0700)]
datapath: Fix builds on older kernels.
On older kernels, for example 3.19, the function rt6_get_cookie() is
not available and used with ipv6 config enabled; it was introduced in
4.2. Put back the replacement function if it does not exist.
Add a 3.19 version to travis.
CC: Yifeng Sun <pkusunyifeng@gmail.com> Fixes: bf61b8b1c1db ("datapath: Add support for kernel 4.16.x & 4.17.x.") Signed-off-by: Darrell Ball <dlu998@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org> Tested-by: Yifeng Sun <pkusunyifeng@gmail.com> Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
Ben Pfaff [Fri, 24 Aug 2018 21:50:14 +0000 (14:50 -0700)]
ofp-actions: Re-fix error path for parsing OpenFlow actions.
A previous commit attempted to fix the error path when the actions nested
within clone provoked an error. However, this commit just introduced a new
problem in another case, since it made ofpacts_pull_openflow_actions__()
restore a previously valid pointer to data that might have been
reallocated.
This commit takes another approach. Instead of trying to restore anything
at all, it just defines ofpacts_pull_openflow_actions__() to clear the
output buffer when there's an error. It seems that this is less error
prone. Most of the callers don't care; this commit fixes up the ones that
do.
Reported-at: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=9975 Fixes: 20cdd1dbd546 ("ofp-actions: Avoid assertion failure for clone(ct(...bad actions...)).") Signed-off-by: Ben Pfaff <blp@ovn.org> Tested-by: Yifeng Sun <pkusunyifeng@gmail.com> Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
Han Zhou [Sat, 25 Aug 2018 01:07:24 +0000 (18:07 -0700)]
debian: Move libovn out from package libopenvswitch.
Since we are packaging OVN and OVS components separately, libovn
shouldn't belong to OVS, so move it to ovn-common. Also, remove
it from libopenvswitch-dev.
Signed-off-by: Han Zhou <hzhou8@ebay.com> Signed-off-by: Ben Pfaff <blp@ovn.org> Tested-by: aginwala <aginwala@ebay.com>
dpif-netdev: Avoid reordering of packets in a batch with same megaflow
OVS reads packets in batches from a given port and packets in the
batch are subjected to potentially 3 levels of lookups to identify
the datapath megaflow entry (or flow) associated with the packet.
Each megaflow entry has a dedicated buffer in which packets that match
the flow classification criteria are collected. This buffer helps OVS
perform batch processing for all packets associated with a given flow.
Each packet in the received batch is first subjected to lookup in the
Exact Match Cache (EMC). Each EMC entry will point to a flow. If the
EMC lookup is successful, the packet is moved from the rx batch to the
per-flow buffer.
Packets that did not match any EMC entry are rearranged in the rx batch
at the beginning and are now subjected to a lookup in the megaflow cache.
Packets that match a megaflow cache entry are *appended* to the per-flow
buffer.
Packets that do not match any megaflow entry are subjected to slow-path
processing through the upcall mechanism. This cannot change the order of
packets as by definition upcall processing is only done for packets
without matching megaflow entry.
The EMC entry match fields encompass all potentially significant header
fields, typically more than specified in the associated flow's match
criteria. Hence, multiple EMC entries can point to the same flow. Given
that per-flow batching happens at each lookup stage, packets belonging
to the same megaflow can get re-ordered because some packets match EMC
entries while others do not.
The following example can illustrate the issue better. Consider
following batch of packets (labelled P1 to P8) associated with a single
TCP connection and associated with a single flow. Let us assume that
packets with just the ACK bit set in TCP flags have been received in a
prior batch also and a corresponding EMC entry exists.
The megaflow classification criteria does not include TCP flags while
the EMC match criteria does. Thus, all packets other than P4 match
the existing EMC entry and are moved to the per-flow packet batch.
Subsequently, packet P4 is moved to the same per-flow packet batch as
a result of the megaflow lookup. Though the packets have all been
correctly classified as being associated with the same flow, the
packet order has not been preserved because of the per-flow batching
performed during the EMC lookup stage. This packet re-ordering has
performance implications for TCP applications.
This patch preserves the packet ordering by performing the per-flow
batching after both the EMC and megaflow lookups are complete. As an
optimization, packets are flow-batched in emc processing till any
packet in the batch has an EMC miss.
A new flow map is maintained to keep the original order of packet
along with flow information. Post fastpath processing, packets from
flow map are *appended* to per-flow buffer.
Signed-off-by: Vishal Deep Ajmera <vishal.deep.ajmera@ericsson.com> Co-authored-by: Venkatesan Pradeep <venkatesan.pradeep@ericsson.com> Signed-off-by: Venkatesan Pradeep <venkatesan.pradeep@ericsson.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Xu Binbin [Mon, 13 Aug 2018 02:27:42 +0000 (10:27 +0800)]
netdev-dpdk: Support the link speed of XL710
In the scenario of XL710, the link speed which stored in the table
of Interface is not 40G. Because the implementation of query of link
speed only support to 10G, the parameter 'current' will be a random
value in the scenario of higher link speed. In this case, incorrect
link speed of XL710 nic will be stored in the database.
Signed-off-by: Xu Binbin <xu.binbin1@zte.com.cn> Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Ben Pfaff [Fri, 24 Aug 2018 19:25:39 +0000 (12:25 -0700)]
ofproto-dpif-trace: Make -generate send packets to controller again.
Prior to the OVS 2.9 development cycle, any flow that sent a packet to a
controller required that the flow be slow-pathed. In some cases this led
to poor performance, so OVS 2.9 made controller actions fast-pathable. As
a side effect of the change, "ovs-appctl ofproto/trace -generate" no longer
sent packets to the controller. This usually didn't matter but it broke
the Faucet tutorial, which relied on this behavior. This commit
reintroduces the original behavior and thus should fix the tutorial.
Numan Siddique [Fri, 24 Aug 2018 19:26:52 +0000 (00:56 +0530)]
ovn: Fix the issue in IPv6 Neigh Solicitation responder for router IPs
Commit [1] added a new action 'nd_na_router' to set the router bit
in the 'flags' field of the Neighbour Adv packet for router IPs.
This action was used in the router pipeline. But the logical switch
pipeline also adds the Neighbour Adv flows for router IPs but with
'nd_na' action (which the commit [1] didn't handle).
This patch fixes this by changing the action to 'nd_na_router' for
router IPs.
Without this patch, the IPv6 functionality is broken.
[1] - "c9756229ed: ovn: Set proper Neighbour Adv flag when replying
for NS request for router IP"
Signed-off-by: Numan Siddique <nusiddiq@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Han Zhou <hzhou8@ebay.com>
Yunjian Wang [Mon, 27 Aug 2018 11:52:55 +0000 (19:52 +0800)]
dpctl: Fix memory leak in dp_exists().
Fixes: ffdcd110fa62 ("dpctl: Make opt_dpif_open() more general.") Signed-off-by: Yunjian Wang <wangyunjian@huawei.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Greg Rose [Fri, 24 Aug 2018 20:36:29 +0000 (13:36 -0700)]
ofproto-dpif: Check for EBUSY as well
Guru reported that we can't create more than one geneve tunnel.
Sometimes a driver will return EBUSY as well as EEXIST for some
duplicate configurations. Check for EBUSY too.
Fixes: 7521e0cf9e ("ofproto-dpif: Let the dpif report when a ...")
Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2018-August/047214.html Reported-by: Guru Shetty <guru@ovn.org> Signed-off-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
This patch enables GRE tests to run even if the buggy `ip`
is being used.
Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org> Reviewed-by: Greg Rose <gvrose8192@gmail.com> Tested-by: Greg Rose <gvrose8192@gmail.com>
Yifeng Sun [Tue, 21 Aug 2018 14:42:08 +0000 (07:42 -0700)]
datapath: Add support for kernel 4.16.x & 4.17.x
Add support for kernel version up to 4.17.x. On Travis, build passed
for all kernel versions. And no new test fails are introduced by this
patch.
Cleaned up file datapath/linux/compat/include/net/ip6_fib.h which
has no effect to kernel module but brings complexity to porting.
Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org> Reviewed-by: Greg Rose <gvrose8192@gmail.com> Tested-by: Greg Rose <gvrose8192@gmail.com>
William Tu [Tue, 21 Aug 2018 23:03:14 +0000 (16:03 -0700)]
erspan: add big endian bit fields.
Big-endian systems arrange bit fields in the opposite order.
The patch follows the linux kernel's approach by defining the
big and little endian bit-field of ERSPAN header using #ifdef.
Tested on zelenka.debian.org
(https://db.debian.org/machines.cgi?host=zelenka).
Tested-by: Ben Pfaff <blp@ovn.org> Reported-by: James Page <james.page@canonical.com>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2018-August/351382.html Signed-off-by: William Tu <u9012063@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Ben Pfaff [Tue, 21 Aug 2018 16:22:03 +0000 (09:22 -0700)]
tests: Fix hash function dependencies in "tunnel - ERSPAN v1/v2 metadata".
This test only worked if each OpenFlow port was assigned a particular
datapath port number: p1 to port 3, p2 to port 2, p3 and p4 to port 1.
This happened consistently on little-endian architectures because of the
use of a particular hash function, but on big-endian architectures it
failed because the hash function was different.
This commit fixes the problem by adding the non-dummy ports separately.
(Dummy ports try to take the datapath port number corresponding to their
name, when it is available.) This does result in swapping a couple of
datapaths port numbers, so that p1 has port 1, p2 has port 2, and the
erspan ports have port 3, hence the size of the patch.
Reported-by: James Page <james.page@canonical.com>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2018-August/351382.html Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: William Tu <u9012063@gmail.com>
Aaron Conole [Wed, 8 Aug 2018 14:36:10 +0000 (10:36 -0400)]
ovn-ctl: allow configuring user:group for daemons
Add two options, one for controlling the ovs daemon user/group, and the
other for controlling the ovn daemon user/group. This allows a fine-grained
split between OVN and OVS daemons, and keeps the syntax and user/group
separation from ovs-ctl when running ovn-ctl.
Signed-off-by: Aaron Conole <aconole@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Martin Xu [Mon, 20 Aug 2018 21:24:05 +0000 (14:24 -0700)]
rhel: support kmod build against mulitple kernel versions, fedora
This patch ports changes from kmod rhel6 spec file to fedora spec file,
to support packaging kernel modules built against multiple versions of
kernel sources.
RHEL 7.4 introduced backward incompatible changes in the kernel. As
a result, prebuilt PRM packages against kernels newer than 693.17.1
will cannot be used on systems with older kernels, vice versa.
Intended to work only on RHEL 7.4 (kernel version 3.10.0-693.yy.zz).
This patch allows multiple kernel version numbers delimited by
whitespace to be passed as variable "kversion". The result RPM packages
the kernel module .ko files from all specified kernel versions. For
example,
make rpm-fedora-kmod \
RPMBUILD_OPT='-D "kversion 3.10.0-693.1.1.el7.x86_64 \
3.10.0-693.17.1.el7.x86_64"'
By default, make tries to build against the current running kernel.
This patch also includes a script to update the weak-update symlinks
if the system kernel version is upgraded or downgraded after
openvswitch-kmod is installed.
Signed-off-by: Martin Xu <martinxu9.ovs@gmail.com> CC: Greg Rose <gvrose8192@gmail.com> CC: Flavio Leitner <fbl@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Flavio Leitner <fbl@redhat.com>
Han Zhou [Mon, 20 Aug 2018 05:27:31 +0000 (22:27 -0700)]
ovn-northd: Support learning neighbor from ARP request.
Current LR dynamic ARP learning support only ARP responses. If a
IP-MAC binding is learned, it will not get updated even if a host
send a GARP *request* to inform the new binding. This patch supports
learning neighbor changes from ARP requests, including GARP requests.
Signed-off-by: Han Zhou <hzhou8@ebay.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Han Zhou [Mon, 20 Aug 2018 05:27:30 +0000 (22:27 -0700)]
ovn-northd: LR respond ARP from valid subnet only.
Currently ovn LR datapath responds ARP requests even if the ARP
requestor's src IP doesn't belong to the LR port's subnets. This
may generate unnecessary ARP responses and there could also be
security concerns. This patch restricts the ARP response only if
the requestor's IP matches the LR port's subnets.
Signed-off-by: Han Zhou <hzhou8@ebay.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Ben Pfaff [Sat, 18 Aug 2018 17:17:37 +0000 (10:17 -0700)]
netdev-linux: Avoid division by 0 if kernel reports bad scheduler data.
If the kernel reported a value of 0 for the second value in
/proc/net/psched, it would cause a division-by-zero fault in
read_psched(). I don't know of a kernel that would actually do that, but
it's still better to be safe.
Found by clang static analyzer.
Reported-by: Bhargava Shastry <bshastry@sect.tu-berlin.de> Signed-off-by: Ben Pfaff <blp@ovn.org> Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
Ben Pfaff [Mon, 6 Aug 2018 21:36:34 +0000 (14:36 -0700)]
ovsdb-client: Make "wait" command logging more sensible.
The "wait" command in ovsdb-client (which was introduced as part of the
clustering support) fairly often logs things that are normal for it but
in other circumstances might be cause for concern, for example messages
about being unable to connect to a remote. Until now, it has tried to
suppress some of those itself by raising log levels. Unfortunately, in
some cases this had the opposite effect because it overrode any settings on
the command line, such as an attempt in ovsdb-cluster.at to suppress all
logging related to the timeval module. This commit drops the special
log levels from the "wait" command and puts equivalents into the tests
themselves.
Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Justin Pettit <jpettit@ovn.org>
Ben Pfaff [Wed, 15 Aug 2018 21:57:13 +0000 (14:57 -0700)]
ofp-actions: Avoid assertion failure for clone(ct(...bad actions...)).
decode_NXAST_RAW_CT() temporarily pulls data off the beginning of its
ofpacts output ofpbuf and, on its error path, fails to push it back on.
At a higher layer, decode_NXAST_RAW_CLONE() asserts, via
ofpact_finish_CLONE(), that the ofpact_clone that it put is still in the
place where it put it, which causes an assertion failure.
The root cause here is the failure to re-push the clone header. One could
fix that, but it would be pretty easy for that to go wrong again on some
other obscure error path. Instead, this commit just makes the problem go
away by always saving and restoring 'ofpact->data' if a decode fails.
Reported-at: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=9862 Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Justin Pettit <jpettit@ovn.org>
Justin Pettit [Fri, 17 Aug 2018 19:48:54 +0000 (12:48 -0700)]
dpif-netlink: Prevent abort in probe_broken_meters().
Commit 92d0d515d ("dpif-netlink: Probe for broken Linux meter
implementations.") introduced a deadlock on the 'once' structure
declared in probe_broken_meters() with the following callstack:
Yi-Hung Wei [Fri, 17 Aug 2018 09:05:09 +0000 (02:05 -0700)]
dpif-netlink: Implement conntrack zone limit
This patch provides the implementation of conntrack zone limit
in dpif-netlink. It basically utilizes the netlink API to
communicate with OVS kernel module to set, delete, and get conntrack
zone limit.
Yi-Hung Wei [Fri, 17 Aug 2018 09:05:07 +0000 (02:05 -0700)]
dpif: Support conntrack zone limit.
This patch defines the dpif interface to support conntrack
per zone limit. Basically, OVS users can use this interface
to set, delete, and get the conntrack per zone limit for various
dpif interfaces. The following patch will make use of the proposed
interface to implement the feature.
Currently, nf_conntrack_max is used to limit the maximum number of
conntrack entries in the conntrack table for every network namespace.
For the VMs and containers that reside in the same namespace,
they share the same conntrack table, and the total # of conntrack entries
for all the VMs and containers are limited by nf_conntrack_max. In this
case, if one of the VM/container abuses the usage the conntrack entries,
it blocks the others from committing valid conntrack entries into the
conntrack table. Even if we can possibly put the VM in different network
namespace, the current nf_conntrack_max configuration is kind of rigid
that we cannot limit different VM/container to have different # conntrack
entries.
To address the aforementioned issue, this patch proposes to have a
fine-grained mechanism that could further limit the # of conntrack entries
per-zone. For example, we can designate different zone to different VM,
and set conntrack limit to each zone. By providing this isolation, a
mis-behaved VM only consumes the conntrack entries in its own zone, and
it will not influence other well-behaved VMs. Moreover, the users can
set various conntrack limit to different zone based on their preference.
The proposed implementation utilizes Netfilter's nf_conncount backend
to count the number of connections in a particular zone. If the number of
connection is above a configured limitation, ovs will return ENOMEM to the
userspace. If userspace does not configure the zone limit, the limit
defaults to zero that is no limitation, which is backward compatible to
the behavior without this patch.
The following high leve APIs are provided to the userspace:
- OVS_CT_LIMIT_CMD_SET:
* set default connection limit for all zones
* set the connection limit for a particular zone
- OVS_CT_LIMIT_CMD_DEL:
* remove the connection limit for a particular zone
- OVS_CT_LIMIT_CMD_GET:
* get the default connection limit for all zones
* get the connection limit for a particular zone
Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: Justin Pettit <jpettit@ovn.org>