Ilya Maximets [Tue, 12 Dec 2017 08:32:40 +0000 (11:32 +0300)]
travis: Install libnuma dependency for DPDK.
libnuma is a default dependency for DPDK 17.11 because
CONFIG_RTE_EAL_NUMA_AWARE_HUGEPAGES and CONFIG_RTE_LIBRTE_VHOST_NUMA
are enabled by default for most architectures.
libnuma-dev package installation fixes the DPDK build:
eal_memory.c:56:18: fatal error:
numa.h: No such file or directory
CC: Mark Kavanagh <mark.b.kavanagh@intel.com> Fixes: 5e925ccc2a6f ("netdev-dpdk: DPDK v17.11 upgrade") Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Ian Stokes <ian.stokes@intel.com> Tested-by: Ian Stokes <ian.stokes@intel.com>
Ilya Maximets [Tue, 12 Dec 2017 08:32:39 +0000 (11:32 +0300)]
travis: Unify DPDK build directory for stable/not stable releases.
Currently, stable DPDK releases have a 'dpdk-stable-$DPDK_VER' directory
in the tarball, but non-stable releases have just 'dpdk-$DPDK_VER'.
This causes problems when moving from a stable release to a non-stable
one and vice versa. For example, the recent update to DPDK v17.11 broke
the travis build:
'dpdk-17.11.tar.gz' saved
./.travis/linux-build.sh: line 61:
cd: dpdk-stable-17.11: No such file or directory
With this change, the 'dpdk-$DPDK_VER' format is used for all
types of DPDK releases by renaming the source directory.
CC: Mark Kavanagh <mark.b.kavanagh@intel.com> Fixes: 5e925ccc2a6f ("netdev-dpdk: DPDK v17.11 upgrade") Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Ian Stokes <ian.stokes@intel.com> Tested-by: Ian Stokes <ian.stokes@intel.com>
Ben Pfaff [Tue, 12 Sep 2017 19:57:46 +0000 (12:57 -0700)]
ovsdb-idl: Tolerate initialization races for singleton tables.
By verifying that singleton tables (that is, tables that should have exactly
one row) are empty when they emit transactions that insert into them,
ovs-vsctl and similar tools tolerate initialization races, where more than one
client at a time tries to initialize a singleton table.
The upshot is that if you create a database and then run multiple ovs-vsctl
(etc.) commands against it in parallel (without first initializing it
serially), then without this patch you will sometimes get failures,
but this patch avoids them.
Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Alin Gabriel Serdean <aserdean@ovn.org>
Ben Pfaff [Fri, 8 Dec 2017 21:24:28 +0000 (13:24 -0800)]
ovsdb-idl: Fix assertion failure on error path parsing server reply.
If the database server sent an error reply to a monitor_cond request, and
the error was not a JSON string, then passing the error to json_string()
caused an assertion failure.
Found by inspection.
Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Alin Gabriel Serdean <aserdean@ovn.org>
Darrell Ball [Thu, 7 Dec 2017 02:04:20 +0000 (18:04 -0800)]
conntrack: Fix icmp error address sanity check.
An address sanity check is done on icmp error packets to
check that the icmp error payload makes sense w.r.t. the
packet itself.
The sanity check was partially incorrect since it tried
to verify the source address of the error packet against the
original destination, which does not make sense since the error
can be generated by any intermediate node.
Darrell Ball [Mon, 4 Dec 2017 16:13:07 +0000 (08:13 -0800)]
conntrack: Disable algs by default.
Presently, alg processing is enabled by default to better exercise code.
This is similar to kernels before 4.7 as well. The recommended default
behavior in the newer kernels is to only process algs if a helper is
supplied in a conntrack rule. The behavior is changed to match the
later kernels.
A test is extended to check that the control connection is still
created in such a case.
Darrell Ball [Mon, 4 Dec 2017 16:13:06 +0000 (08:13 -0800)]
conntrack: Allow specified alg port numbers.
Algs can use variable control port numbers for servers.
The main use case is a kind of feeble security measure: some
think that it obscures the alg traffic.
It is really not very effective, but the kernel has this
capability. This patch mimics the capability.
Ben Pfaff [Mon, 11 Dec 2017 18:34:01 +0000 (10:34 -0800)]
dpif-netdev: Avoid "sparse" warning.
"sparse" warns when odp_port_t is used directly in an inequality
comparison. This avoids the warning.
CC: Kevin Traynor <ktraynor@redhat.com> Fixes: a130f1a89bd8 ("dpif-netdev: Add port/queue tiebreaker to rxq_cycle_sort.") Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Kevin Traynor <ktraynor@redhat.com> Acked-by: Ian Stokes <ian.stokes@intel.com>
Ben Pfaff [Thu, 7 Dec 2017 22:03:56 +0000 (14:03 -0800)]
ofproto: Keep inserting buckets into a group from changing group type.
The "insert buckets" and "delete buckets" operations on a group should not
change the group's type or properties, but the implementation did this by
mistake. This fixes the problem.
Aaron Conole [Mon, 11 Dec 2017 15:07:39 +0000 (10:07 -0500)]
daemon-unix: include missing help information
These options have existed for a while, but were not expressed in the
help information. Inform the user that these options exist, and give
some basic help.
Ben Pfaff [Thu, 7 Dec 2017 21:01:58 +0000 (13:01 -0800)]
ofproto-dpif-xlate: Change assertion to log message.
Until now, compose_output_action__() has asserted that a packet output to
a patch port is not to be truncated. This commit changes this to an error
that will be included in trace output, for two reasons. First, this sounds
like only a minor problem to me which doesn't warrant killing the process.
Second, it will be easier to track down the actual problem (if any) if we
can get a trace instead of a segfault.
Reported-by: Kevin Lin <kevin@kelda.io>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2017-December/045832.html
Signed-off-by: Ben Pfaff <blp@ovn.org>
xlate_output_action() must tell some of the functions it calls whether the
packet is being truncated. Until now, it has inferred that based on
whether its max_len argument is nonzero.
Unfortunately, max_len conflates two different purposes. Historically it
was used only to limit the number of bytes of packets sent to an OpenFlow
controller in packet_in messages. When packet truncation was introduced,
it was then also used to specify the truncation length. This meant that,
for example, when xlate_output_reg_action() called into
xlate_output_action() passing along for max_len an OpenFlow controller byte
limit (which ovs-ofctl by default sets to 65535), xlate_output_action()
interpreted that as a truncation request and told the functions it called
that the packet was being truncated, which in the worst case led to
assertion failures.
This commit disentangles these two meanings of max_len, separating them into
two separate parameters, and updates the callers.
Reported-by: Kevin Lin <kevin@kelda.io>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2017-December/045841.html
Tested-by: Kevin Lin <kevin@kelda.io>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Ben Pfaff [Wed, 13 Sep 2017 20:28:07 +0000 (13:28 -0700)]
tests: Always ignore "Broken pipe" and "Connection reset" log messages.
Until now, the ovn-controller-vtep, ovn-nbctl, and ovn-sbctl tests have
ignored "Broken pipe" and "Connection reset" messages. The same rationale
that applies to them also applies to ovs-vsctl and other utilities. It
seems easier to just always ignore them.
Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Justin Pettit <jpettit@ovn.org>
Ben Pfaff [Fri, 8 Dec 2017 22:21:18 +0000 (14:21 -0800)]
stream-unix: Give accepted sockets distinct names for log messages.
At least on Linux, when process A connects to process B over a Unix
domain socket, unless process A bound its socket to a name before
it made the connection, process B gets an empty peer name. Until
now, OVS has just reported the name of the connection as "unix".
This is not meaningful, of course. I do not know of a good general
solution to this problem, but this commit attempts a step in the
right direction by at least giving each connection of this kind a
number: "unix#1", "unix#2", and so on. That way, in log messages
one can at least see which messages are related to a particular
connection.
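The numbering scheme described above can be sketched as follows. This is a minimal illustration, not the OVS implementation: name_unnamed_peer() and next_unnamed_id are hypothetical names, and the sketch ignores thread safety.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Counter for accepted connections whose peer has no bound name.
 * Hypothetical stand-in; real code may keep this state differently. */
static unsigned int next_unnamed_id = 1;

/* Writes "unix#N" into 'buf' and returns the N that was used. */
unsigned int
name_unnamed_peer(char *buf, size_t size)
{
    assert(buf != NULL && size > 0);

    unsigned int id = next_unnamed_id++;
    snprintf(buf, size, "unix#%u", id);
    return id;
}
```

Each accepted connection with an empty peer name would call the helper once, so that log lines for different connections carry different names.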
Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Justin Pettit <jpettit@ovn.org>
Ben Pfaff [Thu, 31 Aug 2017 21:55:44 +0000 (14:55 -0700)]
test-ovsdb: Triggers should wake up other triggers immediately.
When a trigger executes, it can make changes to the database that fulfill
the conditions for some other trigger to execute. ovsdb-server implements
this properly, but the code in test-ovsdb for testing triggers outside
ovsdb-server did not. This fixes the problem.
Found by inspection.
Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Justin Pettit <jpettit@ovn.org>
Michal Weglicki [Tue, 14 Nov 2017 10:59:44 +0000 (10:59 +0000)]
netdev-dpdk: extend netdev_dpdk_get_status to include if_type and if_descr
This commit extends netdev_dpdk_get_status API to include additional
driver-related information: if_type and if_descr.
v2->v3: Code rebase.
v3->v4: Minor comments applied.
v5->v6: Adds DPDK port specific description in documentation.
Co-authored-by: Michal Weglicki <michalx.weglicki@intel.com> Signed-off-by: Michal Weglicki <michalx.weglicki@intel.com> Signed-off-by: Przemyslaw Szczerbik <przemyslawx.szczerbik@intel.com> Tested-by: Greg Rose <gvrose8192@gmail.com> Reviewed-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Padding and aligning of dp_netdev_pmd_thread structure members
is useless, broken in several ways, and only greatly degrades
maintainability and extensibility of the structure.
Issues:
1. It's not working because all the instances of struct
dp_netdev_pmd_thread are allocated only by the usual malloc. That
memory is not aligned to cachelines -> the structure almost never
starts at an aligned memory address. This means that any further
paddings and alignments inside the structure are completely
useless. For example:
Breakpoint 1, pmd_thread_main
(gdb) p pmd
$49 = (struct dp_netdev_pmd_thread *) 0x1b1af20
(gdb) p &pmd->cacheline1
$51 = (OVS_CACHE_LINE_MARKER *) 0x1b1af60
(gdb) p &pmd->cacheline0
$52 = (OVS_CACHE_LINE_MARKER *) 0x1b1af20
(gdb) p &pmd->flow_cache
$53 = (struct emc_cache *) 0x1b1afe0
All of the above addresses are shifted from the cacheline start by 32B.
Can we fix it properly? NO.
OVS currently doesn't have an appropriate API to allocate aligned
memory. The best candidate is 'xmalloc_cacheline()' but it
clearly states that "The memory returned will not be at the
start of a cache line, though, so don't assume such alignment".
And also, this function will never return aligned memory on
Windows or MacOS.
2. CACHE_LINE_SIZE is not constant. Different architectures have
different cache line sizes, but the code assumes that
CACHE_LINE_SIZE is always equal to 64 bytes. All the structure
members are grouped by 64 bytes and padded to CACHE_LINE_SIZE.
This leads to huge holes in the structures if CACHE_LINE_SIZE
differs from 64. This is the opposite of portability. If I want
good performance of cmap I need to have CACHE_LINE_SIZE equal
to the real cache line size, but I will have huge holes in the
structures. If you take a look at struct rte_mbuf from DPDK
you'll see that it uses 2 defines: RTE_CACHE_LINE_SIZE and
RTE_CACHE_LINE_MIN_SIZE to avoid holes in the mbuf structure.
3. Sizes of system/libc defined types are not constant for all the
systems. For example, sizeof(pthread_mutex_t) == 48 on my
ARMv8 machine, but only 40 on x86. The difference could be
much bigger on Windows or MacOS systems. But the code assumes
that sizeof(struct ovs_mutex) is always 48 bytes. This may lead
to broken alignment/big holes in case of padding/wrong comments
about amount of free pad bytes.
4. Sizes of many fields in the structure depend on defines like
DP_N_STATS, PMD_N_CYCLES, EM_FLOW_HASH_ENTRIES and so on.
Any change in these defines or any change in any structure
contained by the thread structure leads to a not so simple
refactoring of the whole dp_netdev_pmd_thread structure. This
greatly reduces maintainability and complicates development of
new features.
5. There is no reason to align flow_cache member because it's
too big and we usually access random entries by single thread
only.
So, the padding/alignment only creates an appearance of performance
optimization but does nothing useful in reality. It only complicates
maintenance and adds huge holes for non-x86 architectures and non-Linux
systems. The performance improvement stated in the original commit message
is most likely random and not significant. I see no performance difference.
Most of the above issues are also true for some other padded/aligned
structures like 'struct netdev_dpdk'. They will be treated separately.
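The 32-byte shift claimed in issue 1 can be checked by hand from the gdb addresses quoted there. The sketch below does the arithmetic, assuming a 64-byte cache line; cacheline_offset() and EXAMPLE_CACHE_LINE_SIZE are names made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed cache line size, for this illustration only. */
#define EXAMPLE_CACHE_LINE_SIZE 64

/* Returns how many bytes 'addr' lies past the start of its cache line. */
unsigned int
cacheline_offset(uintptr_t addr)
{
    return (unsigned int) (addr % EXAMPLE_CACHE_LINE_SIZE);
}

/* Re-checks the three distinct addresses printed by gdb above. */
void
check_reported_addresses(void)
{
    assert(cacheline_offset(0x1b1af20) == 32);  /* pmd and cacheline0 */
    assert(cacheline_offset(0x1b1af60) == 32);  /* cacheline1 */
    assert(cacheline_offset(0x1b1afe0) == 32);  /* flow_cache */
}
```

An offset of 0 would mean the member really starts a cache line; here every "aligned" member starts in the middle of one.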
CC: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com> CC: Ben Pfaff <blp@ovn.org> Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Jan Scheurich <jan.scheurich@ericsson.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Mark Kavanagh [Fri, 8 Dec 2017 10:53:47 +0000 (10:53 +0000)]
netdev-dpdk: vHost IOMMU support
DPDK v17.11 introduces support for the vHost IOMMU feature.
This is a security feature, which restricts the vhost memory
that a virtio device may access.
This feature also enables the vhost REPLY_ACK protocol, the
implementation of which is known to work in newer versions of
QEMU (i.e. v2.10.0), but is buggy in older versions (v2.7.0 -
v2.9.0, inclusive). As such, the feature is disabled by default
(and should remain so) for the aforementioned older QEMU
versions. Starting with QEMU v2.9.1, vhost-iommu-support can
safely be enabled, even without having an IOMMU device, with
no performance penalty.
This patch adds a new global config option, vhost-iommu-support,
that controls enablement of the vhost IOMMU feature:
ovs-vsctl set Open_vSwitch . other_config:vhost-iommu-support=true
This value defaults to false; to enable IOMMU support, this field
should be set to true when setting other global parameters on init
(such as "dpdk-socket-mem", for example). Changing the value at
runtime is not supported, and requires restarting the vswitch daemon.
Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com> Acked-by: Kevin Traynor <ktraynor@redhat.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Mark Kavanagh [Fri, 8 Dec 2017 10:53:46 +0000 (10:53 +0000)]
netdev-dpdk: DPDK v17.11 upgrade
This commit adds support for DPDK v17.11:
- minor updates to accommodate DPDK API changes
- update references to DPDK version in Documentation
- update DPDK version in travis' linux-build script
- document DPDK v17.11 virtio driver bug
Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com> Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com> Acked-by: Ciara Loftus <ciara.loftus@intel.com> Acked-by: Jan Scheurich <jan.scheurich@ericsson.com> Tested-by: Jan Scheurich <jan.scheurich@ericsson.com> Tested-by: Guoshuai Li <ligs@dtdream.com> Acked-by: Kevin Traynor <ktraynor@redhat.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Yifeng Sun [Wed, 15 Nov 2017 14:59:24 +0000 (06:59 -0800)]
dpif-netdev: Fix memory leak
Valgrind complains in test 1019 (dpctl - add-if set-if del-if):
4,850,896 (4,850,240 direct, 656 indirect) bytes in 1 blocks are
definitely lost in loss record 364 of 364
by 0x517062: xcalloc (util.c:103)
by 0x46CBBC: dp_netdev_set_nonpmd (dpif-netdev.c:4498)
by 0x46CBBC: create_dp_netdev (dpif-netdev.c:1299)
by 0x46CBBC: dpif_netdev_open (dpif-netdev.c:1337)
by 0x472CB0: do_open (dpif.c:350)
by 0x472E6F: dpif_create (dpif.c:404)
by 0x472E6F: dpif_create_and_open (dpif.c:417)
by 0x430EBC: open_dpif_backer (ofproto-dpif.c:727)
by 0x430EBC: construct (ofproto-dpif.c:1411)
by 0x41B714: ofproto_create (ofproto.c:539)
by 0x40C84E: bridge_reconfigure (bridge.c:647)
by 0x4104C5: bridge_run (bridge.c:2998)
by 0x406FA4: main (ovs-vswitchd.c:119)
The reference count wasn't released at this earlier return.
This fix passes the test 'make check'.
Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com> Tested-by: Greg Rose <gvrose8192@gmail.com> Reviewed-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Kevin Traynor [Thu, 23 Nov 2017 19:41:57 +0000 (19:41 +0000)]
dpif-netdev: Calculate rxq cycles prior to compare_rxq_cycles calls.
compare_rxq_cycles sums the latest cycles from each queue for
comparison with each other. While each comparison correctly
gets the latest cycles, the cycles could change between calls
to compare_rxq_cycles. In order to use consistent values through
each call of compare_rxq_cycles, sum the cycles before qsort is
called.
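The idea of summing before sorting can be sketched as below. This is an illustration under assumptions, not the dpif-netdev code: struct example_rxq, N_INTERVALS and the descending sort order are made up for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define N_INTERVALS 6   /* Hypothetical number of stored intervals. */

struct example_rxq {
    uint64_t interval_cycles[N_INTERVALS]; /* Live counters in real code. */
    uint64_t total_cycles;                 /* Snapshot taken before qsort. */
};

/* Compares only the snapshots, so every call sees the same values. */
static int
compare_by_total(const void *a_, const void *b_)
{
    const struct example_rxq *a = a_;
    const struct example_rxq *b = b_;

    if (a->total_cycles != b->total_cycles) {
        return a->total_cycles > b->total_cycles ? -1 : 1;
    }
    return 0;
}

/* Sums each queue's cycles once, then sorts busiest first. */
void
sort_rxqs_by_cycles(struct example_rxq *rxqs, size_t n)
{
    assert(rxqs != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        uint64_t sum = 0;
        for (int j = 0; j < N_INTERVALS; j++) {
            sum += rxqs[i].interval_cycles[j];
        }
        rxqs[i].total_cycles = sum;
    }
    qsort(rxqs, n, sizeof *rxqs, compare_by_total);
}
```

Because the comparator never reads the live counters, a concurrent update cannot make two comparisons of the same pair disagree during one sort.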
Requested-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Kevin Traynor <ktraynor@redhat.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Kevin Traynor [Thu, 23 Nov 2017 19:41:56 +0000 (19:41 +0000)]
dpif-netdev: Rename rxq_cycle_sort to compare_rxq_cycles.
This function is used for comparison between queues
as part of the sort. It does not do the sort itself.
As such, give it a more appropriate name.
Suggested-by: Billy O'Mahony <billy.o.mahony@intel.com> Signed-off-by: Kevin Traynor <ktraynor@redhat.com> Acked-by: Billy O'Mahony Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Kevin Traynor [Thu, 23 Nov 2017 19:41:55 +0000 (19:41 +0000)]
dpif-netdev: Add port/queue tiebreaker to rxq_cycle_sort.
rxq_cycle_sort is used to compare rx queues by their measured number
of cycles. In the event that they are equal, 0 could be returned.
However, it is observed that returning 0 results in a different sort
order on Windows/Linux. This is ok in practice but it causes a unit
test failure for
"1007: PMD - pmd-cpu-mask/distribution of rx queues" when running
on different OS's.
In order to have a consistent sort result across multiple OS's,
introduce a tiebreaker of port/queue.
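A comparator with a port/queue tiebreaker might look like the sketch below. The struct and function names are hypothetical and the field types are simplified; the point is only that the comparator never returns 0 for two distinct queues, so the result does not depend on how a given qsort() orders equal elements.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct example_rxq_key {
    uint64_t cycles;    /* Measured processing cycles. */
    uint32_t port_no;   /* Plain integer stand-in for the port number. */
    int queue_id;
};

int
compare_with_tiebreaker(const void *a_, const void *b_)
{
    const struct example_rxq_key *a = a_;
    const struct example_rxq_key *b = b_;

    if (a->cycles != b->cycles) {
        return a->cycles > b->cycles ? -1 : 1;      /* Busiest first. */
    }
    if (a->port_no != b->port_no) {
        return a->port_no > b->port_no ? 1 : -1;    /* Then by port. */
    }
    if (a->queue_id != b->queue_id) {
        return a->queue_id > b->queue_id ? 1 : -1;  /* Then by queue. */
    }
    return 0;
}

void
sort_rxq_keys(struct example_rxq_key *keys, size_t n)
{
    assert(keys != NULL || n == 0);
    qsort(keys, n, sizeof *keys, compare_with_tiebreaker);
}
```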
Fixes: 655856ef39b9 ("dpif-netdev: Change rxq_scheduling to use rxq processing cycles.") Reported-by: Alin Gabriel Serdean <aserdean@ovn.org> Tested-by: Alin Gabriel Serdean <aserdean@ovn.org> Co-authored-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Kevin Traynor <ktraynor@redhat.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Kevin Traynor [Mon, 27 Nov 2017 17:25:49 +0000 (17:25 +0000)]
netdev-dpdk: Remove uneeded call to rte_eth_dev_count().
The call to rte_eth_dev_count() was added as a workaround
for rte_eth_dev_get_port_by_name() not handling cases
when there were no DPDK ports.
In versions of DPDK >= 17.02 rte_eth_dev_get_port_by_name()
does handle this case (DPDK commit f9ae888b1e19).
rte_eth_dev_count() is no longer needed so remove it.
Acked-by: Ciara Loftus <ciara.loftus@intel.com> Signed-off-by: Kevin Traynor <ktraynor@redhat.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Justin Pettit [Tue, 5 Dec 2017 07:22:40 +0000 (23:22 -0800)]
datapath-windows: Correct endianness for deleting zone.
The zone Netlink attribute is supposed to be in network-byte order, but
the Windows code for deleting conntrack entries was treating it as
host-byte order.
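As a generic illustration of the byte-order issue (not the datapath-windows code), a 16-bit value carried in network byte order has its most significant byte first. The helper name below is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decodes a 16-bit zone stored in network (big-endian) byte order.
 * Reading the same two bytes as a host-order integer would give a
 * different value on a little-endian machine. */
uint16_t
zone_from_network_order(const uint8_t bytes[2])
{
    assert(bytes != NULL);
    return (uint16_t) ((bytes[0] << 8) | bytes[1]);
}
```

Decoding byte by byte keeps the result independent of the host's endianness.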
Yi-Hung Wei [Thu, 7 Dec 2017 18:40:04 +0000 (10:40 -0800)]
dpctl: Support flush conntrack by conntrack 5-tuple
With this patch, "flush-conntrack" in ovs-dpctl and ovs-appctl accepts
a conntrack 5-tuple to delete the conntrack entry specified by the 5-tuple.
For example, a user can use the following command to flush a conntrack entry
in zone 5.
Yi-Hung Wei [Thu, 7 Dec 2017 18:40:03 +0000 (10:40 -0800)]
ct-dpif,dpif-netlink: Support conntrack flush by ct 5-tuple
This patch adds support of flushing a conntrack entry specified by the
conntrack 5-tuple, and provides the implementation in dpif-netlink.
The implementation of dpif-netlink in the linux datapath utilizes the
NFNL_SUBSYS_CTNETLINK netlink subsystem to delete a conntrack entry in
nf_conntrack. Future patches will add support for the userspace and
Windows datapaths.
Ben Pfaff [Wed, 6 Dec 2017 00:27:13 +0000 (16:27 -0800)]
tests: Use $(MKDIR_P) to avoid races.
"test -d x || mkdir x" has a race when invoked in parallel: it is possible
for two processes to both see that 'x' does not exist and both try to
create it, and if that happens then one of them will fail. This avoids
the problem.
Signed-off-by: Ben Pfaff <blp@ovn.org> Reviewed-by: Greg Rose <gvrose8192@gmail.com>
Numan Siddique [Mon, 4 Dec 2017 14:27:08 +0000 (19:57 +0530)]
OVN pacemaker: Add the monitor action for Master role
The Pacemaker resource agent periodically calls the OVN OCF script's "monitor"
action to check the status. But the OVN OCF script doesn't add the
action "monitor" for the role "Master", because of which the pacemaker
resource agent does not call the "monitor" action at all for the master.
If the OVN db servers exit for some reason, this goes completely undetected
and none of the standby nodes is promoted to master.
This patch adds the monitor action for the "Master" role. Also, the monitor
action did not check the status of ovn-northd (if manage_northd is yes).
This patch also checks the status of ovn-northd in the monitor action
for the "Master" role. If any of the ovsdb-servers or ovn-northd is not running,
the monitor action will return OCF_NOT_RUNNING and this will cause pacemaker
to restart the OVN OCF resource.
Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1512568 Signed-off-by: Numan Siddique <nusiddiq@redhat.com> CC: Russell Bryant <russell@ovn.org> Signed-off-by: Russell Bryant <russell@ovn.org>
Yunjian Wang [Sat, 18 Nov 2017 10:01:27 +0000 (18:01 +0800)]
datapath: Fix kernel panic for uninitialized tun_dst of ovs_gso_cb.
The variable tun_dst in struct ovs_gso_cb isn't necessarily all-zeros when it
comes from the Netlink layer. When a netdev port is deleted and a vxlan port
is added immediately afterwards, they may use the same port_no. In that case
the variable tun_dst of struct ovs_gso_cb has not been set when the skb is
sent to the vxlan port, and the panic is triggered.
Signed-off-by: Yunjian Wang <wangyunjian@huawei.com> Signed-off-by: Ben Pfaff <blp@ovn.org> Tested-by: Greg Rose <gvrose8192@gmail.com> Reviewed-by: Greg Rose <gvrose8192@gmail.com>
OVN: Add external_ids to NAT and Logical_Router_Static_Route tables.
The external_ids column is missing from the NAT and
Logical_Router_Static_Route tables.
Signed-off-by: Lucas Alvares Gomes <lucasagomes@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Daniel Alvarez <dalvarez@redhat.com> Acked-by: Miguel Angel Ajo <majopela@redhat.com>
Ben Pfaff [Mon, 4 Dec 2017 16:33:49 +0000 (08:33 -0800)]
coding-style: Explain when to break lines before or after binary operators.
The coding style has never been explicit about this. This commit adds some
explanation of why one position or the other might be favored in a given
situation.
Numan Siddique [Fri, 1 Dec 2017 09:07:38 +0000 (14:37 +0530)]
ovn-ctl: Add new commands 'run_nb_server' and 'run_sb_server'
Presently if the user wants to start OVN db servers as separate containers,
'ovn-ctl' script is not useful as '--detach' option is passed when
ovsdb-servers are started. If the container command is 'ovn-ctl
start_nb_ovsdb', the container exits as soon as ovn-ctl exits.
This patch adds two new commands - 'run_nb_server' and 'run_sb_server'. This
will be really useful for the above mentioned requirement.
Without these commands, the user may have to first generate the db by running
'ovsdb-tool' and then start the container with the command 'ovsdb-server
ovnnb_db.db ....' and this is very inconvenient.
This patch also updates the documentation in ovn-ctl.8.xml.
Suggested-by: Ben Pfaff <blp@ovn.org> Signed-off-by: Numan Siddique <nusiddiq@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Yifeng Sun [Thu, 30 Nov 2017 16:31:24 +0000 (08:31 -0800)]
netdev: netdev_get_etheraddr is not functioning as advertised.
netdev_get_etheraddr claims to clear 'mac' on error, but it fails to do so.
When looking further into both netdev_windows_get_etheraddr() and
netdev_linux_get_etheraddr(), 'mac' is also not cleared. This will lead to
usage of uninitialised ofputil_phy_port.hw_addr.
v1 -> v2: fixed a bug in v1 found by Ben, thanks Ben.
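The "clear 'mac' on error" contract can be sketched with a small wrapper. Everything here is hypothetical (the struct, the callback type and both functions); it only illustrates zeroing the output when the provider fails, so callers never see stale or partial bytes.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

struct example_eth_addr {
    uint8_t ea[6];
};

/* Hypothetical provider callback: returns 0 on success or an errno value. */
typedef int (*get_mac_fn)(struct example_eth_addr *mac);

/* Calls 'provider' and guarantees that '*mac' is all-zeros on error. */
int
get_etheraddr_checked(get_mac_fn provider, struct example_eth_addr *mac)
{
    assert(provider != NULL && mac != NULL);

    int error = provider(mac);
    if (error) {
        memset(mac, 0, sizeof *mac);
    }
    return error;
}

/* Example provider that fails after partially writing its output. */
int
failing_provider(struct example_eth_addr *mac)
{
    mac->ea[0] = 0xaa;
    return ENODEV;
}
```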
Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
odp-execute: Skip processing actions when batch is emptied
Today in OVS, when errors are encountered during the execution
of an action, the entire batch of packets may be deleted (e.g.
in processing push_tnl_action, if the port is not found in the
port_cache of the PMD). The remaining actions continue to be executed
even though there are no packets to be processed.
It is assumed that the code dealing with each action checks that
the batch is not empty before executing. Crashes may occur if the
assumption is not met.
The patch makes OVS skip processing of further actions from the
action-set once a batch is emptied. Doing so centralizes the check
in one place and avoids the possibility of crashes.
This change DOES NOT fix any existing bug in the code; it is only a
precautionary measure to avoid crashes if new actions do not
take care of empty batches.
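The centralized check can be sketched as a loop that stops once the batch is empty. The types and function names below are hypothetical stand-ins, not the odp-execute API.

```c
#include <assert.h>
#include <stddef.h>

struct example_batch {
    size_t count;   /* Number of packets still in the batch. */
};

typedef void (*example_action_fn)(struct example_batch *);

/* Runs actions in order and returns how many were actually executed.
 * Stops as soon as the batch has been emptied, so individual actions
 * do not each need their own empty-batch check. */
size_t
execute_actions(struct example_batch *batch,
                const example_action_fn *actions, size_t n_actions)
{
    assert(batch != NULL);

    size_t executed = 0;
    for (size_t i = 0; i < n_actions; i++) {
        if (batch->count == 0) {
            break;
        }
        actions[i](batch);
        executed++;
    }
    return executed;
}

/* Example actions: one leaves the batch alone, one drops every packet. */
void
example_noop(struct example_batch *batch)
{
    (void) batch;
}

void
example_drop_all(struct example_batch *batch)
{
    batch->count = 0;
}
```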
Signed-off-by: Vishal Deep Ajmera <vishal.deep.ajmera@ericsson.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Ben Pfaff [Thu, 30 Nov 2017 17:45:11 +0000 (09:45 -0800)]
types: Avoid compound literals as initializers.
Older GCC can't cope.
Reported-by: Guoshuai Li <ligs@dtdream.com> Reported-by: Vishal Deep Ajmera <vishal.deep.ajmera@ericsson.com> Reported-by: Terry Wilson <twilson@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Ilya Maximets [Thu, 30 Nov 2017 12:55:03 +0000 (15:55 +0300)]
cmap: Use PADDED_MEMBERS macro for cmap_bucket padding.
The current implementation of manual padding inside struct cmap_bucket
doesn't work for some cacheline sizes. For example, if CACHE_LINE_SIZE
equals 128, the compiler adds an additional 8 bytes: 4 bytes between
'hashes' and 'nodes' and 4 bytes after the manual 'pad'. This leads to
a build time assertion failure, because sizeof(struct cmap_bucket) == 136.
Fix that by using PADDED_MEMBERS macro, which will handle all the
unexpected compiler paddings.
This is safe because we still have a build time assert for the structure
size. Another possible solution is to pack the structure, but the padding
macro looks better and matches the other code.
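A simplified sketch of the padding-macro approach follows. EXAMPLE_PADDED_MEMBERS is a stand-in written for this illustration, not the OVS PADDED_MEMBERS definition, and the bucket layout (one counter, 10 hashes, 10 64-bit node slots, 128-byte line) is an assumption. It relies on C11 anonymous structs and unions.

```c
#include <assert.h>
#include <stdint.h>

#define EXAMPLE_ROUND_UP(X, Y) (((X) + ((Y) - 1)) / (Y) * (Y))

/* Wraps MEMBERS in a union with a byte array whose size is the members'
 * real size (compiler-inserted padding included) rounded up to UNIT. */
#define EXAMPLE_PADDED_MEMBERS(UNIT, MEMBERS)                               \
    union {                                                                 \
        struct { MEMBERS };                                                 \
        uint8_t pad[EXAMPLE_ROUND_UP(sizeof(struct { MEMBERS }), UNIT)];    \
    }

#define EXAMPLE_LINE 128
#define EXAMPLE_K 10

struct example_bucket {
    EXAMPLE_PADDED_MEMBERS(EXAMPLE_LINE,
        uint32_t counter;
        uint32_t hashes[EXAMPLE_K];
        uint64_t nodes[EXAMPLE_K];   /* 4 hidden pad bytes precede this. */
    );
};

/* The build time check on the total size stays in place. */
static_assert(sizeof(struct example_bucket) == EXAMPLE_LINE,
              "bucket must fill exactly one cache line");
```

Because the pad size is computed from sizeof, the hidden 4 bytes before 'nodes' are already counted, which is what a hand-written pad array gets wrong.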
Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Yifeng Sun [Mon, 20 Nov 2017 12:26:39 +0000 (04:26 -0800)]
ofproto-dpif-xlate: Fix bug that may leak ofproto_flow_mod
When ofm is not referenced by xc_entry, we should release its
resources by calling ofproto_flow_mod_uninit because no one is
going to use it in this function.
Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Yifeng Sun [Wed, 15 Nov 2017 14:59:26 +0000 (06:59 -0800)]
bfd: Fix memory leak
Valgrind complains in test 2359 ():
864 (576 direct, 288 indirect) bytes in 18 blocks are definitely
lost in loss record 96 of 101
by 0x4A6D64: xmalloc (util.c:120)
by 0x40BC04: gateway_chassis_get_ordered (gchassis.c:73)
by 0x408CF0: bfd_calculate_chassis (bfd.c:219)
by 0x408CF0: bfd_run (bfd.c:257)
by 0x407F72: main (ovn-controller.c:718)
gateway_chassis wasn't released before the 'continue' line.
Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org> Tested-by: Greg Rose <gvrose8192@gmail.com> Reviewed-by: Greg Rose <gvrose8192@gmail.com>
Yifeng Sun [Wed, 15 Nov 2017 14:59:25 +0000 (06:59 -0800)]
dpif: Fix memory leak
Valgrind complains in test 2322 (ovn -- 3 HVs, 3 LS, 3 lports/LS, 1 LR):
31,584 (26,496 direct, 5,088 indirect) bytes in 48 blocks are definitely
lost in loss record 422 of 427
by 0x5165F4: xmalloc (util.c:120)
by 0x466194: dp_packet_new (dp-packet.c:138)
by 0x466194: dp_packet_new_with_headroom (dp-packet.c:148)
by 0x46621B: dp_packet_clone_data_with_headroom (dp-packet.c:210)
by 0x46621B: dp_packet_clone_with_headroom (dp-packet.c:170)
by 0x49DD46: dp_packet_batch_clone (dp-packet.h:789)
by 0x49DD46: odp_execute_clone (odp-execute.c:616)
by 0x49DD46: odp_execute_actions (odp-execute.c:795)
by 0x471663: dpif_execute_with_help (dpif.c:1296)
by 0x473795: dpif_operate (dpif.c:1411)
by 0x473E20: dpif_execute.part.21 (dpif.c:1320)
by 0x428D38: packet_execute (ofproto-dpif.c:4682)
by 0x41EB51: ofproto_packet_out_finish (ofproto.c:3540)
by 0x41EB51: handle_packet_out (ofproto.c:3581)
by 0x4233DA: handle_openflow__ (ofproto.c:8044)
by 0x4233DA: handle_openflow (ofproto.c:8219)
by 0x4514AA: ofconn_run (connmgr.c:1437)
by 0x4514AA: connmgr_run (connmgr.c:363)
by 0x41C8B5: ofproto_run (ofproto.c:1813)
by 0x40B103: bridge_run__ (bridge.c:2919)
by 0x4103B3: bridge_run (bridge.c:2977)
by 0x406F14: main (ovs-vswitchd.c:119)
The parameter dp_packet_batch is leaked when 'may_steal' is true.
When dpif_execute_helper_cb is passed with a true 'may_steal', it
is supposed to take the ownership of dp_packet_batch and release
it when done.
Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Ben Pfaff [Tue, 28 Nov 2017 23:32:24 +0000 (15:32 -0800)]
types: New macros ETH_ADDR_C and ETH_ADDR64_C.
These macros expand to constants of type struct eth_addr and struct
eth_addr64, respectively, and make it more convenient to initialize or
assign to an Ethernet address object.
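The difference between a constant usable in assignments and a plain initializer can be sketched as below. The macro and struct names are hypothetical and do not reproduce the OVS definitions; the sketch keeps the compound-literal form for assignments and a brace-only form for static initializers, which older compilers handle more reliably.

```c
#include <assert.h>
#include <stdint.h>

struct example_eth_addr {
    uint8_t ea[6];
};

/* Brace-enclosed form: valid only as an initializer. */
#define EXAMPLE_ETH_ADDR_INIT(A, B, C, D, E, F) \
    { { 0x##A, 0x##B, 0x##C, 0x##D, 0x##E, 0x##F } }

/* Compound literal form: an expression, so it can also be assigned. */
#define EXAMPLE_ETH_ADDR_C(A, B, C, D, E, F) \
    ((struct example_eth_addr) EXAMPLE_ETH_ADDR_INIT(A, B, C, D, E, F))

const struct example_eth_addr example_broadcast
    = EXAMPLE_ETH_ADDR_INIT(ff, ff, ff, ff, ff, ff);

/* Returns 1 if every byte of 'a' is 0xff, otherwise 0. */
int
example_eth_addr_is_broadcast(struct example_eth_addr a)
{
    for (int i = 0; i < 6; i++) {
        if (a.ea[i] != 0xff) {
            return 0;
        }
    }
    return 1;
}
```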
Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Mark Michelson <mmichels@redhat.com>
Ben Pfaff [Tue, 28 Nov 2017 18:28:33 +0000 (10:28 -0800)]
util: Make xmalloc_cacheline() allocate full cachelines.
Until now, xmalloc_cacheline() has provided its caller memory that does not
share a cache line, but when posix_memalign() is not available it did not
provide a full cache line; instead, it returned memory that was offset 8
bytes into a cache line. This makes it hard for clients to design
structures to be cache line-aligned. This commit changes
xmalloc_cacheline() to always return a full cache line instead of memory
offset into one.
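One portable way to hand out memory that starts on a cache line boundary without posix_memalign() is to over-allocate and remember the original pointer. The sketch below assumes a 64-byte line and uses hypothetical function names; it is not the xmalloc_cacheline() implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define EXAMPLE_CACHE_LINE_SIZE 64   /* Assumed line size. */

/* Returns 'size' bytes, rounded up to whole cache lines, that start on a
 * cache line boundary, or NULL on failure.  The pointer returned by
 * malloc() is saved just before the aligned block so it can be freed. */
void *
example_malloc_cacheline(size_t size)
{
    const size_t line = EXAMPLE_CACHE_LINE_SIZE;
    assert(size <= SIZE_MAX - 2 * line - sizeof(void *));

    size_t padded = (size + line - 1) / line * line;
    void *raw = malloc(padded + line - 1 + sizeof(void *));
    if (!raw) {
        return NULL;
    }

    uintptr_t start = (uintptr_t) raw + sizeof(void *);
    uintptr_t aligned = (start + line - 1) & ~(uintptr_t) (line - 1);
    ((void **) aligned)[-1] = raw;
    return (void *) aligned;
}

/* Frees memory obtained from example_malloc_cacheline(). */
void
example_free_cacheline(void *p)
{
    if (p) {
        free(((void **) p)[-1]);
    }
}
```

The cost is up to one extra cache line plus one pointer per allocation, in exchange for callers being able to rely on the alignment.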
Timothy Redaelli [Wed, 29 Nov 2017 16:46:53 +0000 (17:46 +0100)]
redhat: Create /etc/openvswitch/* with openvswitch as user/group
Without this commit it is not possible to upgrade an openvswitch release
that includes the commit ac416a3ab2d2 (for example 2.8.0) to another release
that includes the commit ac416a3ab2d2 (for example master or 2.8.1), because
rpm changes the user/group of /etc/openvswitch to root/root, but ovsdb-server
starts as the user openvswitch and so it doesn't have permission to write to
/etc/openvswitch/conf.db.
This patch tells rpm to use the openvswitch user and group for
/etc/openvswitch and /etc/openvswitch/default.conf.
Reported-by: Mark Michelson <mmichels@redhat.com> CC: aaron conole <aconole@redhat.com> Fixes: ac416a3ab2d2 ("redhat: dynamically allocate and reference ovs user") Signed-off-by: Timothy Redaelli <tredaelli@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Aaron Conole <aconole@redhat.com> Tested-by: Mark Michelson <mmichels@redhat.com>
Ilya Maximets [Wed, 29 Nov 2017 10:50:45 +0000 (13:50 +0300)]
smap: Return default on failure in smap_get_int/ullong.
Currently smap_get_int/ullong don't check for conversion errors.
Most implementations of atoi/strtoull return 0 in case of failure.
This leads to returning zero for wrongly set database values.
For example, commands
ovs-vsctl set interface iface options:key=\"\"
ovs-vsctl set interface iface options:key=qwe123
ovs-vsctl set interface iface options:key=abc
will have exactly the same effect as
ovs-vsctl set interface iface options:key=0
in the case where 'key' is an integer option of the iface.
This can be checked with 'other_config:emc-insert-inv-prob' or other
integer 'options' and 'other_config's.
0 may be neither the default nor a safe value for many options, so
it is better to return the default value instead, if any.
Conversion functions from the 'util' library are used to provide proper
error handling.
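The "return the default on a failed conversion" behaviour can be sketched with strtol(). The function below is a hypothetical illustration, not the smap or 'util' code, and it treats trailing garbage and out-of-range values as failures.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Parses 'value' as a base-10 int.  Returns 'def' if 'value' is NULL,
 * empty, not fully numeric, or out of range for int. */
int
parse_int_or_default(const char *value, int def)
{
    if (!value || !*value) {
        return def;
    }

    char *end;
    errno = 0;
    long parsed = strtol(value, &end, 10);
    if (errno || end == value || *end != '\0'
        || parsed < INT_MIN || parsed > INT_MAX) {
        return def;
    }
    return (int) parsed;
}

/* The three bad values from the message above fall back to the default,
 * while an explicit "0" is still honoured. */
void
check_examples_from_message(void)
{
    assert(parse_int_or_default("", 100) == 100);
    assert(parse_int_or_default("qwe123", 100) == 100);
    assert(parse_int_or_default("abc", 100) == 100);
    assert(parse_int_or_default("0", 100) == 0);
}
```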
Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Ben Pfaff <blp@ovn.org> Tested-by: Jan Scheurich <jan.scheurich@ericsson.com> Acked-by: Jan Scheurich <jan.scheurich@ericsson.com>
Balazs Nemeth [Wed, 1 Nov 2017 15:20:47 +0000 (15:20 +0000)]
tunnel: Fix deletion of datapath tunnel ports in case of reconfiguration
There is an issue in OVS with tunnel deletion during the
reconfiguration of OF tunnels. If the dst_port value is changed, the
old tunnel map entry will not be deleted, because the tp_port
argument of tnl_port_map_delete() has the new dst_port setting, hence
the tunnel cannot be found in the list of tnl_port structures.
The patch corrects this mechanism by adding a new argument,
'old_odp_port' to tnl_port_reconfigure(). This value is used to
identify the datapath tunnel port which is being reconfigured. In
connection with this fix, to unify the tunnel port map handling,
odp_port value is used to search the proper port to insert and delete
tunnel map entries as well. This variable can be used instead of
tp_port, as it is unique for all datapath tunnel ports, and there is
no need to reach dst_port from netdev_tunnel_config structure.
This patch also adds a printout to check the reference counter of
a tnl_port structure in tnl-port.c. It extends the OVS unit test cases to
include ref_cnt values in the expected dump, and adds new test cases to
check that packet reception still works after OF tunnel
port deletion and to check the reference counter
after OF tunnel deletion or reconfiguration.
Signed-off-by: Balazs Nemeth <balazs.nemeth@ericsson.com> Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com> Co-authored-by: Jan Scheurich <jan.scheurich@ericsson.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
process: Extend get_process_info() for additional fields.
This commit enables the fields relating to the process name and the core
number on which the process was last scheduled. The fields will be used by
the keepalive monitoring framework in future commits.
This commit also fixes the following "sparse" warning:
lib/process.c:439:16: error: use of assignment suppression and length
modifier together in gnu_scanf format [-Werror=format=].
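As an illustration of this class of warning: a conversion such as "%*lu" both suppresses assignment and carries a length modifier, whereas "%*u" skips the field without one. The sketch below assumes a simplified, hypothetical input layout (pid, name in parentheses, state, one skipped field, core number). It does not reproduce the real format handled in lib/process.c or the actual fix; struct proc_sample and parse_sample() are made-up names.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical record, for illustration only. */
struct proc_sample {
    int pid;
    char name[16];
    char state;
    int core;
};

/* Parses a line such as "1234 (ovs-vswitchd) S 77 3".  Returns true if all
 * four stored fields were converted. */
static bool
parse_sample(const char *line, struct proc_sample *p)
{
    /* "%*u" skips an unsigned field without storing it.  Writing "%*lu"
     * instead would combine suppression with a length modifier, which is
     * what the warning complains about. */
    int n = sscanf(line, "%d (%15[^)]) %c %*u %d",
                   &p->pid, p->name, &p->state, &p->core);

    return n == 4;
}
```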
Signed-off-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Jakub Sitnicki [Fri, 29 Sep 2017 15:05:23 +0000 (17:05 +0200)]
ovn-northd: Treat logical ports of router type as always being up
Employ the simplest possible approach to determine the state of logical
ports that connect to logical routers by hardcoding it to always up.
This is intended to be less surprising than the current approach where
router ports appear as being down (with the exception of ones linking to
gateway routers, which are bound).
Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2017-August/045202.html Signed-off-by: Jakub Sitnicki <jkbs@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Mark Michelson <mmichels@redhat.com> Acked-by: Miguel Angel Ajo <majopela@redhat.com>
Jakub Sitnicki [Fri, 29 Sep 2017 15:05:22 +0000 (17:05 +0200)]
ovn-northd: Refactor logic for logical port 'up' state update
No functional change. Make it obvious that we determine the logical
port 'up' state by checking for bound chassis, and update the NB DB only
when state has not been set yet or current state is different.
Signed-off-by: Jakub Sitnicki <jkbs@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Mark Michelson <mmichels@redhat.com> Acked-by: Miguel Angel Ajo <majopela@redhat.com>
Shashank Ram [Mon, 20 Nov 2017 23:06:14 +0000 (15:06 -0800)]
datapath-windows: Account for VLAN tag in tunnel Decap
Decap functions for tunneling protocols do not compute
the packet header offsets correctly when there is a VLAN
tag in the L2 header. This results in incorrect checksum
computation causing the packet to be dropped.
This patch adds support to account for the VLAN tag in the
packet if it is present, and makes use of the OvsExtractLayers()
function to correctly compute the header offsets for different
layers.
Testing done:
- Tested Geneve, STT, Vxlan and Gre and verified that there
are no regressions.
- Verified that packets with VLAN tags are correctly handled
in the decap code of all tunneling protocols. Previously,
this would result in packet drops due to invalid checksums
being computed.
- Verified that non-VLAN tagged packets are handled correctly.
Ben Pfaff [Mon, 27 Nov 2017 01:34:59 +0000 (17:34 -0800)]
odp-util: Fix buffer overread in parsing string form of ODP flows.
scan_u128() should return 0 on an error but it actually returned an errno
value in some cases, so a command like this:
ovs-appctl dpctl/add-flow 'ct_label(1/55555555555555555555555555)' ''
could cause a buffer overread.
This bug is not as severe as it may sound because the string form of ODP
flows is not used over OpenFlow or OVSDB, only through the appctl interface
that is normally used just by local system administrators and not exposed
over a network.
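To see why returning an errno value is dangerous here, consider a scanner whose return value the caller treats as the number of characters consumed (an assumption about the calling convention, based on "should return 0 on an error"). A positive errno would then look like a length and could move the parse position past the end of the string. A minimal sketch, with scan_uint() as a hypothetical stand-in for scan_u128():

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical scanner, not the OVS code: parses an unsigned decimal number
 * at the start of 's'.  Returns the number of characters consumed, or 0 on
 * error. */
static int
scan_uint(const char *s, unsigned long *value)
{
    char *end;

    errno = 0;
    *value = strtoul(s, &end, 10);
    if (end == s || errno) {
        /* Must be 0 here.  'return errno;' would hand the caller a
         * positive number that it would mistake for a consumed length. */
        return 0;
    }
    return (int) (end - s);
}
```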
Reported-by: Bhargava Shastry <bshastry@sec.t-labs.tu-berlin.de> Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Joe Stringer <joe@ovn.org>
Ben Pfaff [Mon, 27 Nov 2017 00:26:41 +0000 (16:26 -0800)]
tc: Fix build breakage on GCC 7 by annotating fall-through.
Open vSwitch enables the GCC 7+ option that warns about fall-through
switch statements. This commit fixes newly introduced warnings.
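For reference, GCC 7 and later report implicit fall-through under -Wimplicit-fallthrough, and an explicit annotation silences the warning. The sketch below uses a made-up FALLTHROUGH macro and count_flags() function; it does not show the annotation style this commit actually applied.

```c
/* Hypothetical macro: expands to GCC's statement attribute where available,
 * and to a no-op elsewhere. */
#if defined(__GNUC__) && __GNUC__ >= 7
#define FALLTHROUGH __attribute__((fallthrough))
#else
#define FALLTHROUGH ((void) 0)
#endif

/* Counts how many levels are enabled at or below 'level' (1 or 2). */
static int
count_flags(int level)
{
    int n = 0;

    switch (level) {
    case 2:
        n++;
        FALLTHROUGH;    /* Intentional: level 2 implies level 1. */
    case 1:
        n++;
        break;
    default:
        break;
    }
    return n;
}
```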
Fixes: d6118e628988 ("netdev-tc-offloads: Verify csum flags on dump from tc") Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Paul Blakey <paulb@mellanox.com>
Numan Siddique [Wed, 8 Nov 2017 08:59:07 +0000 (14:29 +0530)]
OpenvSwitch logrotate: Use ctl file path as target in ovs-appctl to reset logs
Presently, the logrotate script searches for the pid files in /var/log/openvswitch
and passes the pid file name (without .pid) as the target to ovs-appctl. This approach
doesn't work for OVN DB servers since the ctl files are generated as "ovnnb_db.ctl"
and "ovnsb_db.ctl". So search for the .ctl files instead and use them as the target for
ovs-appctl.
Suggested-by: Ben Pfaff <blp@ovn.org> Signed-off-by: Numan Siddique <nusiddiq@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Mark Michelson <mmichels@redhat.com>
Numan Siddique [Wed, 8 Nov 2017 08:58:49 +0000 (14:28 +0530)]
ovn-ctl: Add -vfile:info option to OVN_NB/SB_LOG options
In the RHEL environment, when OVN db servers are started using ovn-ctl,
log files are empty. Adding the "-vfile:info" option to ovsdb-server
resolves this issue. Running 'ovs-appctl -t .. vlog/reopen' results in the
logs appearing in the log files. This issue is seen with 2.7.2.
The "-vfile:info" option is passed to ovn-northd and ovn-controller when starting.
There is no harm in adding this to OVN db servers.
Signed-off-by: Numan Siddique <nusiddiq@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>