Ben Pfaff [Mon, 15 Aug 2016 18:34:02 +0000 (11:34 -0700)]
ovn: Rewrite logical action parsing and encoding library.
Until now, parsing logical actions and encoding them into OpenFlow has
happened in a single step. An upcoming commit will want to examine
actions after parsing without encoding them into OpenFlow. This commit
refactors OVN logical actions to make this possible.
The new form of the OVN action handling is closely modeled on ofp-actions
in the OVS core library. Notable differences are that OVN actions are
always fixed-length and that individual OVN actions can have destructors
(and thus can contain pointers to data that need to be freed when the
actions are destroyed).
Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Ryan Moats <rmoats@us.ibm.com> Acked-by: Justin Pettit <jpettit@ovn.org>
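As a rough illustration of the split described above, the following Python sketch parses action text into objects first and encodes them in a separate step. It is not the OVN C library: the class names, `parse_actions`, `encode_actions`, and the encoded strings are all hypothetical, chosen only to show that parsed actions can be examined without encoding them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Output:
    """Parsed form of the logical action 'output;'."""


@dataclass(frozen=True)
class Next:
    """Parsed form of the logical action 'next;'."""


def parse_actions(text):
    """Step 1: turn action text into a list of parsed action objects."""
    known = {"output": Output, "next": Next}
    actions = []
    for token in text.split(";"):
        name = token.strip()
        if not name:
            continue
        if name not in known:
            raise ValueError("unknown action: %s" % name)
        actions.append(known[name]())
    return actions


def encode_actions(actions):
    """Step 2: translate parsed actions into made-up OpenFlow-like strings."""
    encoded = []
    for action in actions:
        if isinstance(action, Output):
            encoded.append("OUTPUT")
        elif isinstance(action, Next):
            encoded.append("RESUBMIT_NEXT_TABLE")
    return encoded


parsed = parse_actions("next; output;")
# The parsed form can be inspected here without producing any encoding.
has_output = any(isinstance(action, Output) for action in parsed)
```

Calling `encode_actions(parsed)` afterwards is optional, which is the point of keeping the two steps apart.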
Amitabha Biswas [Mon, 15 Aug 2016 18:03:30 +0000 (11:03 -0700)]
ovsdb-idl: Fix bugs in Python IDL partial set and map.
This patch fixes a couple of bugs in commit a59912a0
(python: add support for partial map and partial set updates)
and reverses a simplification added in commit 884d9bad
(Simplify partial map Py3 IDL test) to make the Python3 test
cases pass.
The following changes have been made:
1. Allow multiple map updates on the same column in a transaction.
2. Partial map Py3 IDL test can now support multiple elements.
3. SetAttr overrides pre-existing insert and remove updates.
4. addvalue/delvalue contain unique elements.
Signed-off-by: Amitabha Biswas <abiswas@us.ibm.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
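The four changes listed above can be pictured with a small bookkeeping sketch. This is not the code in the Python IDL: the class `PendingColumnUpdates` and its method names are hypothetical, and the behavior shown is only an assumption about what "multiple updates", "overrides", and "unique elements" mean in practice.

```python
class PendingColumnUpdates:
    """Hypothetical per-column record of changes queued in one transaction."""

    def __init__(self):
        self.map_inserts = {}      # key -> value queued for insertion
        self.map_removes = set()   # keys queued for removal
        self.set_adds = set()      # set elements queued for addition
        self.full_value = None     # whole-column assignment, if any

    def setkey(self, key, value):
        # Several updates to the same map column accumulate in one transaction.
        self.map_removes.discard(key)
        self.map_inserts[key] = value

    def delkey(self, key):
        # Removing a key cancels any insert queued for it (sketch behavior).
        self.map_inserts.pop(key, None)
        self.map_removes.add(key)

    def addvalue(self, value):
        # A set keeps each queued element only once.
        self.set_adds.add(value)

    def assign(self, value):
        # Assigning the whole column discards pending partial updates.
        self.map_inserts.clear()
        self.map_removes.clear()
        self.set_adds.clear()
        self.full_value = value


updates = PendingColumnUpdates()
updates.setkey("a", "1")
updates.setkey("b", "2")
updates.addvalue("x")
updates.addvalue("x")
queued_before_assign = (dict(updates.map_inserts), set(updates.set_adds))
updates.assign({"c": "3"})
```

After `assign`, only the whole-column value remains queued, mirroring item 3 in the list.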
"internal" netdevs are treated specially in OVS (e.g. for MTU), but
the dummy datapath remaps both "system" and "internal" devices to the
same "dummy" netdev class, so there's no way to discern those in tests.
This commit adds a new "dummy-internal" netdev type, which will be used
by the dummy datapath for internal ports, so that other parts of the
code can understand which ports are internal just by looking at the
netdev object.
The alternative solution, using the original interface type ("internal")
instead of the translated netdev type ("dummy"), is harder to implement,
because in so many places only the netdev object is available.
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Acked-by: Ben Pfaff <blp@ovn.org>
This doesn't have an immediate effect, but can mess up later
LL_RESERVED_SPACE calculations, such as done in
net/ipv6/mcast.c:mld_newpack. For reference, this issue was found
from a skb_panic raised there after the length calculations had given
the wrong result.
Note the other current users of this interface
(drivers/net/tun.c:tun_set_headroom and
drivers/net/veth.c:veth_set_rx_headroom) are both checking this
correctly thus need no modification.
Thanks to Ben for some pointers from the crash dumps!
Cc: Benjamin Poirier <bpoirier@suse.com> Cc: Paolo Abeni <pabeni@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1361414 Signed-off-by: Ian Wienand <iwienand@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Jesse Gross <jesse@kernel.org>
Jesse Gross [Sun, 14 Aug 2016 22:29:37 +0000 (15:29 -0700)]
ofproto-dpif-xlate: Use passed ctx in XLATE_REPORT_ERROR.
XLATE_REPORT_ERROR is a macro that takes struct xlate_ctx as an
argument but also implicitly uses 'ctx' from the local function
scope. This works with current uses but it really should be
using the argument.
Signed-off-by: Jesse Gross <jesse@kernel.org> Acked-by: Ben Pfaff <blp@ovn.org>
Andy Zhou [Fri, 29 Jul 2016 21:39:29 +0000 (14:39 -0700)]
ovsdb: Make OVSDB backup server read only
When ovsdb-server is running in the backup state, it would be nice to
make sure there are no unintended changes to the backup database.
This patch makes the ovsdb server accept only 'read' transactions while
it is a backup server. When the server role is changed to active,
all existing client connections are reset. After reconnecting, all
client transactions are accepted.
Signed-off-by: Andy Zhou <azhou@ovn.org> Acked-by: Ben Pfaff <blp@ovn.org>
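A minimal sketch of the read-only check described above, in Python rather than the server's C. The set `READ_ONLY_OPS` and the function `transaction_allowed` are hypothetical, and which OVSDB operations count as 'read' here is an assumption, not taken from the patch.

```python
# Assumption: operations treated as 'read' for the purpose of this sketch.
READ_ONLY_OPS = {"select", "wait", "comment"}


def transaction_allowed(operations, is_backup):
    """Return True if a server in the given role would accept the transaction."""
    if not is_backup:
        # An active server accepts every transaction.
        return True
    # A backup server accepts the transaction only if every operation is a read.
    return all(op.get("op") in READ_ONLY_OPS for op in operations)


read_txn = [{"op": "select", "table": "Bridge", "where": []}]
write_txn = [{"op": "insert", "table": "Bridge", "row": {"name": "br0"}}]
```

With these definitions, `write_txn` is rejected only while the server is a backup.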
Andy Zhou [Thu, 28 Jul 2016 22:57:40 +0000 (15:57 -0700)]
ovsdb: Fix bug, set rpc to NULL after freeing.
Found by inspection.
Tested-by: Daniel Levy <dlevy@us.ibm.com>
Reported-at: http://openvswitch.org/pipermail/discuss/2016-August/022322.html Signed-off-by: Andy Zhou <azhou@ovn.org> Acked-by: Ben Pfaff <blp@ovn.org>
Andy Zhou [Thu, 28 Jul 2016 18:35:01 +0000 (11:35 -0700)]
ovsdb: Rename replication related variable names.
The current replication code refers to the other ovsdb-server instance as
a 'remote', a term that is overloaded in ovsdb.
Switch to active/backup terminology instead to make it less confusing.
The active server is the one that should be servicing clients; the backup
server is the one that boots with the --sync-from option.
Signed-off-by: Andy Zhou <azhou@ovn.org> Acked-by: Ben Pfaff <blp@ovn.org>
Ryan Moats [Mon, 15 Aug 2016 00:48:24 +0000 (19:48 -0500)]
Simplify partial map Py3 IDL test added by commit a59912a0
Commit a59912a0 ("python: Add support for partial map
and partial set updates") added unit tests for the partial
map function for the python IDL. However, because Python3
doesn't order dictionaries consistently, this
test is a crap shoot for systems that support Python3.
As a short term fix, do not use a dictionary with multiple
elements for the partial map test case.
Change-Id: Ibdec10ebd895051321b9bff7d9fe8a7e0bd9eb88 Signed-off-by: Ryan Moats <rmoats@us.ibm.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
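A common way around order-dependent test output, sketched below, is to sort map items before formatting them. The helper `format_map` is hypothetical and is not part of the IDL test harness; it only illustrates why multi-element dictionaries need a deterministic order before their printed form is compared.

```python
def format_map(mapping):
    """Format a dict as 'k=v' pairs in sorted key order, independent of dict order."""
    return " ".join("%s=%s" % (key, value) for key, value in sorted(mapping.items()))


# The same contents, built in two different insertion orders.
first = {"b": "2", "a": "1"}
second = {"a": "1", "b": "2"}
```

Both dictionaries format to the same string, so a test comparing the formatted text no longer depends on how the interpreter orders the keys.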
Numan Siddique [Thu, 11 Aug 2016 12:21:39 +0000 (17:51 +0530)]
ovn-controller: Reset flow processing after (re)connection to switch
When ovn-controller reconnects to the ovs-vswitchd, it deletes all the
OF flows in the switch. It doesn't install the flows again, leaving
the datapath broken unless ovn-controller is restarted or ovn-northd
updates the SB DB.
The reason for this is
- lflow_reset_processing() is not called after the reconnection
- the hmap "installed_flows" is not cleared, because of which
ofctrl_put skips adding the flows to the switch.
This patch fixes the issue and also adds a test case to test
this scenario.
Signed-off-by: Numan Siddique <nusiddiq@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Ryan Moats <rmoats@us.ibm.com>
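The effect of a stale "installed_flows" table can be shown with plain sets. This is only a sketch of the diffing idea in Python, not the ofctrl code; `flows_to_add` and the flow names are hypothetical.

```python
def flows_to_add(desired, installed):
    """Return flows that are wanted but believed to be missing from the switch."""
    return desired - installed


desired = {"flow-1", "flow-2"}
installed = {"flow-1", "flow-2"}   # what the controller believes is installed
switch = set()                     # the reconnection emptied the switch

# Without resetting the bookkeeping, nothing is considered missing.
stale_result = flows_to_add(desired, installed)

# Clearing the bookkeeping after reconnection makes every flow eligible again.
installed.clear()
fixed_result = flows_to_add(desired, installed)
switch |= fixed_result
```

The first call returns an empty set even though the switch has no flows, which corresponds to the broken datapath described above.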
Ben Pfaff [Sun, 14 Aug 2016 19:39:56 +0000 (12:39 -0700)]
ovs-ctl: Properly handle shell quoting in os-release.
Until now, this code did not strip "" or '' from variable assignments in
os-release. This fixes the problem.
Requested-by: Matt Mulsow <mamulsow@us.ibm.com>
Requested-at: https://github.com/openvswitch/ovs/pull/148 Fixes: c60d6b096436 ("ovs-ctl: support populating system info from /etc/os-release") Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Ryan Moats <rmoats@us.ibm.com>
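The quoting issue can be illustrated with a short parser. ovs-ctl itself is a shell script; the Python function `parse_os_release` below is a hypothetical stand-in that handles only a matching pair of surrounding double or single quotes, not full shell quoting rules.

```python
def parse_os_release(text):
    """Parse KEY=value lines, stripping one pair of matching surrounding quotes."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        info[key] = value
    return info


sample = 'NAME="Fedora"\nVERSION_ID=24\nID=\'fedora\'\n'
parsed_release = parse_os_release(sample)
```

Without the quote handling, `NAME` would carry its literal quote characters into the system-type value.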
Ryan Moats [Sat, 6 Aug 2016 22:46:30 +0000 (17:46 -0500)]
python: Add support for partial map and partial set updates
Allow the python IDL to use mutate operations more freely
by mimicking the partial map and partial set operations now
available in the C IDL.
Unit tests for both of these types of operations are included.
They are not carbon copies of the C tests, because testing
idempotency is a bit difficult for the current python IDL
test harness.
Signed-off-by: Ryan Moats <rmoats@us.ibm.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Commit f1ab6e06 ("Add/user partial set updates.") incorrectly
did not include HPE attribution for derived files
lib/ovsdb-set-op.[ch]. Add the attribution to correct this.
Signed-off-by: Ryan Moats <rmoats@us.ibm.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
but for columns that store sets of values rather than key-value
pairs. These columns will now be able to use the OVSDB mutate
operation to transmit deltas on the wire rather than use
verify/update and transmit wait/update operations on the wire.
Side effect of modifying the comments in the partial map update
tests.
Signed-off-by: Ryan Moats <rmoats@us.ibm.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Numan Siddique [Fri, 5 Aug 2016 04:06:39 +0000 (09:36 +0530)]
ovn-northd: Add logical flows to support DHCPv6
OVN implements native DHCPv6. DHCPv6 options are stored
in the 'DHCP_Options' NB table and logical ports refer to this
table to configure the DHCPv6 options.
For each logical port configured with DHCPv6 options, the following flows
are added:
- A logical flow which copies the DHCPv6 options to the DHCPv6
request packets using the 'put_dhcpv6_opts' action and advances the
packet to the next stage.
- A logical flow which implements the DHCPv6 responder by sending
the DHCPv6 reply back to the inport once the 'put_dhcpv6_opts' action
is applied.
Signed-off-by: Numan Siddique <nusiddiq@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Numan Siddique [Fri, 5 Aug 2016 04:06:07 +0000 (09:36 +0530)]
ovn-controller: Add 'put_dhcpv6_opts' action in ovn-controller
This patch adds a new OVN action 'put_dhcpv6_opts' to support native
DHCPv6 in OVN.
ovn-controller parses this action and adds a NXT_PACKET_IN2
OF flow with 'pause' flag set and the DHCPv6 options stored in
'userdata' field.
When a valid DHCPv6 packet is received by ovn-controller, it frames a
new DHCPv6 reply packet with the DHCPv6 options present in the
'userdata' field, resumes the packet, and stores 1 in the 1-bit subfield.
If the packet is invalid, it resumes the packet without modifying it and
stores 0 in the 1-bit subfield.
A new 'DHCPv6_Options' table is added to the SB DB, which stores
the supported DHCPv6 options with DHCPv6 code and type. ovn-northd is
expected to populate this table.
An upcoming patch will add logical flows using this action.
Signed-off-by: Numan Siddique <nusiddiq@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
ovn-controller: Use UDP checksums when creating Geneve tunnels.
Currently metadata transmitted by OVN over Geneve tunnels is
unprotected by any checksum other than the one provided by the link
layer - this includes both the VNI and data stored in options. Turning
on UDP checksums which cover this data has obvious benefits in terms of
integrity protection.
In terms of performance, this actually significantly increases throughput
in most common cases when running on Linux based hosts without NICs
supporting Geneve offload (around 60% for bulk traffic). The reason is
that generally all NICs are capable of offloading transmitted and received
UDP checksums (viewed as ordinary UDP packets and not as tunnels). The
benefit comes on the receive side where the validated outer UDP checksum
can be used to additionally validate an inner checksum (such as TCP), which
in turn allows aggregation of packets to be more efficiently handled by
the rest of the stack.
Not all devices see such a benefit. The most notable exception is hardware
VTEPs (currently using VXLAN but potentially Geneve in the future). These
devices are designed to not buffer entire packets in their switching engines
and are therefore unable to efficiently compute or validate UDP checksums.
In addition certain versions of the Linux kernel are not able to fully
take advantage of Geneve capable NIC offloads in the presence of checksums.
(This is actually a pretty narrow corner case though - earlier versions of
Linux don't support Geneve offloads at all and later versions support both
offloads and checksums well.)
In order to avoid possible problems with these cases, efficient checksum
receive performance is exposed as an encap option in the southbound
database as a hint to remote senders. This currently defaults to off
for hardware VTEPs and on for all other cases.
Signed-off-by: Jesse Gross <jesse@kernel.org> Acked-by: Ryan Moats <rmoats@us.ibm.com> Acked-by: Ben Pfaff <blp@ovn.org>
ovn-controller: Make encap processing more robust against changes.
Originally, processing of encapsulations simply iterated over all tables on
every wakeup and would replace anything that changed. This is somewhat
inefficient but it captured all changes.
Incremental processing avoided the need to do so much work but it could
miss several types of changes. In particular, it only monitored the chassis
table in the southbound database, so other changes (particularly in the
encap table) were not reflected. In addition, while it corrected some
changes to its data in OVS, others could go unnoticed.
This attempts to fix those issues by reflecting the most recent updates
to the southbound database in OVS at all times. It also increases safety
by avoiding the possibility of dangling pointers to old database rows and
eliminates the need to traverse the OVS database at all during most wakeups.
Fixes: 1d45d5a9 ("ovn-controller: Change encaps_run to work incrementally.") Signed-off-by: Jesse Gross <jesse@kernel.org> Acked-by: Ryan Moats <rmoats@us.ibm.com> Acked-by: Ben Pfaff <blp@ovn.org>
ovn-controller: Fix memory leak when updating tunnels.
When a tunnel possibly needs to be updated, we are currently allocating
a new name for it. This is not necessary and in fact nothing uses the
name, which then results in the memory being leaked.
Fixes: 1d45d5a9 ("ovn-controller: Change encaps_run to work incrementally.") Signed-off-by: Jesse Gross <jesse@kernel.org> Acked-by: Ryan Moats <rmoats@us.ibm.com> Acked-by: Ben Pfaff <blp@ovn.org>
Before the functions "ofctrl_run" and "pinctrl_run" are called, "br-int"
has already been checked. Removing the conditional statements in these
functions may make the code clearer.
Signed-off-by: nickcooper-zhangtonghao <nickcooper-zhangtonghao@opencloud.tech> Signed-off-by: Ben Pfaff <blp@ovn.org>
Ryan Moats [Wed, 3 Aug 2016 19:07:38 +0000 (19:07 +0000)]
ovsdb: Use better error message for "timeout" without waiting.
When setting a where clause, if the timeout is set to a value of 0,
the clause is tested once and if it fails, a message of '"wait" timed
out' is returned. This can be misleading because no real time was
spent waiting, so change the message to '"where" clause test failed'.
Signed-off-by: Ryan Moats <rmoats@us.ibm.com> Reported-by: Ryan Moats <rmoats@us.ibm.com>
Reported-at: http://openvswitch.org/pipermail/dev/2016-August/077083.html Fixes: f85f8ebb ("Initial implementation of OVSDB.") Signed-off-by: Ben Pfaff <blp@ovn.org>
ovn-controller: Add datapath-type and iface-types in chassis:external_ids
This patch reads the 'Bridge.datapath_type' column value of the integration
bridge and 'Open_vSwitch.iface_types' column value and sets these in the
external_ids:datapath-type and external_ids:iface-types of Chassis table.
This will provide hints to the CMS or clients monitoring OVN SB DB to
determine the datapath type (DPDK or non-DPDK) configured and take some
actions based on it.
One use case: the OVN Neutron plugin can use this information to set the
vif_type (ovs or vhostuser) during port binding.
Signed-off-by: Numan Siddique <nusiddiq@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Ryan Moats <rmoats@us.ibm.com>
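A consumer of these hints might look roughly like the sketch below. The function `choose_vif_type` is hypothetical, and two details are assumptions rather than facts from the patch: that 'iface-types' is a comma-separated string, and that a "netdev" datapath offering the "dpdkvhostuser" interface type is what signals a DPDK setup.

```python
def choose_vif_type(external_ids):
    """Pick a vif_type from the Chassis external_ids hints (sketch only)."""
    datapath_type = external_ids.get("datapath-type", "")
    # Assumption: iface-types is a comma-separated list of interface types.
    iface_types = [t for t in external_ids.get("iface-types", "").split(",") if t]
    if datapath_type == "netdev" and "dpdkvhostuser" in iface_types:
        return "vhostuser"
    return "ovs"


dpdk_chassis = {"datapath-type": "netdev", "iface-types": "dpdk,dpdkvhostuser,internal"}
kernel_chassis = {"datapath-type": "system", "iface-types": "internal,vxlan,geneve"}
```

A chassis that publishes no hints at all falls back to "ovs" in this sketch.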
Mark Kavanagh [Tue, 9 Aug 2016 16:01:20 +0000 (17:01 +0100)]
netdev-dpdk: add support for jumbo frames
Add support for Jumbo Frames to DPDK-enabled port types,
using single-segment-mbufs.
Using this approach, the amount of memory allocated to each mbuf
to store frame data is increased to a value greater than 1518B
(typical Ethernet maximum frame length). The increased space
available in the mbuf means that an entire Jumbo Frame of a specific
size can be carried in a single mbuf, as opposed to partitioning
it across multiple mbuf segments.
The amount of space allocated to each mbuf to hold frame data is
defined dynamically by the user with ovs-vsctl, via the 'mtu_request'
parameter.
Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com> Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
[diproiettod@vmware.com rebased] Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
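The 1518B figure above follows from standard Ethernet framing: a 1500-byte MTU plus a 14-byte header and a 4-byte CRC. The sketch below applies the same arithmetic to a larger MTU. The function `frame_len` is hypothetical; it ignores VLAN tags and any mbuf headroom, so it is not the exact sizing used by netdev-dpdk.

```python
ETHER_HDR_LEN = 14   # destination MAC + source MAC + EtherType
ETHER_CRC_LEN = 4    # frame check sequence


def frame_len(mtu):
    """Return the untagged Ethernet frame length needed to carry the given MTU."""
    return mtu + ETHER_HDR_LEN + ETHER_CRC_LEN


standard_frame = frame_len(1500)
jumbo_frame = frame_len(9000)
```

A single-segment mbuf therefore has to hold at least `frame_len(mtu)` bytes of frame data for the requested MTU.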
netdev: Make netdev_set_mtu() netdev parameter non-const.
Every provider silently drops the const attribute when converting the
parameter to the appropriate subclass. Might as well drop the const
attribute from the parameter, since this is a "set" function.
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Acked-by: Ilya Maximets <i.maximets@samsung.com>
vswitchd: Introduce 'mtu_request' column in Interface.
The 'mtu_request' column can be used to set the MTU of a specific
interface.
This column is useful because it will allow changing the MTU of DPDK
devices (implemented in a future commit), which are not accessible
outside the ovs-vswitchd process, but it can be used for kernel
interfaces as well.
The current implementation of set_mtu() in netdev-dpdk is removed
because it's broken. It will be reintroduced by a subsequent commit on
this series.
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Acked-by: Ilya Maximets <i.maximets@samsung.com>
dpif-netdev: Fix -Wformat warning on 32-bit build.
Use the appropriate format specifier for size_t, otherwise the 32-bit
build fails.
Reported-at: https://travis-ci.org/openvswitch/ovs/jobs/151938383 Fixes: 3453b4d62a98 ("dpif-netdev: dpcls per in_port with sorted subtables") Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Acked-by: Joe Stringer <joe@ovn.org>
Ciara Loftus [Wed, 10 Aug 2016 14:28:27 +0000 (15:28 +0100)]
netdev-dpdk: add DPDK pdump capability
This commit provides the ability to 'listen' on DPDK ports and save
packets to a pcap file with a DPDK app that uses the librte_pdump
library. One such app is the 'pdump' app that can be found in the DPDK
'app' directory. Instructions on how to use this can be found in
INSTALL.DPDK-ADVANCED.md
Pdump capability in OVS with DPDK will only be initialised if the
CONFIG_RTE_LIBRTE_PMD_PCAP=y and CONFIG_RTE_LIBRTE_PDUMP=y options are
set in DPDK. libpcap is required if the above configuration is used.
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com> Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
ovs-bugtool: Correct "rmdir" error messages during "make distcheck".
Remove duplicated delete attempts and error messages during the distcheck
clean procedure.
The problem shows up as the following errors during the distcheck cleanup procedure:
rmdir: failed to remove ‘/openvswitch-2.5.90/_inst/share/openvswitch/bugtool-plugins/’: Directory not empty
rmdir: failed to remove ‘/openvswitch-2.5.90/_inst/share/openvswitch/bugtool-plugins/ovn/network-status ’: No such file or directory
The first entry is caused by an xml file that is kept flat in the directory
structure (not in a subdirectory, as it is for other plugins), so rmdir
tries to remove the folder that holds all plugin files and folders. That is
why an additional check that the directory is empty is added, to prevent that.
The second entry is caused by some other commit when ovs plugin has been added:
stem=`echo "$$plugin" | sed 's,ovn/,,'`; \
So the directory path was modified for the removal of the xml
file, but it was not updated for the directory removal.
I didn't really want to change this logic, as I'm not sure whether
something else can be stored in this directory, but it was very tempting to
remove everything just by:
rm -rf "$(DESTDIR)$(bugtoolpluginsdir)/*"
Signed-off-by: Michal Weglicki <michalx.weglicki@intel.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Jan Scheurich [Thu, 11 Aug 2016 10:02:27 +0000 (12:02 +0200)]
dpif-netdev: dpcls per in_port with sorted subtables
The user-space datapath (dpif-netdev) consists of a first level "exact match
cache" (EMC) matching on 5-tuples and the normal megaflow classifier. With
many parallel packet flows (e.g. TCP connections) the EMC becomes inefficient
and the OVS forwarding performance is determined by the megaflow classifier.
The megaflow classifier (dpcls) consists of a variable number of hash tables
(aka subtables), each containing megaflow entries with the same mask of
packet header and metadata fields to match upon. A dpcls lookup matches a
given packet against all subtables in sequence until it hits a match. As
megaflow cache entries are by construction non-overlapping, the first match
is the only match.
Today the order of the subtables in the dpcls is essentially random so that
on average a dpcls lookup has to visit N/2 subtables for a hit, when N is the
total number of subtables. Even though every single hash-table lookup is
fast, the performance of the current dpcls degrades when there are many
subtables.
How does the patch address this issue:
In reality there is often a strong correlation between the ingress port and a
small subset of subtables that have hits. The entire megaflow cache typically
decomposes nicely into partitions that are hit only by packets entering from
a range of similar ports (e.g. traffic from Phy -> VM vs. traffic from VM ->
Phy).
Therefore, maintaining a separate dpcls instance per ingress port with its
subtable vector sorted by frequency of hits reduces the average number of
subtable lookups in the dpcls to a minimum, even if the total number of
subtables gets large. This is possible because megaflows always have an exact
match on in_port, so every megaflow belongs to a unique dpcls instance.
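The idea of a per-port subtable vector sorted by hit frequency can be sketched in a few lines of Python. This is a simplified model, not the dpif-netdev code: `Subtable`, `PortClassifier`, and `optimize` are hypothetical names, and each subtable is modeled as a plain set of keys instead of a masked hash table.

```python
class Subtable:
    """Simplified subtable: a set of flow keys plus a hit counter."""

    def __init__(self, name, flows):
        self.name = name
        self.flows = set(flows)
        self.hits = 0


class PortClassifier:
    """One classifier instance per ingress port."""

    def __init__(self, subtables):
        self.subtables = list(subtables)

    def lookup(self, key):
        """Return (matching subtable or None, number of subtables visited)."""
        for visited, subtable in enumerate(self.subtables, start=1):
            if key in subtable.flows:
                subtable.hits += 1
                return subtable, visited
        return None, len(self.subtables)

    def optimize(self):
        """Periodic step: most frequently hit subtables first, then reset counters."""
        self.subtables.sort(key=lambda subtable: subtable.hits, reverse=True)
        for subtable in self.subtables:
            subtable.hits = 0


classifier = PortClassifier([
    Subtable("a", {"flow-a"}), Subtable("b", {"flow-b"}),
    Subtable("c", {"flow-c"}), Subtable("d", {"flow-d"}),
])
_, visited_before = classifier.lookup("flow-d")
for _ in range(9):
    classifier.lookup("flow-d")
classifier.optimize()
_, visited_after = classifier.lookup("flow-d")
```

In this toy run all traffic on the port hits the last subtable, so sorting moves it to the front and later lookups stop at the first subtable visited.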
For thread safety, the PMD thread needs to block out revalidators during the
periodic optimization. We use ovs_mutex_trylock() to avoid blocking the PMD.
To monitor the effectiveness of the patch we have enhanced the ovs-appctl
dpif-netdev/pmd-stats-show command with an extra line "avg. subtable lookups
per hit" to report the average number of subtable lookups needed for a
megaflow match. Ideally, this should be close to 1 and in almost all cases much
smaller than N/2.
The PMD tests have been adjusted to the additional line in pmd-stats-show.
We have benchmarked a L3-VPN pipeline on top of a VXLAN overlay mesh.
With pure L3 tenant traffic between VMs on different nodes the resulting
netdev dpcls contains N=4 subtables. Each packet traversing the OVS
datapath is subject to dpcls lookup twice due to the tunnel termination.
Disabling the EMC, we have measured a baseline performance (in+out) of ~1.45
Mpps (64 bytes, 10K L4 packet flows). The average number of subtable lookups
per dpcls match is 2.5. With the patch the average number of subtable lookups
per dpcls match is reduced to 1 and the forwarding performance grows by ~50%
to 2.13 Mpps.
Even with EMC enabled, the patch improves the performance by 9% (for 1000 L4
flows) and 34% (for 50K+ L4 flows).
As the actual number of subtables will often be higher in reality, we can
assume that this is at the lower end of the speed-up one can expect from this
optimization. Just running a parallel ping between the VXLAN tunnel endpoints
increases the number of subtables and hence the average number of subtable
lookups from 2.5 to 3.5 on master with a corresponding decrease of throughput
to 1.2 Mpps. With the patch the parallel ping has no impact on average number
of subtable lookups and performance. The performance gain is then ~75%.
Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com> Acked-by: Antonio Fischetti <antonio.fischetti@intel.com> Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
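The rounded gains quoted above can be checked directly from the reported throughput figures. The helper `gain_percent` is hypothetical; the inputs are the Mpps values given in the message.

```python
def gain_percent(before_mpps, after_mpps):
    """Relative throughput gain, in percent, going from before to after."""
    return (after_mpps / before_mpps - 1.0) * 100.0


# Baseline with the EMC disabled: ~1.45 Mpps before, 2.13 Mpps with the patch.
baseline_gain = gain_percent(1.45, 2.13)

# With the parallel ping: 1.2 Mpps on master, still 2.13 Mpps with the patch.
ping_gain = gain_percent(1.2, 2.13)
```

The two results come out a little under 50% and a little over 75%, consistent with the approximate figures in the text.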
ovn-northd: Only warn about peer as switch port when it really is one.
At the end of join_logical_ports(), some ovn_ports might not have been
bound as logical switch ports or logical router ports, but the code assumed
that they were and gave a confusing warning when the assumption was
violated.
Russell Bryant [Fri, 12 Aug 2016 17:28:48 +0000 (13:28 -0400)]
release-process: Use markdown table format.
Update the release process document to use markdown formatting for the
table used to describe the 6 month release schedule. This will make it
be formatted correctly when converted to HTML on github and
openvswitch.org.
Signed-off-by: Russell Bryant <russell@ovn.org> Acked-by: Joe Stringer <joe@ovn.org>
Pravin B Shelar [Fri, 12 Aug 2016 02:27:12 +0000 (19:27 -0700)]
datapath: compat: keep skb mark across tunnel devices.
Older kernels' skb_scrub_packet() has a bug that resets the skb mark for
all packets. This was fixed in the 3.18 release, where the mark is reset
only for packets crossing namespaces. So OVS is forced to use a
compat skb_scrub_packet() on older kernels.
This is related to upstream bug fix commit ca7c7b9059e3
("skbuff: Do not scrub skb mark within the same name space").
VMware-BZ: #1710701 Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Joe Stringer <joe@ovn.org>
Joe Stringer [Fri, 12 Aug 2016 00:54:08 +0000 (17:54 -0700)]
ofproto-dpif-xlate: Fix VLOG_ERR_RL() call.
a716ef9a7a73 ("ofproto-dpif-xlate: Log flow in XLATE_REPORT_ERROR.")
inadvertently broke the build on clang due to improper passing of the ds
cstring into the VLOG() function:
error: format string is not a string literal
(potentially insecure) [-Werror,-Wformat-security]
XLATE_REPORT_ERROR(ctx, "over max translation depth %d", MAX_DEPTH);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
note: expanded from macro
'XLATE_REPORT_ERROR'
VLOG_ERR_RL(&error_report_rl, ds_cstr(&ds)); \
Reported-by: Daniele Di Proietto <diproiettod@vmware.com> Signed-off-by: Joe Stringer <joe@ovn.org> Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
Joe Stringer [Thu, 11 Aug 2016 19:36:16 +0000 (12:36 -0700)]
ofproto-dpif-xlate: Log flow in XLATE_REPORT_ERROR.
To assist debugging pipelines when resubmit resource checks fail, print
the base_flow from the translation context. This base flow can then be
used from ofproto/trace to figure out which parts of the pipeline lead
to this translation error.
Signed-off-by: Joe Stringer <joe@ovn.org> Acked-by: Ben Pfaff <blp@ovn.org>
Ben Pfaff [Thu, 11 Aug 2016 04:14:09 +0000 (21:14 -0700)]
ovs-bugtool: Switch from MD5 to SHA-256.
While going through a FIPS certification process we discovered that
ovs-bugtool uses MD5 to identify the contents of files. FIPS doesn't allow
use of the obsolete and broken MD5 algorithm, so this commit switches to
SHA-256.
In a way, this is a silly requirement. ovs-bugtool only uses MD5 to
identify file content, mostly to ensure that the contents of the bug report
have not been corrupted. MD5 is perfectly adequate for that purpose; in
fact a 16-bit CRC would probably be adequate. On the other hand, there is
basically no cost and no disadvantage to switching to SHA-256, so why not
do it? That's why I think that this is a reasonable change.
VMware-BZ: #1708786 Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Ryan Moats <rmoats@us.ibm.com>
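For reference, identifying file content with SHA-256 takes only a few lines with Python's standard `hashlib` module. The function `sha256_of_stream` below is a hypothetical helper, not the code in ovs-bugtool; it reads in chunks so large files do not need to fit in memory.

```python
import hashlib
import io


def sha256_of_stream(stream, chunk_size=65536):
    """Return the hex SHA-256 digest of everything readable from a binary stream."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


empty_digest = sha256_of_stream(io.BytesIO(b""))
abc_digest = sha256_of_stream(io.BytesIO(b"abc"))
```

The same function works on a file opened in binary mode, since it only relies on `read()`.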
Ilya Maximets [Wed, 10 Aug 2016 09:43:03 +0000 (12:43 +0300)]
netdev-dpdk: vhost: Fix double free and use after free with QoS.
While using QoS with vHost interfaces 'netdev_dpdk_qos_run__()' will
free mbufs while executing 'netdev_dpdk_policer_run()'. After
that, the same mbufs will be freed at the end of '__netdev_dpdk_vhost_send()'
if 'may_steal == true'. This behaviour will break the mempool.
Also 'netdev_dpdk_qos_run__()' will free packets even if we shouldn't
do this ('may_steal == false'). This will lead to the use of already freed
packets by the upper layers.
Fix that by copying all packets that we can't steal, as is done
for DPDK_DEV_ETH devices, and freeing only packets not freed by QoS.
Fixes: 0bf765f753fd ("netdev_dpdk.c: Add QoS functionality.") Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com> Tested-by: Ian Stokes <ian.stokes@intel.com> Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
Darrell Ball [Tue, 9 Aug 2016 02:20:38 +0000 (19:20 -0700)]
ovn: Fix receive from vxlan in ovn-controller.
The changes enable source node replication in OVN for receive from vxlan
tunnels. OVN only supports source node replication mode. This is needed
for ovn-controller to interoperate with hardware switches.
Previously hardware vtep interaction, which uses service node
replication by default for multicast/broadcast/unknown unicast traffic
partially "worked" by happenstance. Because of limited vxlan
encapsulation metadata, received packets were resubmitted to find
the egress port(s). This is not correct for multicast, broadcast and
unknown unicast traffic as traffic will get resent on the tunnel mesh.
ovn-controller is changed not to send traffic received from vxlan
tunnels out the tunnel mesh again. Traffic received from vxlan tunnels is
now only sent locally as intended with obvious benefits. This behavior is
newly documented in ovn-architecture.7.xml.
To support keeping state for receipt from a vxlan tunnel, a MFF logical
flags register flag is allocated.
As part of this change ovn-controller-vtep is hard-coded to set the
replication mode of each logical switch to source node as OVN will only
support source node replication.
I failed to see that lib/dpif-netdev.c actually needs the concurrency
provided by pvector prior to this change. More specifically, when a
subtable is removed, concurrent lookups may skip over another subtable
swapped in to the place of the removed subtable in the vector.
Since this was the only use of the non-concurrent pvector, it is
cleaner to revert the whole patch.
Reported-by: Jan Scheurich <jan.scheurich@ericsson.com> Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
Ryan Moats [Thu, 28 Jul 2016 22:17:41 +0000 (22:17 +0000)]
ovn-controller: Persist desired conntrack groups.
With incremental processing of logical flows desired conntrack groups
are not being persisted. This patch adds this capability, with the
side effect of adding a ds_clone method that this capability leverages.
Signed-off-by: Ryan Moats <rmoats@us.ibm.com> Reported-by: Guru Shetty <guru@ovn.org>
Reported-at: http://openvswitch.org/pipermail/dev/2016-July/076320.html Fixes: 70c7cfe ("ovn-controller: Add incremental processing to lflow_run and physical_run") Acked-by: Flavio Fernandes <flavio@flaviof.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
This commit builds upon some of the recent ovs-ctl changes to build a
more integrated systemd setup. A new service (ovs-vswitchd) is
added to track the ovs-vswitchd, and ovsdb-server service is reserved
for the ovsdb-server daemon. The systemd scripts still use ovs-ctl to
actually initialize the daemons.
rhel/ovsdb-server.service: Rename the nonetwork service
Currently, openvswitch.service calls out to start
openvswitch-nonetwork.service. However, openvswitch-nonetwork.service
will be called ovsdb-server, so that it is a bit more reflective of
the dependencies. This commit does make the file a bit of a misnomer as
currently the ovsdb-server SERVICE will start the ovs-vswitchd service
as well. A future commit will clean this up, and change the ifup
configuration in the process.
Panu Matilainen [Wed, 10 Aug 2016 11:16:14 +0000 (14:16 +0300)]
ovs-ctl: support populating system info from /etc/os-release
On systemd-era hosts, OS name and version are available in sanitized
format from /etc/os-release(5) without resorting to calling (and thus
requiring) lsb_release. Support populating system-type and system-version
from /etc/os-release, prefer it over lsb_release, but permit overriding
via the OVS-specific system-type.conf and system-version.conf.
Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1350550 Signed-off-by: Panu Matilainen <pmatilai@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Ilya Maximets [Mon, 8 Aug 2016 11:19:45 +0000 (14:19 +0300)]
netdev-dpdk: Avoid reconfiguration on reconnection of same vhost device.
Binding/unbinding of virtio driver inside VM leads to reconfiguration
of PMD threads. This behaviour may be abused by executing bind/unbind
in an infinite loop to break normal networking on all ports attached
to the same instance of Open vSwitch.
Fix that by avoiding reconfiguration if it's not necessary.
The number of queues will not be decreased to 1 on device disconnection, but
that is not very important in comparison with a possible DoS attack from
inside the guest OS.
Fixes: 81acebdaaf27 ("netdev-dpdk: Obtain number of queues for vhost ports from attached virtio.") Reported-by: Ciara Loftus <ciara.loftus@intel.com> Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
When egress policer is set as a QoS type for a port, an error may occur during
setup if incorrect parameters are used for the rte_meter. If this occurs
the egress policer construct and set functions should free any allocated
memory relevant to the policer and set the QoS configuration pointer to
null. The netdev_dpdk_set_qos function should check the error value returned
for any QoS construct/set calls with an assertion to avoid segfault.
Also this commit modifies egress_policer_qos_set() to correctly lock the QoS
spinlock while the egress policer configuration is updated to avoid
segfault.
Signed-off-by: Ian Stokes <ian.stokes@intel.com> Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
INSTALL.DPDK: Update documentation for DPDK 16.07 support
Replace 'dpdk_nic_bind.py' references with 'dpdk-devbind.py'. The script
name is changed in DPDK 16.07 as the script can be used also on crypto
devices along with NICs.
Update the command for setting packet forwarding mode in 'testpmd' app
from 'set fwd mac_retry' to 'set fwd mac retry'.
vxlan driver has bypass for local vxlan traffic, but that
depends on information about all VNIs on local system in
vxlan driver. This is not available in case of LWT.
Therefore the following patch disables encap bypass for LWT
vxlan traffic.
Fixes: ee122c79d42 ("vxlan: Flow based tunneling"). Reported-by: Jakub Libosvar <jlibosva@redhat.com> Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Jesse Gross <jesse@kernel.org>
net: vxlan: lwt: Use source ip address during route lookup.
An LWT user can specify a destination as well as a source IP address
for a given tunnel endpoint, but vxlan ignores the given source
IP address. The following patch uses both IP addresses to route the
tunnel packet. This is consistent with other LWT implementations,
like GENEVE and GRE.
Fixes: ee122c79d42 ("vxlan: Flow based tunneling"). Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Jesse Gross <jesse@kernel.org>
netdev_dpdk_vhost_destruct() calls rte_vhost_driver_unregister(), which
can trigger the destroy_device() callback. destroy_device() will try to
take two mutexes already held by netdev_dpdk_vhost_destruct(), causing a
deadlock.
This problem can be solved by dropping the mutexes before calling
rte_vhost_driver_unregister(). The netdev_dpdk_vhost_destruct() and
construct() call are already serialized by netdev_mutex.
This commit also makes clear that dev->vhost_id is constant and can be
accessed without taking any mutexes in the lifetime of the devices.
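The locking pattern described above (release the mutex before calling into code that may call back and take it again) can be illustrated with a small Python sketch. All names here are stand-ins chosen for illustration; unregister_driver() only mimics the role of rte_vhost_driver_unregister() and is not the DPDK API.

```python
import threading

dev_mutex = threading.Lock()  # stand-in for a per-device mutex (non-reentrant)
events = []


def destroy_device_callback():
    # Runs synchronously from unregister_driver() and needs the same mutex.
    with dev_mutex:
        events.append("callback")


def unregister_driver():
    # Hypothetical stand-in for rte_vhost_driver_unregister().
    destroy_device_callback()


def destruct():
    with dev_mutex:
        events.append("cleanup under lock")
    # The mutex is released here, so the callback can take it safely.
    unregister_driver()
    events.append("unregistered")


destruct()
```

Had unregister_driver() been called inside the "with dev_mutex" block, the callback would block forever on the non-reentrant lock, which is the deadlock the commit avoids.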
ofproto: Consider datapath_type when looking for internal ports.
Interfaces with type "internal" end up having a netdev with type "tap"
in the dpif-netdev datapath, so a strcmp will fail to match internal
interfaces.
We can translate the types with ofproto_port_open_type() before calling
strcmp to fix this.
This fixes a minor issue where internal interfaces are considered
non-internal in the userspace datapath for the purpose of adjusting the
MTU.
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Acked-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com>
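A rough sketch of the idea, in Python rather than C: translate the generic "internal" type into the datapath-specific netdev type first, and only then compare. The function port_open_type() below is a hypothetical analogue of ofproto_port_open_type(), and the single mapping it contains is a simplification.

```python
def port_open_type(datapath_type, port_type):
    """Hypothetical analogue of ofproto_port_open_type()."""
    # In the userspace ("netdev") datapath, internal ports are backed by tap.
    if datapath_type == "netdev" and port_type == "internal":
        return "tap"
    return port_type


def is_internal(datapath_type, netdev_type):
    # Translate "internal" for this datapath first, then compare.
    return netdev_type == port_open_type(datapath_type, "internal")
```

With this, is_internal("netdev", "tap") and is_internal("system", "internal") are both true, whereas a plain comparison against "internal" would miss the first case.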
Kyle Mestery [Mon, 8 Aug 2016 13:48:40 +0000 (06:48 -0700)]
ovs-vsctl: Change log level of vsctl_parent_process_info
While running the ovn-scale-test [1] port-binding tests [2], I noticed a
continual stream of messages such as this:
2016-08-04 13:05:28.705 547 INFO rally_ovs.plugins.ovs.scenarios.ovn [-] bind lport_0996bf_cikzNO to sandbox-172.16.200.24 on ovn-farm-node-uat-dal09-compute-325
2016-08-04 13:05:28.712 547 INFO paramiko.transport [-] Connected (version 2.0, client OpenSSH_6.6.1p1)
2016-08-04 13:05:28.805 547 INFO paramiko.transport [-] Authentication (publickey) successful!
2016-08-04T13:05:28Z|00002|vsctl|WARN|/proc/0/cmdline: open failed (No such file or directory)
2016-08-04T13:05:29Z|00002|vsctl|WARN|/proc/0/cmdline: open failed (No such file or directory)
2016-08-04 13:05:29.042 547 INFO rally_ovs.plugins.ovs.scenarios.ovn [-] bind lport_0996bf_tvovcK to sandbox-172.16.200.24 on ovn-farm-node-uat-dal09-compute-325
2016-08-04T13:05:29Z|00002|vsctl|WARN|/proc/0/cmdline: open failed (No such file or directory)
2016-08-04T13:05:29Z|00002|vsctl|WARN|/proc/0/cmdline: open failed (No such file or directory)
2016-08-04 13:05:29.285 547 INFO rally_ovs.plugins.ovs.scenarios.ovn [-] bind lport_0996bf_HwG7AK to sandbox-172.16.200.24 on ovn-farm-node-uat-dal09-compute-325
2016-08-04T13:05:29Z|00002|vsctl|WARN|/proc/0/cmdline: open failed (No such file or directory)
2016-08-04T13:05:29Z|00002|vsctl|WARN|/proc/0/cmdline: open failed (No such file or directory)
2016-08-04 13:05:29.505 547 INFO rally_ovs.plugins.ovs.scenarios.ovn [-] bind lport_0996bf_Lqbv92 to sandbox-172.16.200.24 on ovn-farm-node-uat-dal09-compute-325
2016-08-04T13:05:29Z|00002|vsctl|WARN|/proc/0/cmdline: open failed (No such file or directory)
2016-08-04T13:05:29Z|00002|vsctl|WARN|/proc/0/cmdline: open failed (No such file or directory)
2016-08-04 13:05:29.724 547 INFO rally_ovs.plugins.ovs.scenarios.ovn [-] bind lport_0996bf_6f8uQW to sandbox-172.16.200.24 on ovn-farm-node-uat-dal09-compute-325
2016-08-04T13:05:29Z|00002|vsctl|WARN|/proc/0/cmdline: open failed (No such file or directory)
2016-08-04T13:05:29Z|00002|vsctl|WARN|/proc/0/cmdline: open failed (No such file or directory)
2016-08-04 13:05:29.944 547 INFO rally_ovs.plugins.ovs.scenarios.ovn [-] bind lport_0996bf_nKl2XF to sandbox-172.16.200.24 on ovn-farm-node-uat-dal09-compute-325
Tracing these down, this is due to the check in vsctl_parent_process_info(),
which is verifying if the parent process can be opened. Since ovn-scale-test
runs sandboxes in containers, and these are run as root, there is no /proc/0
in the container. Thus, the check fails, and the error message is printed out.
It's unclear what value this log message provides, so removing it clears up
this problem and is probably the best option.
For the init process with pid of zero, this patch returns "init",
instead of trying to read from /proc/0/cmdline, which does not exist.
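The special case for pid zero can be sketched as below. This is an illustrative Python version, not the C code in ovs-vsctl; parent_process_name() and the injected read_file parameter are hypothetical, chosen so the sketch runs without a real /proc.

```python
def parent_process_name(ppid, read_file):
    """Return a printable name for the parent process, or None if unknown.

    read_file is injected so the sketch can run without a real /proc.
    """
    if ppid == 0:
        # No /proc/0 exists; report "init" instead of attempting the read.
        return "init"
    try:
        raw = read_file("/proc/%d/cmdline" % ppid)
    except OSError:
        return None
    # /proc/<pid>/cmdline separates arguments with NUL bytes.
    return raw.replace("\0", " ").strip()


def missing(path):
    # Simulates a /proc entry that does not exist.
    raise FileNotFoundError(path)


def fake_cmdline(path):
    # Simulates the contents of a cmdline file.
    return "ovs-vsctl\0show\0"
```

Calling parent_process_name(0, missing) returns "init" without touching the reader at all, so no warning about /proc/0/cmdline would ever be produced.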
Ben Pfaff [Sat, 6 Aug 2016 06:47:59 +0000 (23:47 -0700)]
lflow: Correct register definitions to use subfields for overlaps.
OVN expressions need to know what fields overlap or alias one another.
This is supposed to be done via subfields: if two fields overlap, then the
smaller one should be defined as a subfield of the larger one. For
example, reg0 should be defined as xxreg0[96..127]. The symbol table in
lflow didn't do this, so it's possible for confusion to result. (I don't
have evidence of this actually happening, because it would only occur
in a case where the same bits of a field were referred to with different
names.)
This commit fixes the problem. It deserves a test, but that's somewhat
difficult at this point, so it will actually happen in a future commit.
Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Ryan Moats <rmoats@us.ibm.com>
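To make the overlap rule concrete, here is a small Python sketch (not the lflow symbol table itself). It resolves every symbol to a bit range within its root field, so that two names referring to the same bits can be detected. The Symbol class and helper names are hypothetical, and the reg1 layout shown is an assumption that extends the reg0 example above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Symbol:
    name: str
    width: int
    parent: Optional["Symbol"] = None
    offset: int = 0  # lowest bit of this symbol within its parent


def resolve(sym):
    """Return (root field name, low bit, high bit) for a symbol."""
    lo = 0
    width = sym.width
    while sym.parent is not None:
        lo += sym.offset
        sym = sym.parent
    return sym.name, lo, lo + width - 1


def overlaps(a, b):
    """True if both symbols share a root field and at least one bit."""
    root_a, a_lo, a_hi = resolve(a)
    root_b, b_lo, b_hi = resolve(b)
    return root_a == root_b and a_lo <= b_hi and b_lo <= a_hi


xxreg0 = Symbol("xxreg0", 128)
reg0 = Symbol("reg0", 32, xxreg0, 96)  # xxreg0[96..127], as in the text
reg1 = Symbol("reg1", 32, xxreg0, 64)  # assumed layout: xxreg0[64..95]
```

If reg0 were defined as an independent field instead, resolve() would report a different root and the overlap with xxreg0 would go unnoticed.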
Ben Pfaff [Fri, 15 Jul 2016 22:31:42 +0000 (15:31 -0700)]
expr: Give a subfield a direct pointer to its parent in struct expr_symbol.
Until now, symbols that represent subfields and predicates were both
implemented as the same string member, named 'expansion', inside struct
expr. This makes it a little inconvenient to find the parent of a subfield
for two reasons. First, one must actually parse the string, e.g. to
convert "vlan.tci[13..15]" into a pointer to a struct. Second, and more
importantly, to parse the string it's necessary to have access to the
symbol table, which isn't always convenient to pass around. This commit
avoids the problem by breaking apart subfields and predicates and giving
the former a direct pointer to the parent symbol.
We could do the same thing for predicates by storing a pointer to a
pre-built struct expr, but so far it's not necessary.
Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Ryan Moats <rmoats@us.ibm.com>
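The difference between the two representations can be sketched in Python as follows. Both halves are illustrative only: parse_subfield() mimics the old approach of parsing an expansion string and needing the symbol table, while the parent attribute mimics the new direct pointer. The names and the "vlan.pcp" symbol are assumptions for the example.

```python
import re


def parse_subfield(expansion, symtab):
    """Old style: parse 'name[lo..hi]' and look the parent up by name."""
    m = re.fullmatch(r"([\w.]+)\[(\d+)\.\.(\d+)\]", expansion)
    if m is None:
        raise ValueError("not a subfield: %r" % expansion)
    # The symbol table is required here, which is the inconvenient part.
    return symtab[m.group(1)], int(m.group(2)), int(m.group(3))


class Symbol:
    def __init__(self, name, parent=None, lo=0, hi=0):
        self.name = name
        self.parent = parent  # new style: direct reference, no lookup needed
        self.lo, self.hi = lo, hi


vlan_tci = Symbol("vlan.tci")
symtab = {"vlan.tci": vlan_tci}
vlan_pcp = Symbol("vlan.pcp", parent=vlan_tci, lo=13, hi=15)
```

With the direct reference, code that holds only vlan_pcp can reach vlan_tci through vlan_pcp.parent without access to symtab.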
Ben Pfaff [Fri, 15 Jul 2016 21:27:55 +0000 (14:27 -0700)]
expr: Track writability as part of expr_symbol.
Until now it was only possible to find out whether an expr_symbol was
read/write or read-only, for subfields, by chasing down whether the
eventual parent field was read/write or read-only. This commit adds
a new 'rw' member that indicates directly.
Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Ryan Moats <rmoats@us.ibm.com>
Ben Pfaff [Wed, 3 Aug 2016 05:46:18 +0000 (22:46 -0700)]
expr: Initialize 'relop' of allocated exprs in crush_and_string().
Every relop at this point is always EXPR_R_EQ, and therefore it seems that
no code actually examined it, so this doesn't appear to fix an existing
bug, but some code I was working on was affected by the uninitialized
member.
Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Ryan Moats <rmoats@us.ibm.com>
Ben Pfaff [Wed, 3 Aug 2016 04:53:59 +0000 (21:53 -0700)]
expr: Refine handling of error parameter to expr_annotate().
In most cases expr_annotate() set '*errorp' to NULL if it was successful,
but there was one case where it did not. This corrects that and refines
the comment to better explain the intended behavior.
This didn't affect any existing users because all of them passed in a
pointer that was already NULL.
Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Ryan Moats <rmoats@us.ibm.com>
Ben Pfaff [Sun, 31 Jul 2016 17:18:40 +0000 (10:18 -0700)]
expr: Fine-tune parser error message for common typo.
It's easy to type "=" in place of "==" in an expression but the expression
parser's error message was far from clear. For multibit numeric fields,
it said:
Explicit `!= 0' is required for inequality test of multibit field
against 0.
For string fields, the parser treated such an expression as "<name> != 0"
and thus it said:
String field <name> is not compatible with numeric constant.
This improves the error message in each case to:
Syntax error at `=' expecting relational operator.
which I hope to be clear.
Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Ryan Moats <rmoats@us.ibm.com>
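As a minimal illustration of the new behavior, the following Python sketch rejects a lone "=" where a relational operator is expected. It is not the OVN expression parser; parse_relop() and check() are hypothetical helpers, and only the error text is taken from the message above.

```python
RELOPS = {"==", "!=", "<", "<=", ">", ">="}


def parse_relop(token):
    """Return the token if it is a relational operator, else raise."""
    if token in RELOPS:
        return token
    raise ValueError(
        "Syntax error at `%s' expecting relational operator." % token)


def check(token):
    """Return the operator, or the error text if the token is rejected."""
    try:
        return parse_relop(token)
    except ValueError as e:
        return str(e)
```

For example, check("=") yields the error text quoted above, while check("==") simply returns the operator.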
Ben Pfaff [Fri, 15 Jul 2016 21:13:02 +0000 (14:13 -0700)]
ofp-actions: Correct member name for write_actions.
For a variable-length action like write_actions, the member name is
supposed to be the name of the variable-length array at the end of the
action structure. It only makes a real difference if the beginning of the
array is not 64-bit aligned, so it did not matter in this case, but it's
better to get it right.
Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Ryan Moats <rmoats@us.ibm.com>
Ben Pfaff [Wed, 27 Jul 2016 06:55:25 +0000 (23:55 -0700)]
ovsdb-idl: Wake up ovsdb_idl_loop when a transaction commits.
There is a fair amount of code that defers modifying the database when a
transaction cannot be created (because there is already one outstanding).
This code tends to assume that the main loop will wake up again when it
becomes possible again to modify the database, but the actual ovsdb_idl_loop
implementation only did this if the database had changed. This is too
conservative a policy and may account for some failures I've seen in tests.
Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Ryan Moats <rmoats@us.ibm.com>
Ben Pfaff [Mon, 8 Aug 2016 03:44:51 +0000 (20:44 -0700)]
ovn-nbctl: Add "sync" command to wait for previous changes to take effect.
It's slow to add --wait to every ovn-nbctl command; only the last command
needs it. But it's sometimes inconvenient to add it to the last command
if it's in a loop, etc. This makes it possible to separately wait for
the OVN southbound or hypervisors to catch up to the northbound.
Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Ryan Moats <rmoats@us.ibm.com>
The '-d' flag tells autotest to always keep the testcase output, but
prevents '--recheck' from working. If a user wants to always keep the
output from the tests, the '-d' flag can be passed explicitly. This is
more in line with the other test make targets ('check',
'check-system-userspace').
CC: Andy Zhou <azhou@ovn.org> Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Acked-by: Andy Zhou <azhou@ovn.org>
system-traffic: Flush conntrack after debug ping6.
We want to discard any state created by the initial ping6 (used to wait
for an available IP address). Otherwise some weird state can show up in
the connection tracking tables (such as ICMP connection from link-local
addresses).
Fixes: e5cf8cce2759 ("system-tests: Add ping through conntrack test.") Reported-by: Joe Stringer <joe@ovn.org> Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Acked-by: Joe Stringer <joe@ovn.org>
system-userspace-macros: Check the exit code of ethtool.
If the ethtool command is not available on the system we should fail,
since the userspace testsuite cannot work properly without disabling
offloads.
Also, add ethtool to the list of installed packages on Vagrantfile, to
ensure that offloads don't cause test failures in the vagrant VM when
the kernel is updated.
Fixes: ddcf96d2dcc1 ("system-tests: Disable offloads in userspace tests.") Reported-by: Joe Stringer <joe@ovn.org> Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Acked-by: Joe Stringer <joe@ovn.org>
This patch adds some comments to the dpcls_lookup() function,
which is one of the most important places where the userspace
wildcard matching happens.
The purpose is to give some more explanations on its design
and also on how it works.
Joe Stringer [Fri, 5 Aug 2016 00:40:43 +0000 (17:40 -0700)]
system-traffic: Make ping6 vlan test more reliable.
Previously we checked on the underlying interfaces rather than the vlan
interfaces to verify whether IPv6 connectivity is available;
occasionally this would fail on some systems. Wait on the VLAN IP
instead.
Signed-off-by: Joe Stringer <joe@ovn.org> Acked-by: <diproiettod@vmware.com>
Maxime Coquelin [Tue, 2 Aug 2016 13:48:27 +0000 (15:48 +0200)]
bridge: No QoS configured is not an error
If no QoS is configured, type value is likely to be an empty
string.
This is not an error though, so use the regular command reply
function, not the error one.
For example, before this patch:
# ovs-appctl -t ovs-vswitchd qos/show vhost-user1
QoS not configured on vhost-user1
ovs-appctl: ovs-vswitchd: server returned an error
After the patch:
# ovs-appctl -t ovs-vswitchd qos/show vhost-user1
QoS not configured on vhost-user1
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> Acked-by: Ian Stokes <ian.stokes@intel.com> Acked-by: Daniele Di Proietto <diproiettod@vmware.com>