When generating a conditional monitoring update request, the current code
failed to update the idl's 'request-id'. This bug causes the reply to the
update request, whether an ACK or a NACK, to be logged as an unexpected
message at the debug level and ignored by the core idl logic.
In addition, the idl should not generate another conditional monitoring
update request while one is outstanding, so that requests and their
replies are properly serialized.
When a conditional monitoring request is nacked by the server, drop the
idl into a client-visible error state.
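The serialization rule above can be sketched as a small state machine. This is a minimal, hypothetical model: `struct idl_sketch`, its fields, and both functions are invented for illustration and are not the ovsdb-idl API (which, for one thing, keeps the request id as a JSON value rather than an integer).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified idl state; not the real struct ovsdb_idl. */
enum idl_sketch_state { IDL_MONITORING, IDL_ERROR };

struct idl_sketch {
    enum idl_sketch_state state;
    uint64_t next_id;       /* Source of request ids. */
    uint64_t request_id;    /* Id of the outstanding request, 0 if none. */
    bool cond_changed;      /* Conditions buffered but not yet sent. */
};

/* Sends a condition update only when no request is outstanding.
 * Returns the id of the request sent, or 0 if nothing was sent. */
static uint64_t
idl_send_cond_update(struct idl_sketch *idl)
{
    if (idl->state != IDL_MONITORING || idl->request_id
        || !idl->cond_changed) {
        return 0;
    }
    idl->request_id = ++idl->next_id;   /* The step the buggy code skipped. */
    idl->cond_changed = false;
    return idl->request_id;
}

/* Handles a reply.  Returns false if it matches no outstanding request,
 * i.e. the "unexpected message" case. */
static bool
idl_handle_reply(struct idl_sketch *idl, uint64_t id, bool is_ack)
{
    if (!idl->request_id || id != idl->request_id) {
        return false;
    }
    idl->request_id = 0;
    if (!is_ack) {
        idl->state = IDL_ERROR;         /* Client-visible error state. */
    }
    return true;
}
```

A second update requested while `request_id` is nonzero simply stays buffered in `cond_changed` until the reply to the first one arrives.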
Signed-off-by: Andy Zhou <azhou@ovn.org> Acked-by: Ben Pfaff <blp@ovn.org>
dpif-netdev: Uses the OVS_CORE_UNSPEC instead of magic numbers.
This patch uses OVS_CORE_UNSPEC instead of "-1" to mark an unpinned
queue. More importantly, "-1" cast to unsigned int is equal to
NON_PMD_CORE_ID; using OVS_CORE_UNSPEC keeps the two values distinct.
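A minimal sketch of why the distinction matters, assuming the definitions of that era: `OVS_CORE_UNSPEC` as `INT_MAX` and `NON_PMD_CORE_ID` as `UINT32_MAX` (verify against your tree). `struct rxq_sketch`, `rxq_is_pinned()` and `marker_collides_with_non_pmd()` are hypothetical helpers for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values; verify against the OVS headers you are working with. */
#define OVS_CORE_UNSPEC INT_MAX
#define NON_PMD_CORE_ID UINT32_MAX

/* Hypothetical rx queue with an optional pinned core. */
struct rxq_sketch {
    int core_id;    /* OVS_CORE_UNSPEC when the queue is not pinned. */
};

static bool
rxq_is_pinned(const struct rxq_sketch *q)
{
    return q->core_id != OVS_CORE_UNSPEC;
}

/* Returns true if an "unpinned" marker, once converted to an unsigned core
 * id, is indistinguishable from the non-PMD thread's core id. */
static bool
marker_collides_with_non_pmd(int marker)
{
    return (unsigned int) marker == NON_PMD_CORE_ID;
}
```

With a 32-bit `unsigned int`, `-1` collides with `NON_PMD_CORE_ID` while `INT_MAX` does not.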
Signed-off-by: nickcooper-zhangtonghao <nic@opencloud.tech> Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Jarno Rajahalme [Fri, 6 Jan 2017 01:30:27 +0000 (17:30 -0800)]
nx-match: Only store significant bytes to stack.
Always storing the maximum mf_value size wastes about 120 bytes for
each stack entry. This patch changes the stack from an mf_value array
to a string of value-length pairs.
The length is stored after the value so that the stack pop may first
read the length and then the appropriate number of bytes.
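The value-then-length layout can be sketched with a plain byte array. This is a hypothetical model (`struct vstack` and its functions are invented; the real stack is built on a growable buffer), assuming a one-byte length is enough for any field value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed-size byte stack used only for illustration. */
struct vstack {
    uint8_t data[256];
    size_t size;
};

/* Pushes only the 'n' significant bytes of 'value', followed by a one-byte
 * length.  Returns 0 on success, -1 if the stack is full. */
static int
vstack_push(struct vstack *s, const void *value, uint8_t n)
{
    if (s->size + n + 1 > sizeof s->data) {
        return -1;
    }
    memcpy(&s->data[s->size], value, n);
    s->size += n;
    s->data[s->size++] = n;             /* Length goes after the value... */
    return 0;
}

/* Pops the top entry into 'value' (which must hold at least 255 bytes).
 * Returns the entry's length, or -1 if the stack is empty. */
static int
vstack_pop(struct vstack *s, void *value)
{
    if (!s->size) {
        return -1;
    }
    uint8_t n = s->data[--s->size];     /* ...so pop can read it first. */
    assert(n <= s->size);
    s->size -= n;
    memcpy(value, &s->data[s->size], n);
    return n;
}
```

Pushing a 2-byte value then costs 3 bytes of stack rather than a full-size mf_value.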
Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Ben Pfaff <blp@ovn.org>
John Hurley [Sat, 7 Jan 2017 01:55:11 +0000 (17:55 -0800)]
datapath: Ensure correct L4 checksum with NAT helpers.
Setting the CHECKSUM_PARTIAL flag before sending to helper modules was
missing the checksum update call ('csum_*_magic()'), which caused
checksum failures with kernels older than 4.6. As a result, the L4
checksum can be incorrect when the packet egresses the system.
Rather than adding the missing (IP version dependent) calls, give the
packet a temporary skb_dst without the RTCF_LOCAL flag set, which ensures
the skb is properly changed to CHECKSUM_PARTIAL if required and the
modified packet will get the correct checksum when fully processed.
This has been tested with FTP NAT helpers on kernel version 3.13.
Signed-off-by: John Hurley <john.hurley@netronome.com> Acked-by: Jarno Rajahalme <jarno@ovn.org>
dpif: Return ENODEV from dpif_port_query_by_*() if there's no port.
bridge_delete_or_reconfigure() deletes every interface that's not dumped
by OFPROTO_PORT_FOR_EACH(). ofproto_dpif.c:port_dump_next(), used by
OFPROTO_PORT_FOR_EACH, checks if the ofport is in the datapath by
calling port_query_by_name(). If port_query_by_name() returns an error,
the dump is interrupted. If port_query_by_name() returns ENODEV, the
device doesn't exist and the dump can continue.
port_query_by_name() for the userspace datapath returns ENOENT instead
of ENODEV. This is expected by dpif_port_query_by_name(), but it's not
handled correctly by port_dump_next().
dpif-netdev handles reconfiguration errors for an interface by deleting
it from the datapath, so it's possible that a device is missing. When this
happens we must make sure that port_dump_next() continues to dump other
devices, otherwise they will be deleted and the two layers will have an
inconsistent view.
This commit fixes the problem by returning ENODEV from the userspace
datapath if the port doesn't exist, and by documenting this clearly in
the dpif interfaces.
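The dump semantics described above can be sketched as follows. `query_fn`, `demo_query()` and `dump_ports()` are hypothetical stand-ins for port_query_by_name() and the port_dump_next() loop; only the ENODEV-versus-other-error distinction is taken from the text.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical port_query_by_name() stand-in: 0 if the port exists, ENODEV
 * if the device is missing, another errno value on a real failure. */
typedef int query_fn(const char *name);

/* Demo datapath: "gone" was deleted after a reconfiguration error and
 * "broken" fails for an unrelated reason. */
static int
demo_query(const char *name)
{
    if (!strcmp(name, "gone")) {
        return ENODEV;
    }
    return strcmp(name, "broken") ? 0 : EIO;
}

/* Counts the ports in 'names' that exist in the datapath.  ENODEV skips
 * just that port; any other error interrupts the dump. */
static int
dump_ports(const char *names[], size_t n, query_fn *query, size_t *n_found)
{
    *n_found = 0;
    for (size_t i = 0; i < n; i++) {
        int error = query(names[i]);
        if (error == ENODEV) {
            continue;           /* Missing device: keep dumping the rest. */
        } else if (error) {
            return error;       /* Real failure: stop the dump. */
        }
        (*n_found)++;
    }
    return 0;
}
```

Had the query returned ENOENT for "gone", it would take the `return error` path and every port after it would go unreported, which is how the bridge layer ended up deleting them.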
The problem was found while developing new code.
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Acked-by: Ben Pfaff <blp@ovn.org>
If the connection is reset while there are buffered but unsent
conditions, these conditions will be sent as part of the new
"monitor_cond" message after the idl reconnects.
Without this patch, those conditions would also be sent again,
unnecessarily, with the following monitoring condition update message.
Signed-off-by: Andy Zhou <azhou@ovn.org> Acked-by: Ben Pfaff <blp@ovn.org>
Joe Stringer [Fri, 6 Jan 2017 02:09:35 +0000 (18:09 -0800)]
python: Fix nroff indentation for <dl> after <hN>.
When XML is used for writing manpages, in the case that there is a
header tag followed by <dl>, the nroff python utility indents the <dl>
tag (and children) an extra level which is unnecessary and makes the
formatting inconsistent between manpages written directly in nroff vs
manpages written in XML and converted to nroff. Fix the indentation by
removing the extraneous .RS / .RE tags added to generated nroff in this
case.
This fixes the formatting of the ovn/utilities/ovn-nbctl.8 man page.
Signed-off-by: Joe Stringer <joe@ovn.org> Acked-by: Ben Pfaff <blp@ovn.org>
Mickey Spiegel [Sun, 1 Jan 2017 01:05:21 +0000 (17:05 -0800)]
ofproto-dpif-xlate: After thawing, retrieve tunnel table from thawed xbridge
In xlate_actions in ofproto-dpif-xlate.c, after thawing from frozen state,
it currently retrieves the tunnel metadata table from the original xbridge.
It should retrieve the tunnel metadata table from the thawed xbridge.
In OVN, this manifested as missing geneve option fields when receiving a
packet from localnet to br-int, then freezing (e.g. for NAT on a gateway
router or for distributed NAT), then attempting to send out a tunnel.
Signed-off-by: Mickey Spiegel <mickeys.dev@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
zhaojingjing [Fri, 6 Jan 2017 07:55:40 +0000 (15:55 +0800)]
ovn-nbctl: Fix documentation for "ovn-nbctl acl-add".
"ovn-nbctl.8.xml" documents the range of PRIORITY for the "ovn-nbctl
acl-add" command as 1 to 65534, but the command itself reports that
"priority must in range 0...32767". The range of priority is thus
inconsistent between "ovn-nbctl.8.xml" and "ovn-nbctl.c".
Signed-off-by: zhaojingjing <zhao.jingjing1@zte.com.cn> Signed-off-by: Ben Pfaff <blp@ovn.org>
Ben Pfaff [Fri, 6 Jan 2017 01:03:05 +0000 (17:03 -0800)]
ofctrl: Fix version check in ofctrl_inject_packet().
Because "enum ofp_version" is unsigned in the System V ABI used by Linux,
a value of that type is never less than 0, so an rconn with an
unnegotiated version was never detected properly. This fixes the problem.
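The pitfall can be reproduced in a few lines. This sketch assumes GCC or Clang on Linux, which give an enum with no negative enumerators the underlying type `unsigned int`; `enum ofp_version_sketch` and `get_version()` are hypothetical stand-ins for the real enum and for a version query that returns -1 before negotiation.

```c
#include <assert.h>
#include <stdbool.h>

/* A few enumerators modeled on enum ofp_version; none is negative, so GCC
 * and Clang give this enum the underlying type unsigned int. */
enum ofp_version_sketch {
    OFP10_SKETCH = 0x01,
    OFP13_SKETCH = 0x04,
    OFP15_SKETCH = 0x06,
};

/* Hypothetical version query: -1 until a version has been negotiated. */
static int
get_version(bool negotiated)
{
    return negotiated ? OFP13_SKETCH : -1;
}

/* Buggy check: -1 converts to UINT_MAX when stored in the enum, so
 * "v < 0" is always false. */
static bool
unnegotiated_buggy(bool negotiated)
{
    enum ofp_version_sketch v = get_version(negotiated);
    return v < 0;
}

/* Fixed check: compare as int before converting to the enum. */
static bool
unnegotiated_fixed(bool negotiated)
{
    int v = get_version(negotiated);
    return v < 0;
}
```

Compilers typically warn that the comparison in `unnegotiated_buggy()` is always false, which is a useful hint for finding this class of bug.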
Ciara Loftus [Thu, 5 Jan 2017 10:42:10 +0000 (10:42 +0000)]
netdev-dpdk: Add support for virtual DPDK PMDs (vdevs)
Prior to this commit, the 'dpdk' port type could only be used for
physical DPDK devices. Now, virtual devices (or 'vdevs') are supported.
'vdev' devices are those which use virtual DPDK Poll Mode Drivers, e.g.
null and pcap. To add a DPDK vdev, a valid 'dpdk-devargs' must be set for
the given dpdk port. The expected format is 'eth_<driver_name><x>', where
'x' is a number between 0 and RTE_MAX_ETHPORTS - 1.
For example, to add a port that uses the 'null' DPDK PMD driver:
ovs-vsctl set Interface null0 options:dpdk-devargs=eth_null0
Not all DPDK vdevs have been verified to work at this point in time.
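To make the expected devargs format concrete, here is a hypothetical validator for 'eth_<driver_name><x>'. It is not part of OVS or DPDK (DPDK does its own parsing); it assumes driver names consist of lowercase letters only, and its `max_ports` parameter stands in for RTE_MAX_ETHPORTS.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Returns true if 'devargs' has the form eth_<driver_name><x>, where
 * <driver_name> is one or more lowercase letters (an assumption of this
 * sketch) and 0 <= x < max_ports. */
static bool
is_vdev_devargs(const char *devargs, long max_ports)
{
    if (strncmp(devargs, "eth_", 4)) {
        return false;
    }

    const char *name = devargs + 4;
    const char *p = name;
    while (islower((unsigned char) *p)) {
        p++;
    }
    if (p == name || !isdigit((unsigned char) *p)) {
        return false;           /* Missing driver name or port number. */
    }

    char *end;
    long x = strtol(p, &end, 10);
    return *end == '\0' && x < max_ports;
}
```

Under these assumptions, "eth_null0" from the example above is accepted, while "eth_null" (no number) and "null0" (no prefix) are rejected.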
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com> Acked-by: Stephen Finucane <stephen@that.guru> # docs only Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Ciara Loftus [Thu, 5 Jan 2017 10:42:09 +0000 (10:42 +0000)]
netdev-dpdk: Arbitrary 'dpdk' port naming
'dpdk' ports no longer have naming restrictions. Now, instead of
specifying the dpdk port ID as part of the name, the PCI address of the
device must be specified via the 'dpdk-devargs' option, e.g.:
ovs-vsctl add-port br0 my-port
ovs-vsctl set Interface my-port type=dpdk \
    options:dpdk-devargs=0000:06:00.3
The user no longer needs to hotplug-attach DPDK ports by issuing the
specific ovs-appctl netdev-dpdk/attach command. Hotplug is now invoked
automatically when a valid PCI address is set in dpdk-devargs. The
ovs-appctl netdev-dpdk/detach command has also changed: the user must now
specify the relevant PCI address as input instead of the port name.
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com> Signed-off-by: Kevin Traynor <ktraynor@redhat.com> Co-authored-by: Kevin Traynor <ktraynor@redhat.com> Acked-by: Stephen Finucane <stephen@that.guru> # docs only Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
In order to use DPDK ports in OVS, they have to be bound to a
DPDK-compatible driver before OVS is started.
This patch adds the possibility to hotplug (or hot-unplug) a device
after OVS has been started. The implementation adds two appctl commands:
netdev-dpdk/attach and netdev-dpdk/detach.
After the user attaches a new device, it has to be added to a bridge
using the add-port command. Similarly, before a device is detached, it
has to be removed using the del-port command.
Signed-off-by: Mauricio Vasquez B <mauricio.vasquezbernal@studenti.polito.it> Signed-off-by: Ciara Loftus <ciara.loftus@intel.com> Co-authored-by: Ciara Loftus <ciara.loftus@intel.com> Acked-by: Stephen Finucane <stephen@that.guru> # docs only Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Joe Stringer [Wed, 4 Jan 2017 21:58:00 +0000 (13:58 -0800)]
docs: Fix formatting of patch comments line.
Sphinx was formatting the `---` as an extended dash, not verbatim as
three hyphens (which is what is necessary for git to determine that it's
a comment, and ignore it when applying the patch).
Signed-off-by: Joe Stringer <joe@ovn.org> Acked-by: Ben Pfaff <blp@ovn.org> Acked-by: Stephen Finucane <stephen@that.guru>
Due to the upstream Linux feature "automatic helper assignment", until
recently it has not been necessary to specify the ALG parameter when
using the ct() action with FTP traffic. However, automatic helper
assignment was disabled in Linux 4.7 and later, in upstream commit
3bb398d925ec ("netfilter: nf_ct_helper: disable automatic helper
assignment"). Document the need for this.
Signed-off-by: Joe Stringer <joe@ovn.org> Acked-by: Jarno Rajahalme <jarno@ovn.org>
Ben Pfaff [Tue, 6 Dec 2016 22:11:15 +0000 (14:11 -0800)]
ofproto-dpif: Break trace functionality into a separate source file.
An upcoming commit will rewrite much of the ofproto/trace functionality.
As the mechanism behind this grows and evolves, it makes sense to move it
into its own file, especially since ofproto-dpif.c is too big anyway.
Ben Pfaff [Tue, 6 Dec 2016 22:08:42 +0000 (14:08 -0800)]
ofproto-dpif: Unhide structure contents.
Until now, ofproto-dpif.c has hidden the definitions of several structures,
such as struct ofproto_dpif and struct rule_dpif. This kind of information
hiding is often beneficial, because it forces code outside the file with
the definition to use the documented interfaces. However, in this case it
was starting to burden ofproto-dpif with an increasing number of trivial
helpers that were not improving or maintaining a useful abstraction and
that were making code harder to maintain and read.
Information hiding also made it hard to move blocks of code outside
ofproto-dpif.c itself, since any code moved out often needed new helpers if
it used anything that wasn't previously exposed. In the present instance,
upcoming patches will move code for tracing outside ofproto-dpif, and this
would require adding several helpers that would just obscure the function
of the code otherwise needlessly.
On balance, it seems that there is more harm than good in the information
hiding here, so this commit moves the definitions of several structures
from ofproto-dpif.c into ofproto-dpif.h. It also removes all of the
trivial helpers that had accumulated, instead changing their users to
directly access the members that they needed. It also reorganizes
ofproto-dpif.h, grouping structure definitions and function prototypes in a
sensible way.
Justin Pettit [Thu, 5 Jan 2017 01:50:39 +0000 (17:50 -0800)]
ovn.at: Rewrite a test using ovn-controller 'inject-pkt' command.
Provide an example of using ovn-controller 'inject-pkt' and ovn-test
'expr-to-packets' commands to generate and verify proper handling of
packets. Tests written in this way should be easier to understand than
raw packets written in hexadecimal.
Signed-off-by: Justin Pettit <jpettit@ovn.org> Acked-by: Ben Pfaff <blp@ovn.org>
Add the ability to inject a packet into the connected Open vSwitch
instance. This is primarily useful for testing when a test requires
side-effects from an actual packet, so ovn-trace won't do.
Signed-off-by: Justin Pettit <jpettit@ovn.org> Acked-by: Ben Pfaff <blp@ovn.org>
Lukasz Rzasik [Thu, 29 Dec 2016 22:55:46 +0000 (15:55 -0700)]
ovsdb-data: Add support for integer ranges in database commands
Adding or removing a range of integers in a column that accepts a set of
integers requires enumerating all of the integers. This patch simplifies
it by introducing a 'range' concept to the database commands. Two
integers separated by a hyphen represent an inclusive range.
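The range syntax can be illustrated with a small parser. This is a hypothetical helper, not the ovsdb-data code: `parse_int_range()` is invented for the sketch, which also restricts itself to nonnegative integers so that the hyphen is never mistaken for a minus sign.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Parses "N" or "N-M" (inclusive, N <= M) into [*lo, *hi].  Only
 * nonnegative integers are accepted in this sketch.  Returns 0 on success,
 * EINVAL otherwise. */
static int
parse_int_range(const char *s, long *lo, long *hi)
{
    char *end;

    errno = 0;
    *lo = strtol(s, &end, 10);
    if (end == s || errno || *lo < 0) {
        return EINVAL;
    }
    if (*end == '\0') {
        *hi = *lo;                      /* A single integer. */
        return 0;
    }
    if (*end != '-') {
        return EINVAL;
    }

    const char *p = end + 1;
    *hi = strtol(p, &end, 10);
    if (end == p || errno || *end != '\0' || *hi < *lo) {
        return EINVAL;
    }
    return 0;
}
```

A caller would then add or remove every integer from `lo` through `hi`, so "1-5" stands for 1, 2, 3, 4 and 5.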
The patch adds positive and negative tests for the new syntax.
The patch was tested by 'make check'. Coverage was tested by
'make check-lcov'.
Signed-off-by: Lukasz Rzasik <lukasz.rzasik@gmail.com> Suggested-by: <my_ovs_discuss@yahoo.com> Suggested-by: Ben Pfaff <blp@ovn.org> Signed-off-by: Ben Pfaff <blp@ovn.org>
Add support for SSL connections to OVN northbound and/or
southbound databases.
To improve security, the NB and SB ovsdb daemons no longer
have open ptcp connections by default. This is a change in
behavior from previous versions. Users wishing to use TCP
connections to the NB/SB daemons can either request that
a passive TCP connection be used via ovn-ctl command-line
options (e.g. via OVN_CTL_OPTS/OVN_NORTHD_OPTS in startup
scripts):
Users desiring SSL database connections will need to generate certificates
and private key as described in INSTALL.SSL.rst and perform the following
one-time configuration steps:
On the ovn-controller and ovn-controller-vtep side, SSL configuration
must be provided on the command line when the daemons are started. This
should be done via the following command-line options (e.g. via
OVN_CTL_OPTS/OVN_CONTROLLER_OPTS in startup scripts):
ofproto: Fix crash on flow monitor request with tun_metadata.
nx_put_match() needs a non-NULL tunnel metadata table, otherwise it will
crash if a flow matches on tunnel metadata.
This wasn't handled in ofputil_append_flow_update(), causing a crash
when the controller sent a flow monitor request.
To fix the problem, this commit changes ofputil_append_flow_update() to
behave like ofputil_append_flow_stats_reply().
Since ofputil_append_flow_update() now needs to temporarily modify the
match, this commit also embeds 'struct match' into 'struct
ofputil_flow_update', to be safer. This is more similar to
'struct ofputil_flow_stats'.
A regression test is added and a comment is updated in ovs-ofctl.c.
#0 0x000055699bd82fa0 in memcpy_from_metadata (dst=0x7ffc770930d0, src=0x7ffc77093698, loc=0x18) at ../lib/tun-metadata.c:451
#1 0x000055699bd83c2e in metadata_loc_from_match_read (map=0x0, match=0x7ffc77093410, idx=0, mask=0x7ffc77093658, is_masked=0x7ffc77093287) at ../lib/tun-metadata.c:848
#2 0x000055699bd83d9b in tun_metadata_to_nx_match (b=0x55699d3f0300, oxm=0, match=0x7ffc77093410) at ../lib/tun-metadata.c:871
#3 0x000055699bce523d in nx_put_raw (b=0x55699d3f0300, oxm=0, match=0x7ffc77093410, cookie=0, cookie_mask=0) at ../lib/nx-match.c:1052
#4 0x000055699bce5580 in nx_put_match (b=0x55699d3f0300, match=0x7ffc77093410, cookie=0, cookie_mask=0) at ../lib/nx-match.c:1116
#5 0x000055699bd3926f in ofputil_append_flow_update (update=0x7ffc770940b0, replies=0x7ffc77094e00) at ../lib/ofp-util.c:6805
#6 0x000055699bc4b5a9 in ofproto_compose_flow_refresh_update (rule=0x55699d405b40, flags=(NXFMF_INITIAL | NXFMF_ACTIONS), msgs=0x7ffc77094e00) at ../ofproto/ofproto.c:5915
#7 0x000055699bc4b5f6 in ofmonitor_compose_refresh_updates (rules=0x7ffc77094e10, msgs=0x7ffc77094e00) at ../ofproto/ofproto.c:5929
#8 0x000055699bc4bafc in handle_flow_monitor_request (ofconn=0x55699d404090, oh=0x55699d404220) at ../ofproto/ofproto.c:6082
#9 0x000055699bc4f46d in handle_openflow__ (ofconn=0x55699d404090, msg=0x55699d404910) at ../ofproto/ofproto.c:7912
#10 0x000055699bc4f5df in handle_openflow (ofconn=0x55699d404090, ofp_msg=0x55699d404910) at ../ofproto/ofproto.c:8002
#11 0x000055699bc88154 in ofconn_run (ofconn=0x55699d404090, handle_openflow=0x55699bc4f5bc <handle_openflow>) at ../ofproto/connmgr.c:1427
#12 0x000055699bc85934 in connmgr_run (mgr=0x55699d3adb90, handle_openflow=0x55699bc4f5bc <handle_openflow>) at ../ofproto/connmgr.c:363
#13 0x000055699bc422c9 in ofproto_run (p=0x55699d3c85e0) at ../ofproto/ofproto.c:1798
#14 0x000055699bc31ec6 in bridge_run__ () at ../vswitchd/bridge.c:2881
#15 0x000055699bc320a6 in bridge_run () at ../vswitchd/bridge.c:2938
#16 0x000055699bc3784e in main (argc=10, argv=0x7ffc770952c8) at ../vswitchd/ovs-vswitchd.c:111
Fixes: 8d8ab6c2d574 ("tun-metadata: Manage tunnel TLV mapping table on a per-bridge basis.")
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Acked-by: Ben Pfaff <blp@ovn.org>
Kevin Traynor [Tue, 3 Jan 2017 18:21:28 +0000 (18:21 +0000)]
doc: Remove ivshmem instructions.
ivshmem is a path to the guest using DPDK rings that was
introduced before userspace vhost was available in the OVS-DPDK
datapath. ivshmem is external to OVS but the scheme of using it
with DPDK rings is documented.
Remove ivshmem instruction documentation because:
- The ivshmem library was removed from DPDK in DPDK 16.11.
- The instructions/scheme provided will not work with currently
supported and future DPDK versions.
- The linked patch needed to enable support in QEMU has never
been upstreamed and does not apply to the last 4 QEMU releases.
- Userspace vhost has become the de facto OVS-DPDK path to the guest.
Fixes: 04de404e1bfa ("netdev-dpdk: Add support for DPDK 16.11") Cc: Ciara Loftus <ciara.loftus@intel.com> Cc: Stephen Finucane <stephen@that.guru> Signed-off-by: Kevin Traynor <ktraynor@redhat.com> Acked-by: Stephen Finucane <stephen@that.guru> Acked-by: Mauricio Vasquez B <mauricio.vasquez@polito.it> Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Jarno Rajahalme [Thu, 5 Jan 2017 00:10:56 +0000 (16:10 -0800)]
odp: Use struct in6_addr for IPv6 addresses.
Code is simplified when the ODP keys use the same type as the struct
flow for the IPv6 addresses. As the change is facilitated by
extract-odp-netlink-h, this change only affects the userspace. We
already do the same for the ethernet addresses.
Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Ben Pfaff <blp@ovn.org>
Jarno Rajahalme [Thu, 5 Jan 2017 00:10:56 +0000 (16:10 -0800)]
ofp-parse: Allow match field names in actions and brackets in matches.
Allow using match field names in addition to the canonical register
names in actions (including 'load', 'move', 'push', 'pop', 'output',
'multipath', 'bundle_load', and 'learn'). Also allow leaving out the
trailing '[]' to indicate the full field. These changes allow a simpler
syntax, similar to 'set_field', to be used elsewhere as well.
Correspondingly, allow the '[start..end]' syntax to be used in matches
in addition to the more explicit 'value/mask' notation. For example,
to match on the value 2 of the bits 14..15 of NXM_NX_REG0, the match
could include:
... reg0[14..15]=2 ...
instead of
... reg0=0x8000/0xc000 ...
Note that only contiguous masks can be specified with the bracket
notation.
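The equivalence between the two notations is plain bit arithmetic. The hypothetical helper below (not the ofp-parse code) converts a 32-bit subfield assignment into its value/mask pair; the resulting mask is always contiguous, which is why only contiguous masks can be written with brackets.

```c
#include <assert.h>
#include <stdint.h>

struct value_mask {
    uint32_t value;
    uint32_t mask;
};

/* Converts "field[start..end]=value" on a 32-bit field into the equivalent
 * "field=value/mask" pair. */
static struct value_mask
subfield_to_value_mask(unsigned int start, unsigned int end, uint32_t value)
{
    assert(start <= end && end < 32);

    unsigned int n_bits = end - start + 1;
    uint32_t width = n_bits == 32 ? UINT32_MAX : (UINT32_C(1) << n_bits) - 1;

    assert((value & ~width) == 0);      /* 'value' must fit in the range. */
    struct value_mask vm = { value << start, width << start };
    return vm;
}
```

For the example above, `subfield_to_value_mask(14, 15, 2)` yields value 0x8000 and mask 0xc000, matching reg0=0x8000/0xc000.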
Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Ben Pfaff <blp@ovn.org>
Enable operations (including "list") on DHCP_Options and DHCPv6_Options
tables via ovn-sbctl. These are currently the only OVN_Southbound
tables that ovn-sbctl does not support.
Example:
$ ovn-sbctl -f table list DHCPv6_Options
_uuid                                code name          type
------------------------------------ ---- ------------- ------
8646bb15-5e88-4432-a21a-4e22a2976482 23   dns_server    "ipv6"
564e98e9-ee23-447b-a7c5-c36ca05059fa 24   domain_search str
8c6cb059-5bb5-4ef8-960b-f002c769589e 2    server_id     mac
525e8fc6-7921-48eb-8bd3-fe5cb5dd0142 5    ia_addr       "ipv6"
Signed-off-by: Lance Richardson <lrichard@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
There was a bug when using hacking with flake8 3.x. This bug has since
been resolved [1], meaning we no longer need to call out the need to use
the older version of flake8.
[1] https://review.openstack.org/#/c/335965/
Signed-off-by: Stephen Finucane <stephen@that.guru> Signed-off-by: Ben Pfaff <blp@ovn.org>
Alin Serdean [Wed, 28 Dec 2016 22:27:17 +0000 (22:27 +0000)]
ovs-thread: Avoid pthread_rwlockattr_t on Windows.
A recent commit fixed ovs_rwlock_init() to pass the pthread_rwlockattr_t
that it initialized to pthread_rwlock_init(). According to POSIX
documentation this is correct, but on Windows the current implementation of
pthreads does not support a pre-initialized attribute. See this fork
of the implementation:
https://github.com/GerHobbelt/pthread-win32/blob/19fd5054b29af1b4e3b3278bfffbb6274c6c89f5/pthread_rwlock_init.c#L59-L63
(This is the same implementation as the official version found at
ftp://sourceware.org/pub/pthreads-win32/.)
A short debug output from `vswitch` to confirm the above:
This patch is critical because, without it, the majority (over 800) of
the unit tests fail.
Fixes: 1a15f390afd6 ("lib/ovs-thread: set prefer writer lock for ovs_rwlock_init()") Signed-off-by: Alin Gabriel Serdean <aserdean@cloudbasesolutions.com> Acked-by: Shashank Ram <rams@vmware.com>
[blp@ovn.org changed the details of the approach] Signed-off-by: Ben Pfaff <blp@ovn.org>
The recently published 'ovs' theme [1] copies the styling of the Open
vSwitch website. Start using this, with fallbacks for users who do not
have the package installed.
This extends support for building docs to users of Sphinx 1.2 as the
previous theme - bizstyle - was only available in 1.3+.
[1] https://pypi.python.org/pypi/ovs-sphinx-theme
Signed-off-by: Stephen Finucane <stephen@that.guru> Signed-off-by: Ben Pfaff <blp@ovn.org>
Sugesh Chandran [Mon, 2 Jan 2017 22:27:48 +0000 (14:27 -0800)]
netdev-dpdk: Enable Rx checksum offloading feature on DPDK physical ports.
Add Rx checksum offloading feature support on DPDK physical ports. By default,
Rx checksum offloading is enabled if the NIC supports it. However,
checksum offloading can be turned OFF either while adding a new DPDK
physical port to OVS or at runtime.
Rx checksum offloading can be turned off by setting the parameter to
'false'. For example, to disable Rx checksum offloading when adding a port,
OR (to disable it at run time after the port has been added to OVS)
'ovs-vsctl set Interface dpdk0 options:rx-checksum-offload=false'
Similarly, to turn ON Rx checksum offloading at run time,
'ovs-vsctl set Interface dpdk0 options:rx-checksum-offload=true'
Tx checksum offloading is not implemented, for the following
reasons:
1) Checksum offloading and vectorization are mutually exclusive in the DPDK
poll mode driver. Vector packet processing is turned OFF when checksum
offloading is enabled, which causes a significant performance drop on the
Tx side.
2) Normally, OVS generates the checksum for tunnel packets in software at the
'tunnel push' operation, where the tunnel headers are created. However,
enabling Tx checksum offloading involves:
*) Marking every packet for Tx checksum offloading at 'tunnel_push' and
recirculating.
*) At the time of xmit, validating the same flag and instructing the NIC to
do the checksum calculation. If the NIC doesn't support Tx checksum
offloading, the checksum calculation has to be done in software before
sending out the packets.
No significant performance improvement was observed with Tx checksum
offloading, due to the overhead of additional validations plus non-vector
packet processing. In some test scenarios, it even reduces performance.
Rx checksum offloading still offers an 8-9% improvement on VxLAN tunneling
decapsulation even though the SSE vector Rx function is disabled in the DPDK
poll mode driver.
Alin Balutoiu [Tue, 3 Jan 2017 20:10:53 +0000 (20:10 +0000)]
Python tests: Set CREATE_NO_WINDOW flag for Popen
On Windows if the flag CREATE_NO_WINDOW is not
specified when using subprocess.Popen, a new
window will appear with the new process.
The window is not necessary for the tests.
This patch addresses this issue by adding
the flag CREATE_NO_WINDOW for all subprocess.Popen
calls if the machine is running Windows.
Signed-off-by: Alin Balutoiu <abalutoiu@cloudbasesolutions.com> Acked-by: Alin Gabriel Serdean <aserdean@cloudbasesolutions> Tested-by: Alin Gabriel Serdean <aserdean@cloudbasesolutions> Signed-off-by: Gurucharan Shetty <guru@ovn.org>
Alin Balutoiu [Tue, 3 Jan 2017 20:10:52 +0000 (20:10 +0000)]
Python tests: Daemon ported to Windows
Instead of using os.fork (not supported on Windows),
subprocess.Popen is used and os.pipe was replaced
with Windows pipes.
To be able to identify the child process, an extra
parameter, '--pipe-handle', was added to the daemon process.
This parameter contains the parent Windows pipe handle
which is used by the child to notify the parent about
the startup.
The PID file is created directly on Windows, without
using a temporary file, because the symbolic link does
not inherit the file lock set on the temporary file.
Alin Balutoiu [Tue, 3 Jan 2017 20:10:51 +0000 (20:10 +0000)]
Python tests: Ported UNIX sockets to Windows
Unix sockets (AF_UNIX) are not supported on Windows.
On Windows, they are replaced by named pipes, which
are used to mimic the behaviour of Unix sockets.
Instead of using Unix sockets to communicate
between components, named pipes are used. This
makes the Python sockets compatible with the
named pipes used in Windows applications.
Joe Stringer [Thu, 22 Dec 2016 18:58:26 +0000 (10:58 -0800)]
test-l7.py: Tidy up and python3-ify.
Haul test-l7.py into the 202nd decade by supporting python3.
TFTPY still doesn't support python3, so work around this by handling
import syntax errors so that even if tftpy is installed in a python3
environment, test-l7.py will not throw an exception while attempting to
load it.
Signed-off-by: Joe Stringer <joe@ovn.org> Acked-by: Ben Pfaff <blp@ovn.org>
Alin Serdean [Thu, 29 Dec 2016 00:25:34 +0000 (00:25 +0000)]
datapath-windows: Conntrack disable type truncation warning
Compiling with the WDK 10 gave the following warning:
Warning C4311 'type cast': pointer truncation from 'POVS_CT_ENTRY' to 'UINT32'
ovsext (OVSExt\ovsext) Conntrack.c 1139
This patch disables the warning on the file Conntrack.c.
Guoshuai Li [Thu, 29 Dec 2016 15:47:58 +0000 (23:47 +0800)]
OVN-HA: Fix data loss after OVNDB promotion
When master node shuts down, both VIP and OVNDB Master are expected
to be moved over to the backup node.
However, the VIP must be started after the OVNDB has been promoted.
Otherwise, the database content can be wiped out, since the OVSDB
running in the backup state can reconnect to the VIP that just moved
over, thus removing the database content.
See also: http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html/Pacemaker_Explained/s-resource-ordering.html
Signed-off-by: Guoshuai Li <ligs@dtdream.com> Signed-off-by: Andy Zhou <azhou@ovn.org>
Ben Pfaff [Wed, 28 Dec 2016 17:31:42 +0000 (09:31 -0800)]
ovn-trace: New --ovs option to also print OpenFlow flows.
Sometimes seeing the OpenFlow flows that back a given logical flow can
provide additional insight. This commit adds a new --ovs option to
ovn-trace that makes it connect to Open vSwitch over OpenFlow and retrieve
and print the OpenFlow flows behind each logical flow encountered during
a trace.
Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Justin Pettit <jpettit@ovn.org>
conntrack: Do not create new connections from ICMP errors.
ICMP error packets (e.g. destination unreachable messages) are
considered 'related' to another connection and are treated as part of
that.
However:
* We shouldn't create new entries in the connection table if the
original connection is not found. This is consistent with what the
kernel does.
* We certainly shouldn't call valid_new() on the packet, because
valid_new() assumes the packet's L4 type (might be TCP, UDP or ICMP)
to be consistent with the conn_key nw_proto type.
Found by inspection.
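The rule can be sketched as a simplified decision function. Everything here is hypothetical (`enum ct_verdict`, `struct pkt_sketch`, `ct_process()`), and reporting an unmatched ICMP error as invalid is this sketch's own choice; the point shown is only that ICMP errors never create entries and never reach the new-connection check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified outcome of conntrack processing. */
enum ct_verdict { CT_NEW, CT_EXISTING, CT_RELATED, CT_INVALID };

struct pkt_sketch {
    bool is_icmp_error;   /* e.g. an ICMP destination unreachable */
    bool conn_found;      /* Lookup hit (on the embedded header for errors). */
    bool valid_new;       /* Whether a new-connection check would pass. */
};

static enum ct_verdict
ct_process(const struct pkt_sketch *p, bool commit, size_t *n_conns)
{
    if (p->is_icmp_error) {
        /* Related to an existing connection, or nothing at all: never
         * create an entry and never run the new-connection check. */
        return p->conn_found ? CT_RELATED : CT_INVALID;
    }
    if (p->conn_found) {
        return CT_EXISTING;
    }
    if (!p->valid_new) {
        return CT_INVALID;
    }
    if (commit) {
        (*n_conns)++;                   /* Only non-error packets get here. */
    }
    return CT_NEW;
}
```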
Fixes: a489b16854b5 ("conntrack: New userspace connection tracker.") Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Acked-by: Darrell Ball <dlu998@gmail.com>
Ben Pfaff [Fri, 23 Dec 2016 21:45:00 +0000 (13:45 -0800)]
debian: Also restrict ovn-docker package to Linux.
The Debian packages for OVS have only supported Linux so far, but the
ovn-docker package was mistakenly marked as Architecture: any instead
of linux-any, which caused build failures. This fixes the problem.
(Perhaps OVS packaging for Debian should also support BSD, but that
would be a bigger change.)
Reported-at: https://buildd.debian.org/status/fetch.php?pkg=openvswitch&arch=kfreebsd-amd64&ver=2.6.2%7Epre%2Bgit20161223-1&stamp=1482518318&file=log Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Justin Pettit <jpettit@ovn.org>
Ben Pfaff [Fri, 23 Dec 2016 17:23:43 +0000 (09:23 -0800)]
rconn: Avoid abort for ill-behaved remote.
If an rconn peer fails to send a hello message, the version number doesn't
get set. Later, if the peer delays long enough, the rconn attempts to send
an echo request but assert-fails instead because it doesn't know what
version to use. This fixes the problem.
The Debian packages are ready. This patch fixes
bug #831924 reported in the Debian bug tracking system.
With this patch, openvswitch-2.6.1 will be uploaded to
the Debian archive. Building the packages with
"dpkg-buildpackage --target binary-indep" resulted in an
error. debian/rules needs to be modified so that the
build-indep and binary-indep targets generate the
architecture-independent packages. If anything is not
handled properly, let me know.
Reported-at: https://people.debian.org/~lucas/logs/2016/07/20/openvswitch_2.5.1~pre+git20160626-2_unstable_archallonly.log Signed-off-by: nickcooper-zhangtonghao <nic@opencloud.tech> Signed-off-by: Ben Pfaff <blp@ovn.org>
Stephen Finucane [Fri, 23 Dec 2016 11:46:29 +0000 (11:46 +0000)]
doc: Correct type of highlighting
Some recent changes marked code as PowerShell when in fact it was DOS or
bash shell. This incorrect highlighting actually breaks the local build
(where warnings are treated as errors) as pygments is unable to lex all
the code as PowerShell. Fix these types.
Signed-off-by: Stephen Finucane <stephen@that.guru> Fixes: b8d24cc8a ("doc: Misc Windows doc formatting fixes") Signed-off-by: Ben Pfaff <blp@ovn.org>
Ben Pfaff [Wed, 21 Dec 2016 17:15:59 +0000 (09:15 -0800)]
lacp: Select a may-enable IF as the lead IF
A reboot of one switch in an MC-LAG bond makes all bond links
go down, causing a total connectivity loss for 3 seconds.
Packet capture shows that spurious LACP PDUs are sent to OVS with
a different MAC address (partner system id) during the final
stages of the MC-LAG switch reboot.
The current code selects a lead interface based on information
in the LACP PDU, regardless of its synchronization state. If a
non-synchronized interface is selected as the OVS lead interface
then all other interfaces are forced down as their stored partner
system id differs and the bond ends up with no working interface.
The bond recovers within three seconds after the last spurious
message.
To avoid the problem, this commit requires a lead interface
to be synchronized. In case no synchronized interface exists,
the selection of lead interface is done as in the current code.
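The selection rule can be sketched as a two-candidate scan. This is a hypothetical model: the real comparison uses the partner's system and port identifiers, which the sketch reduces to a single `priority` field (lower is preferred) on an invented `struct lacp_member_sketch`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified bond member as seen by LACP. */
struct lacp_member_sketch {
    bool synchronized;  /* Partner reports this link as in sync. */
    int priority;       /* Lower value is preferred. */
};

/* Returns the index of the lead member: the most preferred synchronized
 * member if one exists, otherwise the most preferred member overall (the
 * previous behavior).  Returns -1 if 'n' is 0. */
static int
select_lead(const struct lacp_member_sketch m[], size_t n)
{
    int best = -1;
    int best_sync = -1;

    for (size_t i = 0; i < n; i++) {
        if (best < 0 || m[i].priority < m[best].priority) {
            best = (int) i;
        }
        if (m[i].synchronized
            && (best_sync < 0 || m[i].priority < m[best_sync].priority)) {
            best_sync = (int) i;
        }
    }
    return best_sync >= 0 ? best_sync : best;
}
```

A spurious, non-synchronized PDU with an attractive priority can then no longer displace a synchronized lead.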
Signed-off-by: Torgny Lindberg <torgny.lindberg@ericsson.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Zoltán Balogh [Tue, 13 Dec 2016 17:27:37 +0000 (17:27 +0000)]
odp-execute: Optimize IP header modification in OVS datapath
I measured the packet processing cost of OVS DPDK datapath for different
OpenFlow actions. I configured OVS to use a single pmd thread and
measured the packet throughput in a phy-to-phy setup. I used 10G
interfaces bound to the DPDK driver and overloaded the vSwitch with
64-byte packets through one of the 10G interfaces.
The processing cost of the dec_ttl action seemed to be gratuitously high
compared with other actions.
I looked into the code and saw that dec_ttl is encoded as a masked
nested attribute in OVS_ACTION_ATTR_SET_MASKED(OVS_KEY_ATTR_IPV4).
That way, OVS datapath can modify several IP header fields (TTL, TOS,
source and destination IP addresses) by a single invocation of
packet_set_ipv4() in the odp_set_ipv4() function in the
lib/odp-execute.c file. The packet_set_ipv4() function takes the new
TOS, TTL and IP addresses as arguments, compares them with the actual
ones and updates the fields if needed. This means that even if only TTL
needs to be updated, each of the four IP header fields is passed to the
callee and is compared to the actual field for each packet.
The caller, odp_set_ipv4(), already knows from the 'mask' structure
which fields need to be updated. The idea is to avoid invoking
packet_set_ipv4() and instead use parts of its code directly, so that
'mask' decides which IP header fields get updated. In addition, packet
processing can be made faster by calculating the values of local
variables right before they are used.
       | TTL | TOS | IP src | IP dst |  Vanilla OVS  || + new patch
       |     |     |        |        | (nsec/packet) || (nsec/packet)
-------+-----+-----+--------+--------+---------------++---------------
output |     |     |        |        |     67.19     ||     67.19
       |  X  |     |        |        |     74.48     ||     68.78
       |     |  X  |        |        |     74.42     ||     70.07
       |     |     |   X    |        |     84.62     ||     78.03
       |     |     |        |   X    |     84.25     ||     77.94
       |     |     |   X    |   X    |     97.46     ||     91.86
       |  X  |     |   X    |   X    |    100.42     ||     96.00
       |  X  |  X  |   X    |   X    |    102.80     ||    100.73
The table shows the average processing cost of packets in nanoseconds
for the following actions:
output; output + dec_ttl; output + mod_nw_tos; output + mod_nw_src;
output + mod_nw_dst and some of their combinations.
I ran each test five times. The values are the mean of the readings
obtained.
I added OVS_LIKELY to the 'if' condition for the TTL field since, as far
as I know, this field is typically decremented whenever any field of
the IP header is modified.
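A minimal sketch of such a branch hint, in C. MY_LIKELY is a hypothetical macro modeled on GCC/Clang's __builtin_expect and is not OVS's own definition; on other compilers it degrades to the plain condition. The hint changes only how the compiler may lay out and predict the branch, never the result.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical branch-prediction hint; falls back to the bare condition
 * when __builtin_expect is unavailable. */
#if defined(__GNUC__)
#define MY_LIKELY(COND) __builtin_expect(!!(COND), 1)
#else
#define MY_LIKELY(COND) (COND)
#endif

/* Writes the masked TTL. The hint tells the compiler that a nonzero TTL mask
 * is the common case, so it may place that path as the fall-through. */
static uint8_t
apply_ttl(uint8_t ttl, uint8_t key_ttl, uint8_t mask_ttl)
{
    if (MY_LIKELY(mask_ttl)) {
        ttl = (uint8_t) ((ttl & ~mask_ttl) | (key_ttl & mask_ttl));
    }
    return ttl;
}
```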
Signed-off-by: Zoltán Balogh <zoltan.balogh@ericsson.com> Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
zangchuanqiang [Fri, 16 Dec 2016 02:28:11 +0000 (10:28 +0800)]
lib/ovs-thread: set prefer writer lock for ovs_rwlock_init()
An alternative "writer nonrecursive" rwlock allows recursive
read-locks to succeed only if there are no threads waiting for the
write-lock. ovs_rwlock_init() has a problem: the 'attr' object is
never used to set the attributes of ovs_rwlock 'l_', because l->lock
is initialized with pthread_rwlock_init(&l->lock, NULL). The attr
object needs to be passed to the pthread_rwlock_init() call in order
to take effect.
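A minimal sketch of the fix pattern, in C, using the glibc extension pthread_rwlockattr_setkind_np(). The function name init_writer_preferring_rwlock() is hypothetical and this is not the OVS implementation; the writer-preference request is compiled only when glibc's GNU extensions are visible (detected via the PTHREAD_RWLOCK_WRITER_NONRECURSIVE_INITIALIZER_NP macro), and otherwise the default lock kind is used. The essential line is passing &attr, not NULL, to pthread_rwlock_init().

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  /* Exposes pthread_rwlockattr_setkind_np() on glibc. */
#endif
#include <assert.h>
#include <pthread.h>

/* Hypothetical helper: initializes 'lock' to prefer writers where supported.
 * Returns 0 on success or a positive errno value. */
static int
init_writer_preferring_rwlock(pthread_rwlock_t *lock)
{
    pthread_rwlockattr_t attr;
    int error = pthread_rwlockattr_init(&attr);

    if (error) {
        return error;
    }
#ifdef PTHREAD_RWLOCK_WRITER_NONRECURSIVE_INITIALIZER_NP
    /* glibc only: queued writers block new recursive read-locks. */
    error = pthread_rwlockattr_setkind_np(
        &attr, PTHREAD_RWLOCK_PREFER_WRITER_NONRECURSIVE_NP);
    if (error) {
        pthread_rwlockattr_destroy(&attr);
        return error;
    }
#endif
    /* Pass 'attr' here; with NULL, every attribute set above is ignored. */
    error = pthread_rwlock_init(lock, &attr);
    pthread_rwlockattr_destroy(&attr);  /* Safe once the lock is initialized. */
    return error;
}
```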
Signed-off-by: zangchuanqiang <zangchuanqiang@huawei.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Lance Richardson [Thu, 22 Dec 2016 19:45:50 +0000 (14:45 -0500)]
table: correct documented default format in man pages
There are currently five users of the table formatting library,
all of which default to "list" except for ovsdb-client, which
defaults to "table". The library's current default is "table",
and the table.man man page fragment treats only ovs-vsctl as
using something other than "table" as its default. As a result,
the man pages for ovn-sbctl and vtep-ctl are currently incorrect
(these options aren't documented in the ovn-nbctl man page, which
will need to be addressed in a future patch).
Fix by making the library default format "list" and handling
ovsdb-client as the exception.
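A minimal sketch of the "common default in the library, exception in the caller" arrangement, in C. The enum fmt, struct fmt_style and FMT_STYLE_DEFAULT names are hypothetical stand-ins, not the table library's real identifiers.

```c
#include <assert.h>

/* Hypothetical stand-ins for the table library's format and style types. */
enum fmt { FMT_LIST, FMT_TABLE };

struct fmt_style {
    enum fmt format;
};

/* Library-wide default: "list", which most callers want. */
#define FMT_STYLE_DEFAULT { FMT_LIST }

/* A typical caller (e.g. ovs-vsctl) simply takes the library default. */
static struct fmt_style
ctl_default_style(void)
{
    struct fmt_style style = FMT_STYLE_DEFAULT;
    return style;
}

/* The single exception overrides the default in its own code. */
static struct fmt_style
ovsdb_client_default_style(void)
{
    struct fmt_style style = FMT_STYLE_DEFAULT;
    style.format = FMT_TABLE;
    return style;
}
```

Keeping the override in the one program that needs it means the shared man page fragment can document a single default.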
Signed-off-by: Lance Richardson <lrichard@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Joe Stringer [Tue, 20 Dec 2016 21:28:25 +0000 (13:28 -0800)]
system-traffic: Introduce OVS_START_L7 macro.
All of the commands that start L7 servers duplicate detailed specifics,
which hurts readability and makes it difficult to ensure that the
servers are ready before the test proceeds. Add a new macro that
provides simpler semantics from the test's perspective and hides the
details inside the macro. A followup patch will extend this macro to ensure
that servers are ready to serve requests before the test proceeds.
Signed-off-by: Joe Stringer <joe@ovn.org> Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
When IGMP or MLD packets arrive, their content is used without the checksum
being verified. With this change, the checksum is verified, and the packet
is not used for multicast snooping if verification fails.
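A minimal sketch of IGMP checksum verification, in C, using the RFC 1071 one's-complement sum; inet_csum() and igmp_csum_ok() are hypothetical names, not OVS functions. A received message is accepted only if the sum over the whole message, checksum field included, folds to zero. MLD is not covered here: its ICMPv6 checksum also covers an IPv6 pseudo-header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* RFC 1071 one's-complement checksum over 'len' bytes in network order.
 * Adequate for short messages such as IGMP (no overflow of 'sum'). */
static uint16_t
inet_csum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;

    for (size_t i = 0; i + 1 < len; i += 2) {
        sum += (uint32_t) data[i] << 8 | data[i + 1];
    }
    if (len & 1) {
        sum += (uint32_t) data[len - 1] << 8;  /* Pad odd byte with zero. */
    }
    while (sum >> 16) {
        sum = (sum & 0xffff) + (sum >> 16);   /* Fold carries back in. */
    }
    return (uint16_t) ~sum;
}

/* Nonzero if 'msg' is at least a minimal 8-byte IGMP message and its
 * checksum verifies; such a packet may then be used for snooping. */
static int
igmp_csum_ok(const uint8_t *msg, size_t len)
{
    return len >= 8 && inet_csum(msg, len) == 0;
}
```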
Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
route-table: Stop netlink log message when routes withdrawn
When a route is withdrawn (blackholed) the netlink message doesn't include
an RTA_OIF element. This results in an "unexpected netlink message
contents" log message because this element is not optional.
Given that the netlink message will be ignored anyway, and subsequent
error checking will cope with missing RTA_OIF, the element should be
optional in order to suppress unnecessary log messages.
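A minimal sketch of why marking an attribute optional silences the warning, in C. struct attr_policy and attrs_satisfy() are a hypothetical, pared-down model of a netlink attribute policy, not OVS's netlink API: parsing fails (and would log) only when a non-optional attribute is missing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, pared-down attribute policy: one entry per attribute type. */
struct attr_policy {
    int type;
    bool optional;
};

/* Returns true if every non-optional attribute in 'policy' appears in
 * 'present', the list of attribute types found in the message. */
static bool
attrs_satisfy(const struct attr_policy *policy, size_t n_policy,
              const int *present, size_t n_present)
{
    for (size_t i = 0; i < n_policy; i++) {
        bool found = false;

        for (size_t j = 0; j < n_present && !found; j++) {
            found = present[j] == policy[i].type;
        }
        if (!found && !policy[i].optional) {
            return false;  /* A caller would log "unexpected contents" here. */
        }
    }
    return true;
}
```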
Signed-off-by: Tony van der Peet <tony.vanderpeet@alliedtelesis.co.nz> Signed-off-by: Ben Pfaff <blp@ovn.org>