Han Zhou [Thu, 21 Mar 2019 23:06:30 +0000 (16:06 -0700)]
ovn-ctl: Unify OVN_RUNDIR usage.
In this script $rundir and $OVN_RUNDIR are used in a mixed way, which
can cause different folders to be used for different runtime files. This
patch unifies the usage to the correct one.
Signed-off-by: Han Zhou <hzhou8@ebay.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Timothy Redaelli [Fri, 22 Mar 2019 18:45:46 +0000 (19:45 +0100)]
rhel: Fix sphinx BuildRequires on Fedora Rawhide
On Fedora Rawhide only python3-sphinx is available, but currently
python2-sphinx is used.
This commit changes the BuildRequires for sphinx to use
/usr/bin/sphinx-build directly instead of python2-sphinx in order to make
it work on current Fedora Rawhide too.
Signed-off-by: Timothy Redaelli <tredaelli@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Ilya Maximets [Thu, 21 Mar 2019 10:56:47 +0000 (13:56 +0300)]
ovs-vsctl: Add datapath_type column to show command.
Sometimes it's unclear which datapath type is in use by a particular
bridge. For example, if all the interfaces are supported by both system
and netdev datapaths, it takes a DB query or log analysis to find out
which 'datapath_type' is in use.
Another case is that it's hard to figure out if patch ports are really
connected to each other. They are definitely not connected if the datapath
types of their bridges differ.
With this change non-default 'datapath_type's will be shown by the
'ovs-vsctl show' command, so it'll be easier to spot misconfiguration.
$ ovs-vsctl show
...
Bridge "br0"
datapath_type: netdev
Port "br0"
Interface "br0"
type: internal
...
Han Zhou [Fri, 22 Mar 2019 20:41:05 +0000 (13:41 -0700)]
reconnect.c: Don't transition back to ACTIVE when forced to RECONNECT.
Currently, whenever there is activity on the session, the FSM is
transitioned to ACTIVE. However, this causes reconnect_force_reconnect()
to fail: if traffic is received from the remote after the transition
to RECONNECT, the FSM skips the reconnection phase and goes directly
back to ACTIVE for the old session. This patch fixes it so that
when the FSM is in RECONNECT state, it doesn't transition back to ACTIVE
directly.
Signed-off-by: Han Zhou <hzhou8@ebay.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
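The rule described above can be sketched as a toy state machine. This is
only an illustration, not the reconnect.c code: the class, its methods and
the IDLE starting state are hypothetical, and only the ACTIVE and RECONNECT
state names come from the commit message.

```python
# Toy model of the state rule; all names here are hypothetical.
ACTIVE = "ACTIVE"
IDLE = "IDLE"
RECONNECT = "RECONNECT"


class ToyFsm:
    def __init__(self):
        self.state = IDLE

    def force_reconnect(self):
        # The caller asks for the session to be dropped and re-established.
        self.state = RECONNECT

    def activity(self):
        # Traffic on the old session must not cancel a forced reconnect.
        if self.state != RECONNECT:
            self.state = ACTIVE


fsm = ToyFsm()
fsm.activity()                  # ordinary activity still leads to ACTIVE
state_after_activity = fsm.state
fsm.force_reconnect()
fsm.activity()                  # late traffic arrives from the remote
state_after_forced = fsm.state
```

In this sketch the second activity() call leaves the state untouched, which
is the behavior the patch aims for.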
The ifupdown.sh script passes the --may-exist option
to ovs-vsctl invocations in order for it to exit without failing
if the device to be added already exists. This holds true for
all cases of adding objects to ovs-vswitchd except for when
configuring a bond interface.
This patch adds the --may-exist option to the missing
statement, which suppresses the logging of such errors in
syslog.
Additionally, running the unpatched version of this script when
the bond interface already exists appears to break
networking with some versions of ifupdown found in debian
testing (0.8.35), where the service won't start up properly
because of the aforementioned errors.
Signed-off-by: George Diamantopoulos <georgediam@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Ted Elhourani [Fri, 25 Jan 2019 19:10:01 +0000 (19:10 +0000)]
python: Monitor Database table to manage lifecycle of IDL client.
The Python IDL implementation supports ovsdb cluster connections.
This patch is a follow-up to commit 31e434fc98. It adds the option of
connecting to the leader (the default) in the Raft-based cluster. It mimics
the existing C IDL support for clusters introduced in commit 1b1d2e6daa.
The _Server database schema is first requested, then a monitor of the
Database table in the _Server Database. Method __check_server_db verifies
the eligibility of the server. If the attempt to obtain a monitor of the
_Server database fails and a cluster id was not provided this implementation
proceeds to request the data monitor. If a cluster id was provided via the
set_cluster_id method then the connection is aborted and a connection to a
different node is instead attempted, until a valid cluster node is found.
Thus, when supplied, cluster id is interpreted as the intention to only
allow connections to a clustered database. If not supplied, connections to
standalone nodes, or nodes that do not have the _Server database are
allowed. change_seqno is not incremented in the case of Database table
updates.
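The decision described above can be summarized in a small sketch. The
helper below is hypothetical and is not part of the Python IDL API; it only
restates the three outcomes from the commit message as return values.

```python
# Hypothetical decision helper; it is not part of the ovs.db.idl API.
def next_step(server_monitor_ok, cluster_id):
    """Return what the client does after trying to monitor _Server."""
    if server_monitor_ok:
        # The Database table is available: verify server eligibility.
        return "check_server_db"
    if cluster_id is None:
        # No cluster id: standalone servers, or servers without the
        # _Server database, are acceptable.
        return "request_data_monitor"
    # A cluster id was supplied: only a clustered database will do.
    return "try_another_server"


with_server_db = next_step(True, None)
standalone_allowed = next_step(False, None)
cluster_required = next_step(False, "hypothetical-cluster-id")
```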
Timothy Redaelli [Fri, 22 Mar 2019 14:02:14 +0000 (15:02 +0100)]
python: Fix package requirements with old setuptools
Commit 00fcc832d598 ("Update Python package requirements") added a
PEP 508 environment marker to install pywin32 on Windows systems.
This requires a new setuptools version (>= 20.5), but (at least)
RHEL/CentOS7 and Debian Jessie are using an older version of
setuptools and so the Python extension failed to build.
This commit uses "extras_require" instead of the PEP 508 environment
markers in order to keep the conditional dependency on pywin32 while
remaining compatible with the old setuptools versions.
CC: Alin Gabriel Serdean <aserdean@cloudbasesolutions.com> CC: Lucian Petrut <lpetrut@cloudbasesolutions.com> Fixes: 00fcc832d598 ("Update Python package requirements") Signed-off-by: Timothy Redaelli <tredaelli@redhat.com> Acked-by: Alin Gabriel Serdean <aserdean@ovn.org> Signed-off-by: Alin Gabriel Serdean <aserdean@ovn.org>
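As a rough sketch of the technique, setuptools accepts an "extras_require"
key that starts with ':' followed by an environment marker, so the listed
requirement applies only where the marker is true. The dictionary below is
illustrative: the package name is hypothetical and the exact requirement
string used by the commit is not shown in this log.

```python
# Sketch of setup() keyword arguments; values are illustrative only.
setup_kwargs = {
    "name": "example-package",  # hypothetical package name
    "extras_require": {
        # A key of the form ':<marker>' makes the requirement conditional
        # without putting a PEP 508 marker in the requirement string.
        ':sys_platform == "win32"': ["pywin32"],
    },
}

conditional_keys = sorted(setup_kwargs["extras_require"])
```

A setup.py would pass these as setup(**setup_kwargs).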
Roni Bar Yanai [Tue, 5 Mar 2019 16:49:31 +0000 (16:49 +0000)]
netdev-dpdk: Move offloading code to a new file
Hardware offloading code is moved to a new file called
netdev-rte-offloads.c. The original offloading code is copied
from the netdev-dpdk.c file to the new file, where future
offloading code should be added as well.
The copied code was refactored based on coding style.
The netdev-dpdk.c file will remain unchanged as new offloading
code is added.
Before offloading code was added to the netdev-dpdk.c file (MARK and
RSS actions) the only DPDK RTE calls in use were rte_flow_create() and
rte_flow_destroy(). In preparation for splitting the offloading code
from the netdev-dpdk.c file to a separate file, it is required
to embed these RTE calls into a global netdev-dpdk-* API so that
they can be called from the new file. An example for this requirement
can be seen in the handling of dev->mutex, which should be encapsulated
inside netdev-dpdk class (netdev-dpdk.c file), and should be unknown
to the outside callers. This commit embeds the rte_flow_create() call
inside the netdev_dpdk_flow_create() API and the rte_flow_destroy()
call inside the netdev_dpdk_rte_flow_destroy() API.
Ilya Maximets [Mon, 18 Mar 2019 13:01:13 +0000 (16:01 +0300)]
dpif-netdev-perf: Fix double update of perf histograms.
Real values of 'packets per batch' and 'cycles per upcall' are already
added to histograms in 'dpif-netdev' on receive. Adding the averages
makes the statistics wrong. We should not add to histograms values that
never really appeared.
For example, in the current code the following situation is possible:
Ilya Maximets [Thu, 14 Mar 2019 14:43:48 +0000 (17:43 +0300)]
dpdk: Stop dumping memzones to stdout.
Information about memzones reserved on init is not very useful.
Anyway, we need to log it in a more civilized manner, i.e. through
the OVS logging subsystem.
Ilya Maximets [Mon, 18 Mar 2019 11:02:30 +0000 (14:02 +0300)]
dpctl: Drop parser debug information.
This information is not that useful.
Anyway, no need to print it each time to the logs.
CC: Ben Pfaff <blp@ovn.org> Fixes: d1fd1ea91242 ("ovs-dpctl: New --names option to use port names in flow dumps.") Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Or Gerlitz [Sun, 17 Mar 2019 14:13:25 +0000 (16:13 +0200)]
netdev-tc-offloads: Properly get the block id on flow del/get
Currently, when a tc flow is installed on a bond port using shared blocks,
we get these failures from the validator threads:
2019-03-17T10:02:58.919Z|13369|dpif(revalidator93)|WARN|system@ovs-system: failed to flow_del \
(No such file or directory) ufid:ebe2888b-9886-4835-a42e-c2911f6af6e8 skb_priority(0),skb_mark(0),in_port(2), \
packet_type(ns=0,id=0),eth(src=e4:11:22:33:44:71,dst=24:8a:07:88:28:12),eth_type(0x0806), [..]
The block id must be retrieved from the device we got by ufid lookup and
not from the input to the related function, fix that for flow del and get.
While here, add the block id to existing debug print.
Fixes: 88dcf2aa8234 ('netdev-provider: add class op to get block_id') Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Simon Horman <simon.horman@netronome.com>
Moshe Levi [Thu, 28 Feb 2019 19:29:10 +0000 (21:29 +0200)]
netdev-tc-offloads: Improve log message for icmpv6 offload not supported
Signed-off-by: Moshe Levi <moshele@mellanox.com> Reviewed-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: Simon Horman <simon.horman@netronome.com>
Ashish Varma [Wed, 13 Mar 2019 18:31:05 +0000 (11:31 -0700)]
ofp-protocol: Changed the number of bits in OFPUTIL_P_ANY from 10 to 9.
The removal of support for OpenFlow 1.6 (draft) resulted in the removal of
"OFPUTIL_P_OF16_OXM 1 << 9". OFPUTIL_P_ANY, which represents all protocols,
will now have only 9 valid bits.
Fixes: 29718ad49d61 ("Remove support for OpenFlow 1.6 (draft).") Signed-off-by: Ashish Varma <ashishvarma.ovs@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
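The bit arithmetic can be checked with a short sketch. It assumes the mask
is built as "(1 << n) - 1", which the commit title suggests but this log
does not show; the Python names simply mirror the identifiers above.

```python
# Illustrative arithmetic only; assumes the mask is (1 << n) - 1.
N_PROTOCOL_BITS = 9
OFPUTIL_P_ANY = (1 << N_PROTOCOL_BITS) - 1   # bits 0 through 8 are set
REMOVED_OF16_OXM = 1 << 9                    # the bit that was removed

any_mask_hex = hex(OFPUTIL_P_ANY)
overlaps_removed_bit = bool(OFPUTIL_P_ANY & REMOVED_OF16_OXM)
```

With 9 bits the mask no longer covers bit 9, so the removed protocol can
never be selected through it.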
Darrell Ball [Fri, 15 Mar 2019 22:01:20 +0000 (15:01 -0700)]
conntrack: Replace structure copy by memcpy().
There are a few cases where structure copy can be replaced by
memcpy(), for possible portability benefit. This is because
the structures involved have padding and elements of the
structure are used to generate hashes.
Signed-off-by: Darrell Ball <dlu998@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Darrell Ball [Fri, 15 Mar 2019 22:01:19 +0000 (15:01 -0700)]
conntrack: Lookup only 'UNNAT conns' in 'nat_clean()'.
When freeing 'UNNAT conns', lookup only 'UNNAT conns' to
protect against possible address overlap with 'default
conns' during a DOS attempt. This is very unlikely, but
protection is simple.
Darrell Ball [Fri, 15 Mar 2019 22:01:18 +0000 (15:01 -0700)]
conntrack: Fix race for NAT cleanup.
Reference lists are not fully protected during cleanup of
NAT connections where the bucket lock is transiently not held during
list traversal. This can lead to referencing freed memory during
cleaning from multiple contexts. Fix this by protecting with
the existing 'cleanup' mutex in the missed cases where 'conn_clean()'
is called. 'conntrack_flush()' is converted to expiry list traversal
to support the proper bucket level protection with the 'cleanup' mutex.
The NAT exhaustion case cleanup in 'conn_not_found()' is also modified
to avoid the same issue.
Ilya Maximets [Mon, 4 Mar 2019 10:35:30 +0000 (13:35 +0300)]
treewide: Clean up inclusions of netdev-dpdk header.
'netdev-dpdk.h' provides only 'netdev_dpdk_register' and
'free_dpdk_buf' which are not used in these files and should
not be used.
Leftovers from the already removed code.
Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Justin Pettit [Mon, 4 Mar 2019 22:28:58 +0000 (14:28 -0800)]
ovn-nbctl: Don't segfault when ovn-northd doesn't configure dynamic addresses.
When ovn-nbctl is used to configure a logical switch port's addresses, it
does a sanity-check to make sure that a duplicate address isn't being
used. If a port is configured as "dynamic", ovn-northd is supposed to
populate the "dynamic_addresses" column in the Logical_Switch_Port
table. If it doesn't, ovn-nbctl would dereference a null pointer as part
of the duplicate address check. This patch checks that "dynamic_addresses"
is actually set first.
Signed-off-by: Justin Pettit <jpettit@ovn.org> Acked-by: Ben Pfaff <blp@ovn.org>
Ilya Maximets [Tue, 26 Feb 2019 10:38:39 +0000 (13:38 +0300)]
dp-packet: Copy flow mark on packet clone.
Dummy interfaces clone dp-packet during 'receive' appctl processing.
In general, we should do this anyway to avoid any possible issues in
the future with real interfaces.
Sairam Venugopal [Wed, 27 Feb 2019 00:45:10 +0000 (16:45 -0800)]
datapath-windows: Fix potential deadlock in event subscription
Move the EventQueue lock acquisition after the dispatchLock to prevent a
potential deadlock in port creation pipeline. There could be a case where
a port event could try to take up the Dispatch Lock before the Event Queue
lock and the subscription queue event could take up the event queue lock
before the dispatch lock.
Found while testing with Driver Verifier enabled.
Signed-off-by: Sairam Venugopal <vsairam@vmware.com> Acked-by: Alin Gabriel Serdean <aserdean@ovn.org> Signed-off-by: Alin Gabriel Serdean <aserdean@ovn.org>
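The fix relies on the usual lock-ordering rule: if every path takes the two
locks in the same order, the circular wait described above cannot occur. A
minimal sketch, with Python threading locks standing in for the driver's
locks and hypothetical function names:

```python
import threading

# Hypothetical stand-ins for the two locks named in the commit message.
dispatch_lock = threading.Lock()
event_queue_lock = threading.Lock()
completed = []


def with_both_locks(name):
    # Every path takes dispatch_lock first, then event_queue_lock.
    with dispatch_lock:
        with event_queue_lock:
            completed.append(name)


threads = [
    threading.Thread(target=with_both_locks, args=(n,))
    for n in ("port_event", "subscribe_event")
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Both threads finish because neither can hold the second lock while waiting
for the first.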
datapath-windows: Fix nbl cleanup when memory allocation fails
StartNblIngressError should be called only when an NBL hasn't been
modified. In this case the nbl context was initialized. Rely on existing
packet completion mechanism to cleanup the NBL.
Found while testing with DriverVerifier with limited memory setting
enabled.
Ilya Maximets [Tue, 26 Feb 2019 10:38:37 +0000 (13:38 +0300)]
dp-packet: Refactor offloading API.
1. No reason to have mbuf related APIs in a generic code.
2. Not only RSS/checksums should be invalidated in case of tunnel
decapsulation or sending to 'ring' ports.
In order to fix the two issues above, a new function
'dp_packet_reset_offload' is introduced. In order to clean up/unify
the code and simplify addition of new offloading features to the non-DPDK
version of dp_packet, an 'ol_flags' bitmask is introduced. Additionally,
code complexity in 'dp_packet_clone_with_headroom' is reduced by using
already existing generic APIs.
Unfortunately, we still need to have a special case for mbuf
initialization inside 'dp_packet_init__()'.
'dp_packet_init_specific()' introduced for this purpose as a generic
API for initialization of the implementation-specific fields.
Roi Dayan [Mon, 11 Mar 2019 12:47:08 +0000 (14:47 +0200)]
netdev-linux: Remove ingress qdisc before trying to add shared block
Adding a shared ingress block when an ingress qdisc already exists results
in a failure. So remove the ingress qdisc first.
Also, while at it, log the slave name.
Signed-off-by: Roi Dayan <roid@mellanox.com> Acked-by: John Hurley <john.hurley@netronome.com> Signed-off-by: Simon Horman <simon.horman@netronome.com>
Roi Dayan [Mon, 11 Mar 2019 14:34:05 +0000 (16:34 +0200)]
netdev-tc-offloads: Remove ingress qdisc on tc init flow api
It could be that a port added to an ovs bridge already has an ingress
qdisc, which will make the block probe fail.
The probes should start clean, and ingress is added later,
so just remove ingress in case it exists.
Signed-off-by: Roi Dayan <roid@mellanox.com> Acked-by: John Hurley <john.hurley@netronome.com> Signed-off-by: Simon Horman <simon.horman@netronome.com>
Han Zhou [Wed, 6 Mar 2019 02:16:50 +0000 (18:16 -0800)]
ovsdb-idl: Fix memory leak of ovsdb_idl_db_clear.
ovsdb_idl_row_destroy() doesn't free the memory of row structure itself.
This is because of the ovsdb change tracking feature: the deleted row
may be accessed in the current iteration of main loop. The function
ovsdb_idl_row_destroy_postprocess() is called at the end of
ovsdb_idl_run() to free the deleted rows that are not tracked; the
function ovsdb_idl_db_track_clear() is called (indirectly) by user
at the end of each main loop iteration to free the deleted rows that
are tracked. However, in ovsdb_idl_db_clear(), which may be called when
a session is reset, or when the idl is destroyed, it didn't call
ovsdb_idl_row_destroy_postprocess(), which would result in all the
untracked rows being leaked. This patch fixes that.
Signed-off-by: Han Zhou <hzhou8@ebay.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Han Zhou [Fri, 1 Mar 2019 18:56:37 +0000 (10:56 -0800)]
ovsdb raft: Precheck prereq before proposing commit.
In the current OVSDB Raft design, when there are multiple transactions
pending, either from the same server node or from different nodes in the
cluster, only the first one can succeed at once, and the following
ones will fail the prerequisite check on the leader node. This is because
the first one updates the expected prerequisite eid on the leader
node, and the prerequisite used for proposing a commit has to be a
committed eid, so a node cannot use the latest prerequisite expected
by the leader to propose a commit until the latest transaction is
committed by the leader and the committed_index is updated on the node.
The current implementation proposes the commit as soon as the transaction
is requested by the client, which results in continuous retries that
cause high CPU load and waste.
Particularly, even if all clients are using leader_only to connect to
only the leader, the prereq check failure still happens a lot when
a batch of transactions are pending on the leader node - the leader
node proposes a batch of commits using the same committed eid as
prerequisite and it updates the expected prereq as soon as the first
one is in progress, but it needs time to append to followers and wait
until a majority replies to update the committed_index, which results in
continuous useless retries of the following transactions proposed by
the leader itself.
This patch doesn't change the design but simply pre-checks if the current
eid is the same as prereq, before proposing the commit, to avoid waste of
CPU cycles, for both leader and followers. When clients use leader_only
mode, this patch completely eliminates the prereq check failures.
In scale test of OVN with 1k HVs and creating and binding 10k lports,
the patch resulted in 90% CPU cost reduction on leader and >80% CPU cost
reduction on followers. (The test was with leader election base time
set to 10000ms, because otherwise the test couldn't complete because
of the frequent leader re-election.)
This is just one of the related performance problems of the prereq
checking mechanism discussed at:
https://mail.openvswitch.org/pipermail/ovs-discuss/2019-February/048243.html Signed-off-by: Han Zhou <hzhou8@ebay.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
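A minimal sketch of the pre-check idea follows. It is a toy model rather
than the OVSDB Raft code: the class and method names are hypothetical, and
plain strings stand in for entry IDs (eids).

```python
# Toy model of the pre-check; names and eid strings are hypothetical.
class ToyLeader:
    def __init__(self, eid):
        self.current_eid = eid   # prerequisite the leader expects next
        self.proposals = 0       # commits that were actually proposed

    def try_propose(self, prereq_eid, new_eid):
        # Pre-check: a stale prereq is certain to fail, so skip the work.
        if prereq_eid != self.current_eid:
            return False
        self.proposals += 1
        self.current_eid = new_eid
        return True


leader = ToyLeader("A")
first = leader.try_propose("A", "B")    # matches, so it is proposed
second = leader.try_propose("A", "C")   # stale prereq, skipped cheaply
retry = leader.try_propose("B", "C")    # rebuilt on top of the new eid
```

Only two proposals are made for the three attempts; the stale one costs a
comparison instead of a full failed commit round.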
OVN: Add support for DHCP option 150 - TFTP server address
OpenStack Ironic relies on a few DHCP options [0] that were not
supported in OVN yet. This patch is adding the last one which is the
option 150 (TFTP server address, RFC5859 [1]).
Note that this option is Cisco proprietary; the IETF standard that
matches this requirement is Option 66. The difference is that 150
allows multiple IPs to be specified and 66 only allows one.
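For illustration, an option carrying several IPv4 addresses can be encoded
with the generic DHCPv4 layout of a one-byte code, a one-byte length and
the raw addresses. The function below is a hypothetical sketch, not OVN
code, and the addresses are documentation examples.

```python
import socket
import struct


def encode_option_150(addresses):
    """Encode TFTP server addresses as code, length and IPv4 payload."""
    data = b"".join(socket.inet_aton(a) for a in addresses)
    return struct.pack("!BB", 150, len(data)) + data


encoded = encode_option_150(["192.0.2.1", "192.0.2.2"])
```

Two addresses give a length byte of 8, which is how the option carries
more than one server.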
Han Zhou [Wed, 6 Mar 2019 17:01:21 +0000 (09:01 -0800)]
ovsdb-idl: Fix memory leak of idl->remote.
Reported by Address Sanitizer.
Fixes: 5e07b8f93f03 ("ovsdb-idl: New function ovsdb_idl_create_unconnected().") Signed-off-by: Han Zhou <hzhou8@ebay.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
"Container-based infrastructure is currently being deprecated.
Please remove any sudo: false keys in your .travis.yml file to use
the default fully-virtualized Linux infrastructure instead."
Mark Michelson [Wed, 6 Mar 2019 14:33:04 +0000 (09:33 -0500)]
OVN: Add port addresses to IPAM after all ports are joined.
Joining ports involves setting the peer field on ovn_ports. If a switch
port is visited, and it is connected to a router port, then the switch
port's peer is set to the router port and the router port's peer is set
to the switch port.
A router port's addresses are added to IPAM if it is peered with a
switch that has dynamic addressing enabled.
When visiting ports, if a router port is visited before its connected
switch port, then the router port's peer is not set yet. Therefore the
router's port addresses cannot be added to IPAM. The result is that
duplicate addresses can be assigned by a logical switch.
The fix for this is to wait until all ports have been joined and then
add port addresses to IPAM. This way, we guarantee that all peer
assignments have been set, and no duplicate IP addresses may be assigned
by a switch.
Reported-by: James Page <james.page@canonical.com> Signed-off-by: Mark Michelson <mmichels@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
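The ordering fix can be sketched as two passes over the ports. The data
model below is hypothetical and simplified (it ignores whether the switch
has dynamic addressing enabled); it only shows why the visiting order stops
mattering once peers are set before IPAM is filled.

```python
# Hypothetical data model: each port is a dict; names are illustrative.
def build_ipam(ports, links):
    by_name = {p["name"]: p for p in ports}
    # Pass 1: join every port so that all peer fields are set.
    for switch_port, router_port in links:
        by_name[switch_port]["peer"] = router_port
        by_name[router_port]["peer"] = switch_port
    # Pass 2: only now reserve router port addresses in IPAM.
    ipam = set()
    for p in ports:
        if p["kind"] == "router" and p.get("peer") is not None:
            ipam.add(p["address"])
    return ipam


ports = [
    {"name": "rp1", "kind": "router", "address": "10.0.0.1"},  # seen first
    {"name": "sp1", "kind": "switch", "address": None},
]
reserved = build_ipam(ports, [("sp1", "rp1")])
```

Even though the router port is listed before its switch port, its address
is reserved.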
Yifeng Sun [Tue, 5 Mar 2019 23:27:01 +0000 (15:27 -0800)]
dpif-netlink: Free leaked ofpbuf by using ofpbuf_delete
Found by valgrind.
256 bytes in 4 blocks are definitely lost in loss record 319 of 348
by 0x52E204: xmalloc (util.c:123)
by 0x4F6172: ofpbuf_new (ofpbuf.c:151)
by 0x53DEF2: dpif_netlink_ct_get_limits (dpif-netlink.c:2951)
by 0x587881: dpctl_ct_get_limits (dpctl.c:1904)
by 0x58566F: dpctl_unixctl_handler (dpctl.c:2589)
by 0x52D660: process_command (unixctl.c:308)
by 0x52D660: run_connection (unixctl.c:342)
by 0x52D660: unixctl_server_run (unixctl.c:393)
by 0x407366: main (ovs-vswitchd.c:126)
Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Select a random IPAM mac_prefix if it has not been provided by the user.
With this patch the admin no longer needs to configure mac_prefix in
order to avoid L2 address collisions if multiple OVN deployments share
the same broadcast domain.
Remove MAC_ADDR_PREFIX definitions/occurrences since now mac_prefix is
always provided to ovn-northd.
Acked-by: Numan Siddique <nusiddiq@redhat.com> Tested-by: Miguel Duarte de Mora Barroso <mdbarroso@redhat.com> Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
ofproto: Fix for ovs-vswitchd crash on flow-mod with unsupported action
Problem Description:
The ovs-vswitchd is crashing while invoking flow-mod with an unsupported
action (tested with OVS 2.10.1).
Steps to recreate:
Step 1) Create a flow
ovs-ofctl add-flow switch1
priority=228,dl_type=0x0800,dl_vlan="600",in_port=25,actions=output:ALL
This step is successful.
In the above example, the ofproto provider I have will return an error for
rule_construct as set_fields come after Output.
However, OVS is ignoring the error (the return value of add_flow_init
is ignored in modify_flow_init_strict) and eventually ovs-vswitchd
crashes.
Crash backtrace:
-----------------------
Thread 1 "ovs-vswitchd" received signal SIGSEGV, Segmentation fault.
0x00007f6a06e785fb in modify_flows_start__ (
ofproto=ofproto@entry=0x55b289cecc28, ofm=ofm@entry=0x7ffdf7d57b70)
at ofproto/ofproto.c:5402
5402 in ofproto/ofproto.c
(gdb) bt
#0 0x00007f6a06e785fb in modify_flows_start__ (
ofproto=ofproto@entry=0x55b289cecc28, ofm=ofm@entry=0x7ffdf7d57b70)
at ofproto/ofproto.c:5402
#1 0x00007f6a06e790db in modify_flows_start_loose (ofm=0x7ffdf7d57b70,
ofproto=0x55b289cecc28) at ofproto/ofproto.c:5443
#2 ofproto_flow_mod_start (ofproto=ofproto@entry=0x55b289cecc28,
ofm=ofm@entry=0x7ffdf7d57b70) at ofproto/ofproto.c:7672
#3 0x00007f6a06e79164 in handle_flow_mod__ (
ofproto=ofproto@entry=0x55b289cecc28, fm=fm@entry=0x7ffdf7d57d20,
req=req@entry=0x7ffdf7d57cd0) at ofproto/ofproto.c:5858
#4 0x00007f6a06e792c2 in handle_flow_mod (
ofconn=ofconn@entry=0x55b289d528c0,
oh=oh@entry=0x55b289d5a410) at ofproto/ofproto.c:5835
#5 0x00007f6a06e7a173 in handle_openflow__ (msg=0x55b289d351d0,
ofconn=0x55b289d528c0) at ofproto/ofproto.c:8127
#6 handle_openflow (ofconn=0x55b289d528c0, ofp_msg=0x55b289d351d0)
at ofproto/ofproto.c:8296
#7 0x00007f6a06e6a013 in ofconn_run (
handle_openflow=0x7f6a06e796f0 <handle_openflow>,
ofconn=0x55b289d528c0)
at ofproto/connmgr.c:1446
#8 connmgr_run (mgr=0x55b289d14fe0,
handle_openflow=handle_openflow@entry=0x7f6a06e796f0
<handle_openflow>)
at ofproto/connmgr.c:365
With this fix, OVS does not crash.
Signed-off-by: Parameswaran Krishnamurthy <parkrish@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
OVN: update RA next_announce according to {min, max}_interval
Update RA next_announce whenever min_interval and/or max_interval are
updated in sbrec_port_binding option. In the current implementation
if ipv6_ra_configs:send_periodic is set to true before setting
ipv6_ra_configs:{min,max}_interval, next_announce will be set using
default values and it will not be updated until we send the first IPv6
router advertisement.
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
lib/tc: add ingress ratelimiting support for tc-offload
Firstly this patch introduces the notion of reserved priority, as the
filter implementing ingress policing would require the highest priority.
Secondly it allows setting rate limiters while tc-offloads has been
enabled. Lastly it installs a matchall filter that matches all traffic
and then applies a police action, when configuring an ingress rate
limiter.
An example of what to expect:
OvS CLI:
ovs-vsctl set interface <netdev_name> ingress_policing_rate=5000
ovs-vsctl set interface <netdev_name> ingress_policing_burst=100
Ilya Maximets [Fri, 1 Mar 2019 11:59:33 +0000 (14:59 +0300)]
dpdk: Fix case-sensitivity of dpdk-init knob.
Before the DPDK initialization status was supported in the DB, 'dpdk-init'
was just a boolean and 'smap_get_bool', which is case-insensitive, was used
to get the value.
The current code uses a simple 'strcmp' that fails to recognize values like
"True". As a result, this breaks various OVS configuration tools.
For example, kolla-ansible uses 'other_config:dpdk-init=True' but OVS
is not able to recognize it leading to broken installations.
'strcasecmp' should be used instead to fix the issue.
CC: Aaron Conole <aconole@redhat.com> Fixes: 3e52fa5644cd ("dpdk: reflect status and version in the database") Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>
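The difference between the two comparisons is easy to show. The function
below is a hypothetical Python analogue of a 'strcasecmp'-based check and
handles only the "true" value; any other values the real knob accepts are
outside this sketch.

```python
def dpdk_init_enabled(value):
    # Case-insensitive comparison, the Python analogue of strcasecmp().
    return value is not None and value.lower() == "true"


results = {v: dpdk_init_enabled(v) for v in ("true", "True", "TRUE", "false")}
exact_match_only = "True" == "true"   # what a strcmp-style check sees
```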
The rconn connection timer measures time at a granularity of seconds,
which means that sometimes the actual timeout can be just a millisecond or
so, which led to occasional immediate connection failures from ovs-ofctl.
VMware-BZ: #2295760 Fixes: 476d2551abd2 ("rconn: Introduce new invariant to fix assertion failure in corner case.") Reported-by: Ken Ajiro <ken-ajiro@xr.jp.nec.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
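A simplified model shows how second granularity can shrink a timeout to
about a millisecond. The function is hypothetical and the arithmetic in
rconn may differ; the sketch only assumes that the deadline is derived from
a clock truncated to whole seconds.

```python
def remaining_ms_with_second_granularity(now_ms, timeout_s):
    # The deadline is computed from the clock truncated to whole seconds.
    deadline_s = now_ms // 1000 + timeout_s
    return deadline_s * 1000 - now_ms


short_wait = remaining_ms_with_second_granularity(10_999, 1)
full_wait = remaining_ms_with_second_granularity(10_000, 1)
```

Starting 999 ms into a second leaves 1 ms of a nominal 1 s timeout, while
starting exactly on the second leaves the full 1000 ms.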
Timothy Redaelli [Thu, 28 Feb 2019 17:27:46 +0000 (18:27 +0100)]
rhel: Use PIDFile on forking systemd service files
Currently, PIDFile is not used in systemd service files with
Type=forking. This means systemd sometimes fails to restart a daemon
that is killed (with SIGKILL) or that has crashed.
This commit adds PIDFile to all systemd service files with Type=forking
in order to always have the correct PID to monitor.
Flavio Leitner [Thu, 28 Feb 2019 16:13:57 +0000 (13:13 -0300)]
rhel: limit stack size to 2M.
The default stack size in Fedora/RHEL is 8M, which means that when the
ovs-vswitchd daemon starts and uses --mlockall (the default), it will dirty
all memory regions for all threads, whose number is proportional to the
number of CPUs. On a big host this increases memory usage to many hundreds
of megabytes while OVS actually requires much less.
This patch relies on systemd to limit to 2M/thread. That is much more
than the minimum documented at function ovs_thread_create():
/* Some small systems use a default stack size as small as 80 kB, but OVS
* requires approximately 384 kB according to the following analysis:
* https://mail.openvswitch.org/pipermail/ovs-dev/2016-January/308592.html
*
* We use 512 kB to give us some margin of error. */
Han Zhou [Thu, 28 Feb 2019 17:15:20 +0000 (09:15 -0800)]
ovsdb-idl: Fast resync from server when connection reset.
Use monitor_cond_since to request changes after last version of local
data when connection to server is reset, without clearing the local
data. It falls back to clearing and repopulating all the data when
the requested id cannot be fulfilled by the server.
Test result at ovn-scale-test environment using clustered mode:
- 1K HVs (ovsdb clients)
- 10K lports
Without the patch it took 30+ min for the SB ovsdb-server to calm down
and for HVs to stabilize the connection and finish syncing data.
With the patch there was no noticeable CPU spike of SB ovsdb-server,
and all HVs were in sync with SB within 1 min, which is the probe
interval set in this test (so it took at most 1 min for HVs to notice
the TCP connection reset and reconnect and resync finished immediately
after that).
Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2018-September/047457.html Signed-off-by: Han Zhou <hzhou8@ebay.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
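On the client side, the behavior above amounts to keeping the local cache
when the server can serve the requested point and rebuilding it otherwise.
The helper below is a hypothetical sketch, not the IDL code; it treats the
reply as a 'found' flag plus row updates and ignores row deletions.

```python
# Hypothetical client-side handling of a monitor_cond_since reply.
def apply_reply(local_rows, found, updates):
    if not found:
        # Fall back: drop the cache and repopulate it from scratch.
        local_rows.clear()
    local_rows.update(updates)
    return local_rows


cache = {"row1": "a", "row2": "b"}
fast = apply_reply(dict(cache), True, {"row3": "c"})
full = apply_reply(dict(cache), False, {"row1": "a", "row3": "c"})
```

In the fast path only the new row is transferred; in the fallback the
whole content comes from the reply.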
Han Zhou [Thu, 28 Feb 2019 17:15:18 +0000 (09:15 -0800)]
ovsdb-monitor: Support monitor_cond_since.
Support the new monitor method monitor_cond_since so that a client
can request that monitoring start from a specific point instead of always
from the beginning. This will reduce the cost in scenarios where the server
is restarted/failed-over but the client still has all existing data. In
these scenarios only new changes (and in most cases no change) need
to be transferred to the client. When ovsdb-server is restarted, history
transactions are read from the disk file; when ovsdb-server fails over,
history transactions already exist in the memory of the new server.
There are situations in which the requested transaction may not be found.
For example, a transaction that is too old and has been discarded
from the maintained history list in memory, or the transactions on
disk have been compacted during ovsdb compaction. In those situations
the server falls back to transferring all data from the beginning.
For more details of the protocol change, see
Documentation/ref/ovsdb-server.7.rst.
This change includes both server side and ovsdb-client side changes
with the new protocol. IDLs using this capability will be added in
future patches.
For now the feature takes effect only for cluster mode of ovsdb-server,
because cluster mode is the only mode that supports unique transaction
uuids today. For other modes, monitor_cond_since always falls back
to transferring all data with found = false. Support for those modes can
be added in the future.
Signed-off-by: Han Zhou <hzhou8@ebay.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
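The server-side lookup can be sketched with a bounded history. The class
below is hypothetical and far simpler than ovsdb-server: transaction ids
are plain strings, and a small limit is used so that the "too old" case is
easy to trigger.

```python
from collections import deque


class ToyHistory:
    def __init__(self, limit):
        self.txns = deque(maxlen=limit)   # oldest entries fall off

    def commit(self, txn_id, change):
        self.txns.append((txn_id, change))

    def since(self, last_id):
        """Return (found, changes committed after last_id)."""
        ids = [t for t, _ in self.txns]
        if last_id not in ids:
            return False, None            # caller must send all data
        start = ids.index(last_id) + 1
        return True, [c for _, c in list(self.txns)[start:]]


history = ToyHistory(limit=3)
for i in range(1, 6):
    history.commit("t%d" % i, "change%d" % i)
recent = history.since("t4")     # still in the history
too_old = history.since("t1")    # already discarded
```

A known id yields only the later changes, while a discarded id reports
found = False so that the caller falls back to a full transfer.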
Han Zhou [Thu, 28 Feb 2019 17:15:17 +0000 (09:15 -0800)]
ovsdb-server: Transaction history tracking.
Maintain the last N (N = 100) transactions in memory, which will be
used by future patches for generating monitor data from any point
in these N transactions.
Signed-off-by: Han Zhou <hzhou8@ebay.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
The current ovsdb monitor maintains its own transaction version through an
incremental integer and uses it to identify changes starting from different
versions, and also uses it to figure out if each set of changes should be
flushed. In particular, it uses number 0 to represent that the change set
contains all data for initial client population. It is a smart approach but
it prevents further extension of the monitoring mechanism to support the
future use case of clients requesting changes starting from a given history
point. This patch refactors the structures so that change sets are
referenced directly through the pointer. It uses additional members such as
init_change_set and new_change_set to indicate the specific change set
explicitly, instead of through version numbers calculated by implicit rules.
At the same time, this patch provides better encapsulation for change set
(composed of data in a list of tables), while still allowing traversing
across change sets for a given table.
Signed-off-by: Han Zhou <hzhou8@ebay.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Yifeng Sun [Wed, 27 Feb 2019 22:21:00 +0000 (14:21 -0800)]
oss-fuzz: Fix oss build errors because of ovs API change
Reported-at: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=13432 Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Ilya Maximets [Tue, 26 Feb 2019 10:38:35 +0000 (13:38 +0300)]
dpif-netdev: Reduce log level for not found mark id.
It's a normal case for 'find' function, especially because this
happens for every first packet of flow that was not offloaded yet.
Should not warn about this. Dropped to DBG to avoid log trashing in
case of big number of new flows.
CC: Yuanhan Liu <yliu@fridaylinux.org> Fixes: 241bad15d99a ("dpif-netdev: associate flow with a mark id") Acked-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Ilya Maximets [Wed, 6 Feb 2019 15:40:36 +0000 (18:40 +0300)]
netdev-dpdk: Use single struct/union for flow offload items.
Having a single structure simplifies the code path and makes it possible
to clear all the items at once (probably faster). This does not
increase stack memory usage because all the L4 related items are
grouped in a union.
Changes:
- Memsets combined.
- 'ipv4_next_proto_mask' dropped as we already know the address
and are able to use 'mask.ipv4.hdr.next_proto_id' directly.
- Group of 'if' statements for L4 protocols turned to a 'switch'.
We can do that, because we don't have semi-local variables anymore.
- Eliminated 'end_proto_check' label. Not needed with 'switch'.
Additionally, 'rte_memcpy' is replaced with simple 'memcpy' as it makes no
sense to use 'rte_memcpy' for 6 bytes.
Yanqin Wei [Wed, 27 Feb 2019 09:44:06 +0000 (17:44 +0800)]
hash: Enable hash_bytes128 optimization for aarch64.
"hash_bytes128" has two versions for 64 bits and 32 bits system. This
should be a common optimization for their respective platforms, but the 64-bit
version was only enabled on x86-64. This patch enables it on the aarch64
platform.
A micro-benchmark was run on two kinds of Arm platforms. A 50% performance
improvement was observed on ThunderX2 and a 40% improvement
on TaiShan (Cortex-A72).
Signed-off-by: Yanqin Wei <Yanqin.Wei@arm.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Darrell Ball [Mon, 25 Feb 2019 23:36:32 +0000 (15:36 -0800)]
conntrack: Skip ephemeral ports with specified port range.
This patch removes the fallback to ephemeral ports when a SNAT port
range is specified; DNAT already does not fall back to ephemeral ports,
in general. This is not restrictive to the user and makes it easier to
limit NAT L4 port selection.
The documentation is updated and a new test is added to enforce the
behavior.
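The port selection rule described above can be sketched roughly as follows.
This is a minimal Python illustration, not the conntrack code itself: the
function name 'pick_snat_port', the 'in_use' set, and the ephemeral range
1024-65535 used when no range is given are all assumptions made for the
example.

```python
# Assumed ephemeral port range, for illustration only.
EPHEMERAL_PORTS = range(1024, 65536)


def pick_snat_port(in_use, port_range=None):
    """Return a free port, or None if every candidate is taken.

    Hypothetical helper: when port_range=(lo, hi) is given, only ports in
    that range are considered and there is no fallback to ephemeral ports.
    """
    if port_range is None:
        candidates = EPHEMERAL_PORTS
    else:
        lo, hi = port_range
        candidates = range(lo, hi + 1)
    for port in candidates:
        if port not in in_use:
            return port
    # The specified range is exhausted: give up instead of widening it.
    return None
```

With a range of (5000, 5001) and both ports in use, the sketch returns None
rather than picking a port outside the range, which is what keeps NAT L4
port selection limited to what the user asked for.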
Darrell Ball [Mon, 25 Feb 2019 23:36:31 +0000 (15:36 -0800)]
conntrack: Fix wasted work for ICMP NAT.
ICMPv4 and ICMPv6 are not subject to port address translation (PAT);
however, a loop increments a local variable unnecessarily for
ephemeral ports, resulting in wasted work for ICMPv4 and ICMPv6 packets
subject to NAT. Fix this by checking for PAT being enabled before
incrementing the local port variable and bailing out otherwise.
Signed-off-by: Darrell Ball <dlu998@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Ben Pfaff [Sat, 15 Dec 2018 02:16:55 +0000 (18:16 -0800)]
odp-util: Improve log messages and error reporting for Netlink parsing.
As a side effect, this also reduces a lot of log messages' severities from
ERR to WARN. They just didn't seem like messages that in general reported
anything that would prevent functioning.
Ilya Maximets [Mon, 25 Feb 2019 17:43:36 +0000 (20:43 +0300)]
vlog: Better handle syslog handler exceptions.
'set_levels_from_string' doesn't check for exceptions that could
happen while opening syslog files or connecting to syslog sockets.
For example, if rsyslog is stopped on a system:
$ test-unixctl.py -vFACILITY:daemon --detach
Traceback (most recent call last):
File "../../../../tests/test-unixctl.py", line 90, in <module>
main()
File "../../../../tests/test-unixctl.py", line 61, in main
ovs.vlog.handle_args(args)
File "python/ovs/vlog.py", line 463, in handle_args
msg = Vlog.set_levels_from_string(verbose)
File "python/ovs/vlog.py", line 345, in set_levels_from_string
Vlog.add_syslog_handler(words[1])
File "python/ovs/vlog.py", line 321, in add_syslog_handler
facility=syslog_facility)
File "/python2.7/logging/handlers.py", line 759, in __init__
self._connect_unixsocket(address)
File "/python2.7/logging/handlers.py", line 787, in _connect_unixsocket
self.socket.connect(address)
File "/python2.7/socket.py", line 224, in meth
return getattr(self._sock,name)(*args)
socket.error: [Errno 111] Connection refused
In this case the "/dev/log" file exists, so the check inside
'add_syslog_handler' doesn't help.
We need to catch the exceptions in 'set_levels_from_string' the same way
as it is done in the 'init' function.
Also, we don't really need to check for '/dev/log' existence, because
the exception will be caught on the upper layer and properly handled by
disabling the corresponding logger.
Fixes: d69d61c7c175 ("vlog: Ability to override the default log facility.") Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
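The idea of catching the exception at the call site instead of checking for
'/dev/log' beforehand can be sketched as below. This is a minimal
illustration, not the code in python/ovs/vlog.py: the helper name
'try_add_handler' and its 'make_handler' argument are hypothetical, and the
handler is built by a caller-supplied function so the sketch does not depend
on a real syslog socket.

```python
import logging
import logging.handlers


def try_add_handler(logger, make_handler):
    """Attach the handler built by make_handler(); return it, or None.

    Hypothetical helper for illustration only.
    """
    try:
        handler = make_handler()
    except OSError as e:
        # Covers socket errors such as "Connection refused": the socket
        # file can exist even though nothing is listening on it.
        logger.warning("syslog handler disabled: %s", e)
        return None
    logger.addHandler(handler)
    return handler
```

A caller could pass something like
'lambda: logging.handlers.SysLogHandler(address="/dev/log")' as
'make_handler'; if construction fails, the logger simply continues without a
syslog handler instead of aborting with a traceback.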
Numan Siddique [Mon, 18 Feb 2019 04:42:22 +0000 (10:12 +0530)]
ovn-controller: Provide the option to set the datapath-type of br-int
If the integration bridge is deleted, ovn-controller recreates it
but the previous datapath-type value is lost if it was set. This
patch adds code to ovn-controller to set the datapath-type
if it is configured by the user in the 'external_ids:ovn-bridge-datapath-type'
column of the OpenvSwitch table.
Acked-by: Mark Michelson <mmichels@redhat.com> Signed-off-by: Numan Siddique <nusiddiq@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Matthias May [Thu, 14 Feb 2019 23:16:14 +0000 (00:16 +0100)]
rstp: add ability to receive VLAN-tagged BPDUs
Some switches can transmit their BPDUs VLAN-tagged.
With this change OVS is able to receive VLAN-tagged BPDUs, but still
transmits its own BPDUs untagged.
This was tested against Westermo RFI-207-F4G-T3G.
Signed-off-by: Matthias May <matthias.may@neratec.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
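Accepting both tagged and untagged BPDUs on receive amounts to skipping an
optional 802.1Q tag before looking at the LLC header. The Python sketch
below illustrates that step only; it is not the OVS implementation, and the
function name 'extract_bpdu' and the constants are chosen for the example.

```python
import struct

ETH_TYPE_VLAN = 0x8100                    # 802.1Q tag protocol identifier
STP_DST = bytes.fromhex("0180c2000000")   # bridge group address
LLC_STP = bytes([0x42, 0x42, 0x03])       # DSAP, SSAP, control for STP


def extract_bpdu(frame):
    """Return the BPDU bytes from an Ethernet frame, or None.

    Hypothetical helper: accepts untagged frames and frames carrying a
    single 802.1Q tag.
    """
    if len(frame) < 14 or frame[:6] != STP_DST:
        return None
    offset = 12
    (type_or_len,) = struct.unpack_from("!H", frame, offset)
    offset += 2
    if type_or_len == ETH_TYPE_VLAN:
        if len(frame) < offset + 4:
            return None
        # Skip the 2-byte TCI and read the inner type/length field.
        (type_or_len,) = struct.unpack_from("!H", frame, offset + 2)
        offset += 4
    if type_or_len >= 0x600:
        return None  # an EtherType, not an 802.3 length
    length = type_or_len
    if length < 3 or len(frame) < offset + length:
        return None
    if frame[offset:offset + 3] != LLC_STP:
        return None
    return frame[offset + 3:offset + length]
```

The transmit side is unaffected in this sketch, matching the commit message:
only the receive path learns to look past the tag.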
Han Zhou [Fri, 15 Feb 2019 20:25:58 +0000 (12:25 -0800)]
ovsdb_monitor: Fix style of prototypes.
Omit the parameter names in prototypes, as suggested by the coding
style: Omit parameter names from function prototypes when the names
do not give useful information.
Adjust the order of parameters as suggested by the coding style.
Signed-off-by: Han Zhou <hzhou8@ebay.com> Signed-off-by: Ben Pfaff <blp@ovn.org>