Ashish Varma [Wed, 13 Mar 2019 18:31:05 +0000 (11:31 -0700)]
ofp-protocol: Changed the number of bits in OFPUTIL_P_ANY from 10 to 9.
The removal of support for OpenFlow 1.6 (draft) resulted in the removal of
"OFPUTIL_P_OF16_OXM 1 << 9". OFPUTIL_P_ANY, which represents all protocols,
will now have only 9 valid bits.
Fixes: 29718ad49d61 ("Remove support for OpenFlow 1.6 (draft).") Signed-off-by: Ashish Varma <ashishvarma.ovs@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Darrell Ball [Fri, 15 Mar 2019 22:01:20 +0000 (15:01 -0700)]
conntrack: Replace structure copy by memcpy().
There are a few cases where structure copy can be replaced by
memcpy() for a possible portability benefit, because the
structures involved have padding and their elements are used
to generate hashes.
Signed-off-by: Darrell Ball <dlu998@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
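A minimal standalone sketch of why padding matters when a key is hashed over its raw bytes. The struct, the copy helper, and the FNV-1a hash are hypothetical stand-ins, not the conntrack code. The C standard leaves padding bytes unspecified after a structure assignment, whereas memcpy() copies every byte, so a zero-initialized source yields a byte-identical destination and therefore the same hash:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical key with internal padding (layout is ABI-dependent). */
struct example_key {
    uint8_t proto;   /* Typically followed by 3 padding bytes. */
    uint32_t addr;
    uint16_t port;   /* Typically followed by 2 trailing padding bytes. */
};

/* FNV-1a over the raw bytes of the key, padding included. */
static uint32_t
hash_key_bytes(const struct example_key *key)
{
    const unsigned char *p = (const unsigned char *) key;
    uint32_t hash = 2166136261u;

    for (size_t i = 0; i < sizeof *key; i++) {
        hash = (hash ^ p[i]) * 16777619u;
    }
    return hash;
}

/* Copies 'src' into 'dst' byte for byte, so padding is copied as well. */
static void
copy_key(struct example_key *dst, const struct example_key *src)
{
    memcpy(dst, src, sizeof *dst);
}
```

The design point is that any byte-wise hash or comparison depends on padding contents, so the copy must preserve them too.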
Darrell Ball [Fri, 15 Mar 2019 22:01:19 +0000 (15:01 -0700)]
conntrack: Lookup only 'UNNAT conns' in 'nat_clean()'.
When freeing 'UNNAT conns', look up only 'UNNAT conns' to
protect against possible address overlap with 'default
conns' during a DoS attempt. This is very unlikely, but the
protection is simple.
Darrell Ball [Fri, 15 Mar 2019 22:01:18 +0000 (15:01 -0700)]
conntrack: Fix race for NAT cleanup.
Reference lists are not fully protected during cleanup of
NAT connections, where the bucket lock is transiently not held during
list traversal. This can lead to referencing freed memory when
cleaning up from multiple contexts. Fix this by taking the
existing 'cleanup' mutex in the missed cases where 'conn_clean()'
is called. 'conntrack_flush()' is converted to expiry-list traversal
to support proper bucket-level protection with the 'cleanup' mutex.
The cleanup for the NAT exhaustion case in 'conn_not_found()' is also
modified to avoid the same issue.
Ilya Maximets [Mon, 4 Mar 2019 10:35:30 +0000 (13:35 +0300)]
treewide: Clean up inclusions of netdev-dpdk header.
'netdev-dpdk.h' provides only 'netdev_dpdk_register' and
'free_dpdk_buf' which are not used in these files and should
not be used.
These are leftovers from already removed code.
Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Justin Pettit [Mon, 4 Mar 2019 22:28:58 +0000 (14:28 -0800)]
ovn-nbctl: Don't segfault when ovn-northd doesn't configure dynamic addresses.
When ovn-nbctl is used to configure a logical switch port's addresses, it
does a sanity check to make sure that a duplicate address isn't being
used. If a port is configured as "dynamic", ovn-northd is supposed to
populate the "dynamic_addresses" column in the Logical_Switch_Port
table. If it doesn't, ovn-nbctl would dereference a null pointer as part
of the duplicate address check. This patch first checks that
"dynamic_addresses" is actually set.
Signed-off-by: Justin Pettit <jpettit@ovn.org> Acked-by: Ben Pfaff <blp@ovn.org>
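A simplified sketch of the guard described above. The row struct and helper are hypothetical (not the ovn-nbctl code), and the comparison assumes the column value starts with the MAC address, as in "0a:00:00:00:00:01 10.0.0.2":

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified view of a Logical_Switch_Port row. */
struct lsp_row {
    const char *name;
    const char *dynamic_addresses;   /* NULL until ovn-northd fills it in. */
};

/* Returns true if 'mac' is already used as the dynamic address of 'lsp'.
 * A "dynamic" port that ovn-northd has not configured yet simply has no
 * address to compare against, so the NULL case is not a conflict. */
static bool
lsp_dynamic_address_conflicts(const struct lsp_row *lsp, const char *mac)
{
    if (!lsp->dynamic_addresses) {
        return false;
    }
    /* Simplified: assumes the column value starts with the MAC address. */
    return !strncmp(lsp->dynamic_addresses, mac, strlen(mac));
}
```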
Ilya Maximets [Tue, 26 Feb 2019 10:38:39 +0000 (13:38 +0300)]
dp-packet: Copy flow mark on packet clone.
Dummy interfaces clone dp-packets while processing the 'receive' appctl
command. In general, the flow mark should be copied anyway to avoid
possible future issues with real interfaces.
Sairam Venugopal [Wed, 27 Feb 2019 00:45:10 +0000 (16:45 -0800)]
datapath-windows: Fix potential deadlock in event subscription
Move the EventQueue lock acquisition after the dispatchLock to prevent a
potential deadlock in the port creation pipeline. A port event could try
to acquire the Dispatch Lock before the Event Queue lock, while the
subscription queue event could acquire the Event Queue lock before the
Dispatch Lock.
Found while testing with Driver Verifier enabled.
Signed-off-by: Sairam Venugopal <vsairam@vmware.com> Acked-by: Alin Gabriel Serdean <aserdean@ovn.org> Signed-off-by: Alin Gabriel Serdean <aserdean@ovn.org>
datapath-windows: Fix nbl cleanup when memory allocation fails
StartNblIngressError should be called only when an NBL hasn't been
modified. In this case the NBL context was already initialized, so rely
on the existing packet completion mechanism to clean up the NBL.
Found while testing with Driver Verifier with the limited memory setting
enabled.
Ilya Maximets [Tue, 26 Feb 2019 10:38:37 +0000 (13:38 +0300)]
dp-packet: Refactor offloading API.
1. There is no reason to have mbuf-related APIs in generic code.
2. Not only RSS/checksums should be invalidated in case of tunnel
decapsulation or sending to 'ring' ports.
To fix the two issues above, a new function
'dp_packet_reset_offload' is introduced. To clean up/unify
the code and simplify the addition of new offloading features to the
non-DPDK version of dp_packet, an 'ol_flags' bitmask is introduced.
Additionally, code complexity in 'dp_packet_clone_with_headroom' is
reduced by using existing generic APIs.
Unfortunately, a special case is still needed for mbuf
initialization inside 'dp_packet_init__()'.
'dp_packet_init_specific()' is introduced for this purpose as a generic
API for initialization of the implementation-specific fields.
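The bitmask idea can be sketched roughly as follows. All names here (ex_packet, EX_OL_*, and the helpers) are hypothetical illustrations, not the dp_packet API. One flags word records which offload metadata is valid, and a single reset invalidates all of it at once:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical offload flag bits for a non-DPDK packet buffer. */
enum {
    EX_OL_RSS_HASH_VALID = 1 << 0,
    EX_OL_FLOW_MARK      = 1 << 1,
    EX_OL_RX_L4_CSUM_OK  = 1 << 2,
};

struct ex_packet {
    uint32_t ol_flags;    /* Bitmask of valid EX_OL_* offload state. */
    uint32_t rss_hash;
    uint32_t flow_mark;
};

/* Invalidates every piece of offload state at once, e.g. after tunnel
 * decapsulation, instead of clearing RSS and checksum state one by one. */
static void
ex_packet_reset_offload(struct ex_packet *p)
{
    p->ol_flags = 0;
}

static void
ex_packet_set_flow_mark(struct ex_packet *p, uint32_t mark)
{
    p->flow_mark = mark;
    p->ol_flags |= EX_OL_FLOW_MARK;
}

/* Returns true and stores the mark in '*mark' if one is valid. */
static bool
ex_packet_has_flow_mark(const struct ex_packet *p, uint32_t *mark)
{
    if (p->ol_flags & EX_OL_FLOW_MARK) {
        *mark = p->flow_mark;
        return true;
    }
    return false;
}
```

With this layout, adding a new offload feature only means adding a bit and its accessors. The reset function does not need to change.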
Roi Dayan [Mon, 11 Mar 2019 12:47:08 +0000 (14:47 +0200)]
netdev-linux: Remove ingress qdisc before trying to add shared block
Adding a shared ingress block when an ingress qdisc already exists
results in a failure, so remove the ingress qdisc first.
Also, while at it, log the slave name.
Signed-off-by: Roi Dayan <roid@mellanox.com> Acked-by: John Hurley <john.hurley@netronome.com> Signed-off-by: Simon Horman <simon.horman@netronome.com>
Roi Dayan [Mon, 11 Mar 2019 14:34:05 +0000 (16:34 +0200)]
netdev-tc-offloads: Remove ingress qdisc on tc init flow api
A port added to an OVS bridge may already have an ingress qdisc,
which will make the block probe fail.
The probes should start clean, and the ingress qdisc is added later
anyway, so just remove it in case it exists.
Signed-off-by: Roi Dayan <roid@mellanox.com> Acked-by: John Hurley <john.hurley@netronome.com> Signed-off-by: Simon Horman <simon.horman@netronome.com>
Han Zhou [Wed, 6 Mar 2019 02:16:50 +0000 (18:16 -0800)]
ovsdb-idl: Fix memory leak of ovsdb_idl_db_clear.
ovsdb_idl_row_destroy() doesn't free the memory of row structure itself.
This is because of the ovsdb change tracking feature: the deleted row
may be accessed in the current iteration of main loop. The function
ovsdb_idl_row_destroy_postprocess() is called at the end of
ovsdb_idl_run() to free the deleted rows that are not tracked; the
function ovsdb_idl_db_track_clear() is called (indirectly) by user
at the end of each main loop iteration to free the deleted rows that
are tracked. However, ovsdb_idl_db_clear(), which may be called when
a session is reset or when the IDL is destroyed, did not call
ovsdb_idl_row_destroy_postprocess(), so all the untracked rows were
leaked. This patch fixes that.
Signed-off-by: Han Zhou <hzhou8@ebay.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Han Zhou [Fri, 1 Mar 2019 18:56:37 +0000 (10:56 -0800)]
ovsdb raft: Precheck prereq before proposing commit.
In the current OVSDB Raft design, when multiple transactions are
pending, either from the same server node or from different nodes in
the cluster, only the first one can succeed at a time, and the
following ones fail the prerequisite check on the leader node. This is
because the first one updates the expected prerequisite eid on the
leader node, while the prerequisite used for proposing a commit has to
be a committed eid. A node therefore cannot use the latest prerequisite
expected by the leader to propose a commit until the latest transaction
is committed by the leader and the committed_index on the node is
updated.
The current implementation proposes the commit as soon as the
transaction is requested by the client, which results in continuous
retries that cause high CPU load and waste.
In particular, even if all clients use leader_only to connect only to
the leader, the prereq check failure still happens a lot when a batch
of transactions is pending on the leader node: the leader node proposes
a batch of commits using the same committed eid as prerequisite and
updates the expected prereq as soon as the first one is in progress,
but it needs time to append to followers and wait until a majority
replies to update the committed_index, which results in continuous
useless retries of the following transactions proposed by the leader
itself.
This patch doesn't change the design but simply pre-checks whether the
current eid is the same as the prereq before proposing the commit, to
avoid wasting CPU cycles on both leader and followers. When clients use
leader_only mode, this patch completely eliminates the prereq check
failures.
In a scale test of OVN with 1k HVs creating and binding 10k lports,
the patch resulted in a 90% CPU cost reduction on the leader and a >80%
CPU cost reduction on followers. (The test was run with the leader
election base time set to 10000 ms, because otherwise the test couldn't
complete due to frequent leader re-election.)
This is just one of the related performance problems of the prereq
checking mechanism discussed at:
https://mail.openvswitch.org/pipermail/ovs-discuss/2019-February/048243.html Signed-off-by: Han Zhou <hzhou8@ebay.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
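The pre-check itself reduces to a comparison made before proposing. Here is a minimal sketch in which the eid is modeled as a hypothetical 16-byte value (not the actual Raft structures). A commit is proposed only when its prerequisite equals the eid the node currently has; otherwise the leader is bound to reject it, so the transaction waits instead:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical 128-bit entry id standing in for a Raft eid (a UUID). */
struct ex_eid {
    unsigned char bytes[16];
};

/* Returns true only if proposing with 'prereq' can pass the leader's
 * prerequisite check, i.e. 'prereq' equals the node's current eid. */
static bool
ex_should_propose(const struct ex_eid *prereq, const struct ex_eid *current)
{
    return !memcmp(prereq->bytes, current->bytes, sizeof prereq->bytes);
}
```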
OVN: Add support for DHCP option 150 - TFTP server address
OpenStack Ironic relies on a few DHCP options [0] that were not yet
supported in OVN. This patch adds the last one, option 150 (TFTP
server address, RFC 5859 [1]).
Note that this option is Cisco proprietary; the standard option that
matches this requirement is option 66. The difference is that option
150 allows multiple IPs to be specified, while option 66 allows only one.
Han Zhou [Wed, 6 Mar 2019 17:01:21 +0000 (09:01 -0800)]
ovsdb-idl: Fix memory leak of idl->remote.
Reported by Address Sanitizer.
Fixes: 5e07b8f93f03 ("ovsdb-idl: New function ovsdb_idl_create_unconnected().") Signed-off-by: Han Zhou <hzhou8@ebay.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
"Container-based infrastructure is currently being deprecated.
Please remove any sudo: false keys in your .travis.yml file to use
the default fully-virtualized Linux infrastructure instead."
Mark Michelson [Wed, 6 Mar 2019 14:33:04 +0000 (09:33 -0500)]
OVN: Add port addresses to IPAM after all ports are joined.
Joining ports involves setting the peer field on ovn_ports. If a switch
port is visited, and it is connected to a router port, then the switch
port's peer is set to the router port and the router port's peer is set
to the switch port.
A router port's addresses are added to IPAM if it is peered with a
switch that has dynamic addressing enabled.
When visiting ports, if a router port is visited before its connected
switch port, the router port's peer is not set yet, so the router
port's addresses cannot be added to IPAM. As a result, duplicate
addresses can be assigned by a logical switch.
The fix for this is to wait until all ports have been joined and then
add port addresses to IPAM. This way, we guarantee that all peer
assignments have been set, and no duplicate IP addresses may be assigned
by a switch.
Reported-by: James Page <james.page@canonical.com> Signed-off-by: Mark Michelson <mmichels@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Yifeng Sun [Tue, 5 Mar 2019 23:27:01 +0000 (15:27 -0800)]
dpif-netlink: Free leaked ofpbuf by using ofpbuf_delete
Found by valgrind.
256 bytes in 4 blocks are definitely lost in loss record 319 of 348
by 0x52E204: xmalloc (util.c:123)
by 0x4F6172: ofpbuf_new (ofpbuf.c:151)
by 0x53DEF2: dpif_netlink_ct_get_limits (dpif-netlink.c:2951)
by 0x587881: dpctl_ct_get_limits (dpctl.c:1904)
by 0x58566F: dpctl_unixctl_handler (dpctl.c:2589)
by 0x52D660: process_command (unixctl.c:308)
by 0x52D660: run_connection (unixctl.c:342)
by 0x52D660: unixctl_server_run (unixctl.c:393)
by 0x407366: main (ovs-vswitchd.c:126)
Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Select a random IPAM mac_prefix if it has not been provided by the user.
With this patch, the admin no longer needs to configure mac_prefix to
avoid L2 address collisions when multiple OVN deployments share the same
broadcast domain.
Remove MAC_ADDR_PREFIX definitions/occurrences, since mac_prefix is now
always provided to ovn-northd.
Acked-by: Numan Siddique <nusiddiq@redhat.com> Tested-by: Miguel Duarte de Mora Barroso <mdbarroso@redhat.com> Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
ofproto: Fix for ovs-vswitchd crash on flow-mod with unsupported action
Problem Description:
ovs-vswitchd crashes when a flow-mod with an unsupported action is
invoked (tested with OVS 2.10.1).
Steps to recreate:
Step 1) Create a flow
ovs-ofctl add-flow switch1
priority=228,dl_type=0x0800,dl_vlan="600",in_port=25,actions=output:ALL
This step is successful.
In the above example, the ofproto provider I have returns an error from
rule_construct, as the set_fields come after the output action.
However, OVS ignores the error (the return value of add_flow_init
is ignored in modify_flow_init_strict), and eventually ovs-vswitchd
crashes.
Crash backtrace:
-----------------------
Thread 1 "ovs-vswitchd" received signal SIGSEGV, Segmentation fault.
0x00007f6a06e785fb in modify_flows_start__ (
ofproto=ofproto@entry=0x55b289cecc28, ofm=ofm@entry=0x7ffdf7d57b70)
at ofproto/ofproto.c:5402
5402 in ofproto/ofproto.c
(gdb) bt
#0 0x00007f6a06e785fb in modify_flows_start__ (
ofproto=ofproto@entry=0x55b289cecc28, ofm=ofm@entry=0x7ffdf7d57b70)
at ofproto/ofproto.c:5402
#1 0x00007f6a06e790db in modify_flows_start_loose (ofm=0x7ffdf7d57b70,
ofproto=0x55b289cecc28) at ofproto/ofproto.c:5443
#2 ofproto_flow_mod_start (ofproto=ofproto@entry=0x55b289cecc28,
ofm=ofm@entry=0x7ffdf7d57b70) at ofproto/ofproto.c:7672
#3 0x00007f6a06e79164 in handle_flow_mod__ (
ofproto=ofproto@entry=0x55b289cecc28, fm=fm@entry=0x7ffdf7d57d20,
req=req@entry=0x7ffdf7d57cd0) at ofproto/ofproto.c:5858
#4 0x00007f6a06e792c2 in handle_flow_mod (ofconn=ofconn@entry=0x55b289d528c0,
oh=oh@entry=0x55b289d5a410) at ofproto/ofproto.c:5835
#5 0x00007f6a06e7a173 in handle_openflow__ (msg=0x55b289d351d0,
ofconn=0x55b289d528c0) at ofproto/ofproto.c:8127
#6 handle_openflow (ofconn=0x55b289d528c0, ofp_msg=0x55b289d351d0)
at ofproto/ofproto.c:8296
#7 0x00007f6a06e6a013 in ofconn_run (
handle_openflow=0x7f6a06e796f0 <handle_openflow>,
ofconn=0x55b289d528c0)
at ofproto/connmgr.c:1446
#8 connmgr_run (mgr=0x55b289d14fe0,
handle_openflow=handle_openflow@entry=0x7f6a06e796f0 <handle_openflow>)
at ofproto/connmgr.c:365
With this fix, OVS does not crash.
Signed-off-by: Parameswaran Krishnamurthy <parkrish@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
OVN: update RA next_announce according to {min, max}_interval
Update RA next_announce whenever min_interval and/or max_interval are
updated in the sbrec_port_binding options. In the current implementation,
if ipv6_ra_configs:send_periodic is set to true before
ipv6_ra_configs:{min,max}_interval are set, next_announce is computed
using default values and is not updated until the first IPv6 router
advertisement is sent.
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
lib/tc: add ingress ratelimiting support for tc-offload
Firstly, this patch introduces the notion of a reserved priority, as the
filter implementing ingress policing requires the highest priority.
Secondly, it allows setting rate limiters while tc-offloads is enabled.
Lastly, when configuring an ingress rate limiter, it installs a matchall
filter that matches all traffic and then applies a police action.
An example of what to expect:
OvS CLI:
ovs-vsctl set interface <netdev_name> ingress_policing_rate=5000
ovs-vsctl set interface <netdev_name> ingress_policing_burst=100
Ilya Maximets [Fri, 1 Mar 2019 11:59:33 +0000 (14:59 +0300)]
dpdk: Fix case-sensitivity of dpdk-init knob.
Before the DPDK initialization status was reported in the DB,
'dpdk-init' was just a boolean, and 'smap_get_bool', which is
case-insensitive, was used to get the value.
The current code uses a plain 'strcmp' that fails to recognize values
like "True". As a result, this breaks various OVS configuration tools.
For example, kolla-ansible uses 'other_config:dpdk-init=True', but OVS
is not able to recognize it, leading to broken installations.
'strcasecmp' should be used instead to fix the issue.
CC: Aaron Conole <aconole@redhat.com> Fixes: 3e52fa5644cd ("dpdk: reflect status and version in the database") Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>
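The difference between the two comparisons can be seen in a small standalone sketch. The helper names are hypothetical, and only the "true" value is modeled, not every value the knob may accept:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>   /* strcasecmp() (POSIX). */

/* Case-sensitive check: rejects "True", as the broken code did. */
static bool
dpdk_init_enabled_strict(const char *value)
{
    return value && !strcmp(value, "true");
}

/* Case-insensitive check: accepts "true", "True", "TRUE", ... */
static bool
dpdk_init_enabled(const char *value)
{
    return value && !strcasecmp(value, "true");
}
```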
The rconn connection timer measures time with a granularity of seconds,
which means that sometimes the actual timeout can be just a millisecond
or so. This led to occasional immediate connection failures from ovs-ofctl.
VMware-BZ: #2295760 Fixes: 476d2551abd2 ("rconn: Introduce new invariant to fix assertion failure in corner case.") Reported-by: Ken Ajiro <ken-ajiro@xr.jp.nec.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
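The arithmetic behind the problem can be modeled as follows. This is a hypothetical illustration, not the rconn code. If a state is entered at 1.999 s but the entry time is recorded as whole second 1, a 1-second timeout expires at 2.000 s, only 1 ms later:

```c
#include <assert.h>

/* Remaining time when the start time is truncated to whole seconds.
 * 'now_ms' is the current time in milliseconds. */
static long long
remaining_ms_coarse(long long now_ms, int timeout_s)
{
    long long deadline_ms = (now_ms / 1000 + timeout_s) * 1000;
    return deadline_ms - now_ms;
}

/* The same timeout computed at millisecond granularity. */
static long long
remaining_ms_fine(long long now_ms, int timeout_s)
{
    long long deadline_ms = now_ms + timeout_s * 1000LL;
    return deadline_ms - now_ms;
}
```

For example, remaining_ms_coarse(1999, 1) is 1, while remaining_ms_fine(1999, 1) is 1000.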
Timothy Redaelli [Thu, 28 Feb 2019 17:27:46 +0000 (18:27 +0100)]
rhel: Use PIDFile on forking systemd service files
Currently, PIDFile is not used in systemd service files with
Type=forking. This means systemd sometimes fails to restart a daemon
that is killed (with SIGKILL) or that has crashed.
This commit adds PIDFile to all systemd service files with Type=forking
in order to always have the correct PID to monitor.
Flavio Leitner [Thu, 28 Feb 2019 16:13:57 +0000 (13:13 -0300)]
rhel: limit stack size to 2M.
The default stack size in Fedora/RHEL is 8M, which means that when the
ovs-vswitchd daemon starts and uses --mlockall (the default), it will
dirty all memory regions for all threads, whose number is proportional
to the number of CPUs. On a big host this increases memory usage to many
hundreds of megabytes, while OVS actually requires much less.
This patch relies on systemd to limit the stack size to 2M per thread.
That is much more than the minimum documented at function
ovs_thread_create():
/* Some small systems use a default stack size as small as 80 kB, but OVS
* requires approximately 384 kB according to the following analysis:
* https://mail.openvswitch.org/pipermail/ovs-dev/2016-January/308592.html
*
* We use 512 kB to give us some margin of error. */
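A rough sketch of how the systemd limit reaches each thread, assuming, as with glibc/NPTL, that new threads default to the RLIMIT_STACK soft limit when it is not unlimited. The 512 kB floor mirrors the comment above. The 2 MB fallback for an unlimited limit and all function names are hypothetical:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <sys/resource.h>

#define EX_MIN_STACK_SIZE ((size_t) 512 * 1024)            /* From the comment. */
#define EX_FALLBACK_STACK_SIZE ((size_t) 2 * 1024 * 1024)  /* Hypothetical. */

/* Per-thread stack size implied by a RLIMIT_STACK soft limit, with the
 * 512 kB floor applied. */
static size_t
ex_thread_stack_size(rlim_t soft_limit)
{
    size_t size = soft_limit == RLIM_INFINITY ? EX_FALLBACK_STACK_SIZE
                                              : (size_t) soft_limit;
    return size < EX_MIN_STACK_SIZE ? EX_MIN_STACK_SIZE : size;
}

/* Reads the current soft limit, e.g. as set by systemd's unit file. */
static size_t
ex_current_thread_stack_size(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_STACK, &rl)) {
        return EX_MIN_STACK_SIZE;
    }
    return ex_thread_stack_size(rl.rlim_cur);
}
```

With --mlockall, the locked stack memory grows roughly as the stack size times the thread count. For a hypothetical 100 threads, that is about 800 MB at 8M per thread versus about 200 MB at 2M per thread.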
Han Zhou [Thu, 28 Feb 2019 17:15:20 +0000 (09:15 -0800)]
ovsdb-idl: Fast resync from server when connection reset.
Use monitor_cond_since to request changes after the last version of
local data when the connection to the server is reset, without clearing
the local data. It falls back to clearing and repopulating all the data
when the requested id cannot be fulfilled by the server.
Test result in an ovn-scale-test environment using clustered mode:
- 1K HVs (ovsdb clients)
- 10K lports
Without the patch, it took 30+ min for the SB ovsdb-server to calm down
and for the HVs to stabilize the connection and finish syncing data.
With the patch, there was no noticeable CPU spike on the SB ovsdb-server,
and all HVs were in sync with SB within 1 min, which is the probe
interval set in this test (so it took at most 1 min for HVs to notice
the TCP connection reset and reconnect, and the resync finished
immediately after that).
Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2018-September/047457.html Signed-off-by: Han Zhou <hzhou8@ebay.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Han Zhou [Thu, 28 Feb 2019 17:15:18 +0000 (09:15 -0800)]
ovsdb-monitor: Support monitor_cond_since.
Support the new monitor method monitor_cond_since so that a client
can request monitoring to start from a specific point instead of always
from the beginning. This reduces the cost in scenarios where the server
is restarted or failed over but the client still has all existing data.
In these scenarios only new changes (and in most cases no changes) need
to be transferred to the client. When ovsdb-server restarts, history
transactions are read from the disk file; when ovsdb-server fails over,
history transactions already exist in the memory of the new server.
There are situations in which the requested transaction may not be
found: for example, a transaction that is too old and has been discarded
from the history list maintained in memory, or transactions on disk that
have been compacted during ovsdb compaction. In those situations the
server falls back to transferring all data from the beginning.
For more details of the protocol change, see
Documentation/ref/ovsdb-server.7.rst.
This change includes both server-side and ovsdb-client-side changes
for the new protocol. IDLs using this capability will be added in
future patches.
For now, the feature takes effect only in cluster mode of ovsdb-server,
because cluster mode is the only mode that supports unique transaction
UUIDs today. For other modes, monitor_cond_since always falls back to
transferring all data with found = false. Support for those modes can
be added in the future.
Signed-off-by: Han Zhou <hzhou8@ebay.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Han Zhou [Thu, 28 Feb 2019 17:15:17 +0000 (09:15 -0800)]
ovsdb-server: Transaction history tracking.
Maintain the last N (N = 100) transactions in memory, to be used by
future patches for generating monitor data from any point within these
N transactions.
Signed-off-by: Han Zhou <hzhou8@ebay.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
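A minimal sketch of a bounded history with the "found or fall back" lookup that monitor_cond_since relies on. Everything here is hypothetical: transactions are reduced to integer ids instead of UUIDs, and the names are not the ovsdb-server code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define EX_HISTORY_MAX 100          /* N, as in the commit message. */

/* Hypothetical transaction record: only its id is modeled. */
struct ex_txn {
    unsigned long long id;
};

/* Fixed-size ring holding the last EX_HISTORY_MAX transactions. */
struct ex_history {
    struct ex_txn txns[EX_HISTORY_MAX];
    size_t start;                   /* Index of the oldest entry. */
    size_t n;                       /* Number of valid entries. */
};

static void
ex_history_add(struct ex_history *h, unsigned long long id)
{
    if (h->n < EX_HISTORY_MAX) {
        h->txns[(h->start + h->n++) % EX_HISTORY_MAX].id = id;
    } else {
        /* Full: overwrite the oldest entry and advance 'start'. */
        h->txns[h->start].id = id;
        h->start = (h->start + 1) % EX_HISTORY_MAX;
    }
}

/* Returns how many transactions are newer than 'since' and sets '*found'.
 * If 'since' was already discarded, '*found' is false and the caller has
 * to fall back to sending all data. */
static size_t
ex_history_changes_since(const struct ex_history *h,
                         unsigned long long since, bool *found)
{
    for (size_t i = 0; i < h->n; i++) {
        if (h->txns[(h->start + i) % EX_HISTORY_MAX].id == since) {
            *found = true;
            return h->n - i - 1;
        }
    }
    *found = false;
    return 0;
}
```

For example, after ids 1 to 150 are added, ids 51 to 150 remain. A client that last saw 140 gets 10 changes, while a client that last saw 50 gets found = false.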
The current ovsdb monitor maintains its own transaction version through an
incrementing integer and uses it to identify changes starting from
different versions, and also to figure out whether each set of changes
should be flushed. In particular, it uses number 0 to represent the change
set that contains all data for initial client population. It is a smart
approach, but it prevents further extension of the monitoring mechanism to
support the future use case of clients requesting changes starting from a
given history point. This patch refactors the structures so that change
sets are referenced directly through pointers. It uses additional members
such as init_change_set and new_change_set to indicate the specific change
set explicitly, instead of through calculated version numbers based on
implicit rules.
At the same time, this patch provides better encapsulation for change sets
(composed of data in a list of tables), while still allowing traversal
across change sets for a given table.
Signed-off-by: Han Zhou <hzhou8@ebay.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Yifeng Sun [Wed, 27 Feb 2019 22:21:00 +0000 (14:21 -0800)]
oss-fuzz: Fix oss build errors because of ovs API change
Reported-at: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=13432 Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Ilya Maximets [Tue, 26 Feb 2019 10:38:35 +0000 (13:38 +0300)]
dpif-netdev: Reduce log level for not found mark id.
This is a normal case for the 'find' function, especially because it
happens for the first packet of every flow that has not been offloaded
yet. We should not warn about it. Dropped to DBG to avoid flooding the
log in case of a large number of new flows.
CC: Yuanhan Liu <yliu@fridaylinux.org> Fixes: 241bad15d99a ("dpif-netdev: associate flow with a mark id") Acked-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Ilya Maximets [Wed, 6 Feb 2019 15:40:36 +0000 (18:40 +0300)]
netdev-dpdk: Use single struct/union for flow offload items.
Having a single structure allows simplifying the code path and
clearing all the items at once (probably faster). This does not
increase stack memory usage because all the L4-related items are
grouped in a union.
Changes:
- Memsets combined.
- 'ipv4_next_proto_mask' dropped, as we already know the address
and are able to use 'mask.ipv4.hdr.next_proto_id' directly.
- Group of 'if' statements for L4 protocols turned into a 'switch'.
We can do that because we don't have semi-local variables anymore.
- Eliminated the 'end_proto_check' label. Not needed with 'switch'.
Additionally, 'rte_memcpy' replaced with a simple 'memcpy', as it makes
no sense to use 'rte_memcpy' for 6 bytes.
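The layout described above can be sketched in simplified form. The item types are hypothetical stand-ins for the DPDK rte_flow items, and byte-order conversion is omitted. A single struct holds all items, the L4 items share storage through an anonymous union (C11), one memset() per struct clears everything, and a switch replaces the chain of ifs:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct ex_tcp_item { uint16_t src_port, dst_port; uint8_t flags; };
struct ex_udp_item { uint16_t src_port, dst_port; };
struct ex_icmp_item { uint8_t type, code; };

struct ex_flow_items {
    struct { uint8_t next_proto_id; } ipv4;
    union {                         /* L4 items overlap in memory. */
        struct ex_tcp_item tcp;
        struct ex_udp_item udp;
        struct ex_icmp_item icmp;
    };
};

/* Fills 'spec' and 'mask' for protocol 'proto'.  Returns 0 on success or
 * -1 for an unsupported protocol. */
static int
ex_fill_l4(struct ex_flow_items *spec, struct ex_flow_items *mask,
           uint8_t proto, uint16_t src, uint16_t dst)
{
    memset(spec, 0, sizeof *spec);  /* One memset clears all items. */
    memset(mask, 0, sizeof *mask);
    spec->ipv4.next_proto_id = proto;
    mask->ipv4.next_proto_id = 0xff;

    switch (proto) {
    case 6:   /* TCP */
        spec->tcp.src_port = src;
        spec->tcp.dst_port = dst;
        mask->tcp.src_port = mask->tcp.dst_port = 0xffff;
        break;
    case 17:  /* UDP */
        spec->udp.src_port = src;
        spec->udp.dst_port = dst;
        mask->udp.src_port = mask->udp.dst_port = 0xffff;
        break;
    case 1:   /* ICMP: nothing beyond the protocol is matched here. */
        break;
    default:
        return -1;
    }
    return 0;
}
```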
Yanqin Wei [Wed, 27 Feb 2019 09:44:06 +0000 (17:44 +0800)]
hash: Enable hash_bytes128 optimization for aarch64.
"hash_bytes128" has two versions, for 64-bit and 32-bit systems. Each
should be the common optimization for its respective platforms, but the
64-bit version was only enabled on x86-64. This patch enables it for
the aarch64 platform.
Micro-benchmarks were run on two kinds of Arm platforms: a 50%
performance improvement was observed on ThunderX2 and a 40% improvement
on TaiShan (Cortex-A72).
Signed-off-by: Yanqin Wei <Yanqin.Wei@arm.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Darrell Ball [Mon, 25 Feb 2019 23:36:32 +0000 (15:36 -0800)]
conntrack: Skip ephemeral ports with specified port range.
This patch removes the fallback to ephemeral ports when a SNAT port
range is specified; DNAT already does not fall back to ephemeral ports
in general. This is not restrictive for the user and makes it easier to
limit NAT L4 port selection.
The documentation is updated, and a new test is added to enforce the
behavior.
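The resulting behavior can be sketched as a search that never leaves the configured range. The helpers are hypothetical and scan sequentially for simplicity, which is not necessarily how conntrack orders its candidates:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true if 'port' appears among the 'n_used' entries of 'used'. */
static bool
ex_port_is_used(uint16_t port, const uint16_t *used, size_t n_used)
{
    for (size_t i = 0; i < n_used; i++) {
        if (used[i] == port) {
            return true;
        }
    }
    return false;
}

/* Picks a free SNAT port strictly within [min, max].  Returns 0 when the
 * range is exhausted instead of falling back to ephemeral ports.  The
 * 32-bit loop counter avoids wrap-around when 'max' is 65535. */
static uint16_t
ex_pick_snat_port(uint16_t min, uint16_t max,
                  const uint16_t *used, size_t n_used)
{
    for (uint32_t port = min; port <= max; port++) {
        if (!ex_port_is_used((uint16_t) port, used, n_used)) {
            return (uint16_t) port;
        }
    }
    return 0;
}
```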
Darrell Ball [Mon, 25 Feb 2019 23:36:31 +0000 (15:36 -0800)]
conntrack: Fix wasted work for ICMP NAT.
ICMPv4 and ICMPv6 are not subject to port address translation (PAT);
however, a loop unnecessarily increments a local variable for
ephemeral ports, resulting in wasted work for ICMPv4 and ICMPv6 packets
subject to NAT. Fix this by checking whether PAT is enabled before
incrementing the local port variable, and bailing out otherwise.
Signed-off-by: Darrell Ball <dlu998@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
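The effect of the fix can be modeled by counting attempts for one candidate address in the worst case, where every tuple is already taken. The function is a hypothetical illustration, not the conntrack loop. Without PAT the port is irrelevant, so the loop bails out after the first attempt:

```c
#include <assert.h>
#include <stdbool.h>

/* Counts tuple attempts for one address, assuming every tuple is taken. */
static unsigned int
ex_count_attempts(bool pat_enabled, unsigned int min_port,
                  unsigned int max_port)
{
    unsigned int attempts = 0;

    for (unsigned int port = min_port; port <= max_port; port++) {
        attempts++;                 /* Try (address, port); assume taken. */
        if (!pat_enabled) {
            break;                  /* ICMP: no ports to iterate over. */
        }
    }
    return attempts;
}
```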
Ben Pfaff [Sat, 15 Dec 2018 02:16:55 +0000 (18:16 -0800)]
odp-util: Improve log messages and error reporting for Netlink parsing.
As a side effect, this also reduces the severity of many log messages from
ERR to WARN. They just didn't seem like messages that generally reported
anything that would prevent functioning.
Ilya Maximets [Mon, 25 Feb 2019 17:43:36 +0000 (20:43 +0300)]
vlog: Better handle syslog handler exceptions.
'set_levels_from_string' doesn't check for exceptions that could
happen while opening syslog files or connecting to syslog sockets.
For example, if rsyslog is stopped on a system:
$ test-unixctl.py -vFACILITY:daemon --detach
Traceback (most recent call last):
File "../../../../tests/test-unixctl.py", line 90, in <module>
main()
File "../../../../tests/test-unixctl.py", line 61, in main
ovs.vlog.handle_args(args)
File "python/ovs/vlog.py", line 463, in handle_args
msg = Vlog.set_levels_from_string(verbose)
File "python/ovs/vlog.py", line 345, in set_levels_from_string
Vlog.add_syslog_handler(words[1])
File "python/ovs/vlog.py", line 321, in add_syslog_handler
facility=syslog_facility)
File "/python2.7/logging/handlers.py", line 759, in __init__
self._connect_unixsocket(address)
File "/python2.7/logging/handlers.py", line 787, in _connect_unixsocket
self.socket.connect(address)
File "/python2.7/socket.py", line 224, in meth
return getattr(self._sock,name)(*args)
socket.error: [Errno 111] Connection refused
In this case the "/dev/log" file exists, so the check inside
'add_syslog_handler' doesn't help.
We need to catch the exceptions in 'set_levels_from_string' in the same
way as is done in the 'init' function.
Also, we don't really need to check for '/dev/log' existence, because
the exception will be caught at the upper layer and properly handled by
disabling the corresponding logger.
Fixes: d69d61c7c175 ("vlog: Ability to override the default log facility.") Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Numan Siddique [Mon, 18 Feb 2019 04:42:22 +0000 (10:12 +0530)]
ovn-controller: Provide the option to set the datapath-type of br-int
If the integration bridge is deleted, ovn-controller recreates it,
but the previous datapath-type value is lost if it was set. This
patch adds code to ovn-controller to set the datapath-type
if it is configured by the user in the 'external_ids:ovn-bridge-datapath-type'
column of the Open_vSwitch table.
Acked-by: Mark Michelson <mmichels@redhat.com> Signed-off-by: Numan Siddique <nusiddiq@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Matthias May [Thu, 14 Feb 2019 23:16:14 +0000 (00:16 +0100)]
rstp: add ability to receive VLAN-tagged BPDUs
Some switches can transmit their BPDUs VLAN-tagged.
With this change OVS is able to receive VLAN-tagged BPDUs, but it still
transmits its own BPDUs untagged.
This was tested against a Westermo RFI-207-F4G-T3G.
Signed-off-by: Matthias May <matthias.may@neratec.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
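Receiving a tagged BPDU essentially means skipping one 802.1Q tag (TPID 0x8100 plus a 2-byte TCI) before looking for the 802.3 length field and the LLC header. The following is a standalone sketch, not the OVS parser. It checks only the bridge group address 01:80:c2:00:00:00, a length field below 0x0600, and LLC DSAP = SSAP = 0x42:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns true if 'frame' (length 'len') looks like an STP/RSTP BPDU,
 * with or without a single 802.1Q tag. */
static bool
ex_is_bpdu(const uint8_t *frame, size_t len)
{
    static const uint8_t stp_dst[6] = { 0x01, 0x80, 0xc2, 0x00, 0x00, 0x00 };
    size_t off = 12;                 /* Skip destination and source MACs. */

    if (len < 14 || memcmp(frame, stp_dst, sizeof stp_dst)) {
        return false;
    }
    if (frame[off] == 0x81 && frame[off + 1] == 0x00) {
        off += 4;                    /* Skip TPID 0x8100 and the TCI. */
        if (len < off + 2) {
            return false;
        }
    }
    /* 802.3 length field (< 0x0600), then LLC DSAP, SSAP, control. */
    uint16_t len_or_type = (uint16_t) (frame[off] << 8 | frame[off + 1]);
    off += 2;
    return len_or_type < 0x0600 && len >= off + 3
           && frame[off] == 0x42 && frame[off + 1] == 0x42;
}
```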
Han Zhou [Fri, 15 Feb 2019 20:25:58 +0000 (12:25 -0800)]
ovsdb_monitor: Fix style of prototypes.
Omit the parameter names in prototypes, as suggested by the coding
style: "Omit parameter names from function prototypes when the names
do not give useful information."
Adjust the order of parameters as suggested by the coding style.
Signed-off-by: Han Zhou <hzhou8@ebay.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Han Zhou [Sat, 16 Feb 2019 02:49:52 +0000 (18:49 -0800)]
ovn-nbctl: Daemon mode should retry when IDL connection lost.
When creating the IDL, "retry" was set to false. However, in daemon
mode, reconnecting upon DB server failure should be transparent
to the user. This even impacts HA mode: e.g., in clustered mode, although
the IDL tries to connect to the next server, the server fail-over may
not be completed yet at the first retry, and the IDL stops retrying
after N (N = number of remotes) attempts.
This patch makes sure that retry is set to true in daemon mode so that
the daemon automatically retries forever.
Signed-off-by: Han Zhou <hzhou8@ebay.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
venu iyer [Tue, 15 Jan 2019 01:30:43 +0000 (17:30 -0800)]
Support for multiple VTEP in OVN
OVN uses tunnels to achieve logical network connectivity. The tunnel IP
to be used when communicating with a node is configured using an external_ids
field called "ovn-encap-ip" (and "ovn-encap-type" to indicate the type of
tunnel - geneve, vxlan, stt).
The fact that "ovn-encap-ip" is a single IP is significantly limiting in
certain scenarios. Primarily, if we have multiple NICs on a system and
want to assign SR-IOV VFs from different NICs to a guest (as logical ports),
we'll still end up using the single "ovn-encap-ip" to encapsulate traffic
from different VFs. This means we'll end up using only one physical NIC,
thereby neither maintaining the VF-PF association nor using all the
physical NICs. It is possible to bond all the NICs and use the
bond IP as the "encap-ip", but bonding multiple NICs has its own limitations,
i.e. NICs supporting OVS flow offload don't work with bonding. This
severely undermines SR-IOV use with OVS (i.e. all the processing needs
to be done in the host despite giving VFs to guests).
Note: The above assumes a NIC that supports OVS with SR-IOV (e.g. Mellanox
CX-5), which uses a "representor" to plug a VF into the OVS bridge.
This patch enables a comma-separated list of IP addresses to be specified in
"ovn-encap-ip", thus allowing the node to be reached via any IP combined with
the "ovn-encap-type", assuming physical routing allows that. Additionally, it
introduces a way to specify the encap IP to be used for a logical port (so that
the VF-PF mapping is maintained when traversing the logical path over
a tunnel). A new "encap-ip" external_ids key can be configured on an
Interface to indicate this.
On the SB these changes appear as an additional column in port_bindings
as "encap". The encap record for a port points to an encap record
on its chassis. If the port is not explicitly associated with an
encap-ip (using external_ids), the encap record is empty, which
means the preferred tunnel will be used to reach the port's chassis.
The intention is also to have no functional changes in the default case,
i.e. when there is only one "ovn-encap-ip".
The changes have been tested with multiple encap-ip addresses, SR-IOV, and
for backwards compatibility (in the case where there is only one
ovn-encap-ip) with an OVN SB that doesn't include these changes.
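A minimal sketch of splitting a comma-separated "ovn-encap-ip" value. The helper is hypothetical and is not the ovn-controller parser. It trims surrounding spaces, skips empty entries, and does not validate the addresses:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Splits 'value' (modified in place) into at most 'max' entries stored
 * in 'ips'.  Returns the number of entries found. */
static size_t
ex_split_encap_ips(char *value, char **ips, size_t max)
{
    size_t n = 0;
    char *save_ptr = NULL;

    for (char *tok = strtok_r(value, ",", &save_ptr);
         tok && n < max;
         tok = strtok_r(NULL, ",", &save_ptr)) {
        while (*tok == ' ') {        /* Trim leading spaces. */
            tok++;
        }
        char *end = tok + strlen(tok);
        while (end > tok && end[-1] == ' ') {
            *--end = '\0';           /* Trim trailing spaces. */
        }
        if (*tok) {
            ips[n++] = tok;
        }
    }
    return n;
}
```

With a single address, the function returns one entry, which corresponds to the unchanged default case.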
Ilya Maximets [Mon, 18 Feb 2019 15:35:02 +0000 (18:35 +0300)]
checkpatch: Escape range operators inside regex.
' -(' matches a single character in the range between ' ' (index 32)
and '(' (index 40). This leads to the false positive:
WARNING: Line lacks whitespace around operator
#445 FILE: ovsdb/monitor.c:573:
if (--mcs->n_refs == 0) {
'-' needs to be escaped to get the right behaviour.
This patch additionally escapes all other '-' chars in similar
regexes and puts them one per line to ease review in case of
future changes.
Basic unit tests are added.
CC: Joe Stringer <joe@ovn.org> Fixes: 0d7b16daea50 ("checkpatch: Check for infix operator whitespace.") Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
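The range pitfall can be reproduced with POSIX regular expressions from C. checkpatch itself is Python, where a backslash escapes '-' inside a character class. In a POSIX bracket expression the backslash is literal, so the equivalent fix there is to place '-' last. In the C locale, "[ -(]" matches '$' (36) but not '-' (45), while "[ (-]" matches only space, '(' and '-':

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true if extended regex 'pattern' matches somewhere in 'text'.
 * An invalid pattern is treated as no match. */
static bool
ex_regex_matches(const char *pattern, const char *text)
{
    regex_t re;
    bool matched;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB)) {
        return false;
    }
    matched = regexec(&re, text, 0, NULL, 0) == 0;
    regfree(&re);
    return matched;
}
```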
Toms Atteka [Tue, 19 Feb 2019 18:55:02 +0000 (10:55 -0800)]
netlink: added check to prevent netlink attribute overflow
If a large enough input is passed to odp_actions_from_string, it can
cause a netlink attribute to overflow.
A check for the buffer size was added to avoid entering this function
and to return an appropriate error code instead.
Basic manual testing was performed.
Reported-by: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=12231 Signed-off-by: Toms Atteka <cpp.code.lv@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
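The underlying limit is that a Netlink attribute's length field is 16 bits wide and covers the 4-byte header as well. The struct below mirrors the layout of struct nlattr from <linux/netlink.h> so that the sketch is self-contained. The helper name is hypothetical and is not the check added in OVS:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Same layout as the kernel's struct nlattr. */
struct ex_nlattr {
    uint16_t nla_len;               /* Header plus payload, in bytes. */
    uint16_t nla_type;
};

#define EX_NLA_HDRLEN (sizeof(struct ex_nlattr))

/* Returns true if an attribute with 'payload_len' bytes of payload can be
 * represented.  A caller would check this before nesting more data and
 * return an error otherwise. */
static bool
ex_nla_fits(size_t payload_len)
{
    return payload_len <= UINT16_MAX - EX_NLA_HDRLEN;
}
```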
Ilya Maximets [Mon, 12 Nov 2018 09:28:39 +0000 (12:28 +0300)]
netdev-dpdk: Flow validation refactoring.
* Dropped the 'is_all_zero' function, which is equivalent to 'is_all_zeros'
from util.h.
* util.h added to includes. Includes re-sorted within their blocks
(it's hard to figure out where to put a new one if there is no order).
Note: linux/if.h depends on sys/socket.h.
* 'ovs_u128_is_zero' used instead of manual checking of fields.
* Code simplified by using a direct pointer to 'match->wc.masks'.
* 'sizeof's rewritten to be coding-style compliant.