git.proxmox.com Git - mirror_ovs.git/log
5 years ago ofproto: fix the bug of bucket counter is not updated
Li Wei [Wed, 20 Mar 2019 12:16:18 +0000 (20:16 +0800)]
ofproto: fix the bug of bucket counter is not updated

After inserting or removing a bucket, the bucket counter is not updated.
As a result, calling ovs-ofctl dump-group-stats br-int crashes.

Reproduce steps:
1. ovs-ofctl -O OpenFlow15 add-group br-int "group_id=1, type=select, selection_method=hash bucket=bucket_id=1,weight:100,actions=output:1"
2. ovs-ofctl insert-buckets br-int "group_id=1, command_bucket_id=last, bucket=bucket_id=7,weight:800,actions=output:1"
3. ovs-ofctl dump-group-stats br-int

(gdb) bt
at ../sysdeps/posix/libc_fatal.c:175
ar_ptr=<optimized out>) at malloc.c:5049
group_id=<optimized out>, cb=cb@entry=0x55cab8fd6cd0 <append_group_stats>) at ofproto/ofproto.c:6790

Signed-off-by: solomon <liwei.solomon@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years ago netdev-dpdk: Print netdev name for txq mapping.
Ilya Maximets [Tue, 5 Mar 2019 16:28:26 +0000 (19:28 +0300)]
netdev-dpdk: Print netdev name for txq mapping.

In case of reconfiguration while 'vhost_id' is not set yet,
there will be a meaningless message like:

    |netdev_dpdk|DBG|TX queue mapping for
    |netdev_dpdk|DBG| 0 -->  0

It's better to print the name of the netdev which is always set.

Additionally fixed possible splitting by other log messages and
missing space in the queue state message.

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
5 years ago dpif-netdev-perf: Fix millisecond stats precision with slower TSC.
Ilya Maximets [Tue, 19 Mar 2019 11:08:20 +0000 (14:08 +0300)]
dpif-netdev-perf: Fix millisecond stats precision with slower TSC.

Unlike x86, where the TSC frequency usually matches the CPU frequency,
other architectures could have much slower TSCs.
For example, it's common for Arm SoCs to have a 100 MHz TSC by default.
In this case the perf module will check for the end of the current
millisecond every 10K cycles, i.e. 10 times per millisecond. This might
not be enough to collect precise statistics.
Fix that by taking the current TSC frequency into account instead of
hardcoding the number of cycles.
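
The arithmetic behind this can be illustrated with a short sketch. It is
not the dpif-netdev-perf code (which is C); the function names and the
"wanted checks per millisecond" parameter are assumptions made only for
the example.

```python
def checks_per_ms(tsc_hz, interval_cycles):
    """Number of end-of-millisecond checks that fit into one millisecond."""
    cycles_per_ms = tsc_hz // 1000
    return cycles_per_ms // interval_cycles


def interval_for(tsc_hz, wanted_checks_per_ms):
    """Derive the check interval (in cycles) from the TSC frequency.

    Hypothetical helper: the real patch may pick the interval differently.
    """
    return max(1, tsc_hz // 1000 // wanted_checks_per_ms)
```

With a 100 MHz TSC and a hardcoded 10K-cycle interval, `checks_per_ms`
gives the 10 checks per millisecond mentioned above; deriving the interval
from the frequency keeps the check rate independent of the TSC speed.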

CC: Jan Scheurich <jan.scheurich@ericsson.com>
Fixes: 79f368756ce8 ("dpif-netdev: Detailed performance stats for PMDs")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Jan Scheurich <jan.scheurich@ericsson.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
5 years ago ovn-ctl: Make sure OVN_RUNDIR is created for central nodes.
Han Zhou [Thu, 21 Mar 2019 23:06:31 +0000 (16:06 -0700)]
ovn-ctl: Make sure OVN_RUNDIR is created for central nodes.

When ovn-ctl tries to start ovsdb, it doesn't ensure that the rundir
(e.g. /var/run/openvswitch) exists, because it does not call
start_daemon(). Usually, if OVS was started earlier by ovs-ctl
on the same node, the folder has already been created. However, on an
OVN central node, OVS is usually not needed. If the folder has
not been created (a common case after a system restart, because
/var/run is usually tmpfs), ovn-ctl will fail to start ovsdb.
This patch always ensures that OVN_RUNDIR is created.

Signed-off-by: Han Zhou <hzhou8@ebay.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years ago ovn-ctl: Unify OVN_RUNDIR usage.
Han Zhou [Thu, 21 Mar 2019 23:06:30 +0000 (16:06 -0700)]
ovn-ctl: Unify OVN_RUNDIR usage.

In this script $rundir and $OVN_RUNDIR are used interchangeably, which
can cause different folders to be used for different runtime files. This
patch unifies the usage to the correct one.

Signed-off-by: Han Zhou <hzhou8@ebay.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years ago rhel: Fix sphinx BuildRequires on Fedora Rawhide
Timothy Redaelli [Fri, 22 Mar 2019 18:45:46 +0000 (19:45 +0100)]
rhel: Fix sphinx BuildRequires on Fedora Rawhide

On Fedora Rawhide only python3-sphinx is available, but currently
python2-sphinx is used.

This commit changes the BuildRequires for sphinx to use
/usr/bin/sphinx-build directly instead of python2-sphinx in order to make
it work on current Fedora Rawhide too.

Signed-off-by: Timothy Redaelli <tredaelli@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years ago ovs-vsctl: Add datapath_type column to show command.
Ilya Maximets [Thu, 21 Mar 2019 10:56:47 +0000 (13:56 +0300)]
ovs-vsctl: Add datapath_type column to show command.

Sometimes it's unclear which datapath type is in use by a particular
bridge. For example, if all the interfaces are supported by both the
system and netdev datapaths, it takes a DB query or log analysis to find
out which 'datapath_type' is in use.
Another case is that it's hard to figure out if patch ports are really
connected to each other. They are definitely not connected if the datapath
types of their bridges differ.

With this change non-default 'datapath_type's will be exposed to
'ovs-vsctl show' command, so it'll be easier to spot misconfiguration.

  $ ovs-vsctl show
  ...
      Bridge "br0"
          datapath_type: netdev
          Port "br0"
              Interface "br0"
                  type: internal
  ...

Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years ago reconnect.c: Don't transition back to ACTIVE when forced to RECONNECT.
Han Zhou [Fri, 22 Mar 2019 20:41:05 +0000 (13:41 -0700)]
reconnect.c: Don't transition back to ACTIVE when forced to RECONNECT.

Currently, whenever there is activity on the session, the FSM
transitions to ACTIVE. However, this causes reconnect_force_reconnect()
to fail: once traffic is received from the remote after the
transition to RECONNECT, the FSM skips the reconnection phase and goes
directly back to ACTIVE for the old session. This patch fixes it so that
when the FSM is in the RECONNECT state, it doesn't transition back to
ACTIVE directly.
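
The intended behavior can be sketched as a tiny state function. This is an
illustrative model only, not the code in reconnect.c; the function name
`on_activity` and the string state names are assumptions made for the
example.

```python
def on_activity(state):
    """Return the FSM state after traffic is received from the remote.

    Hypothetical model: activity normally moves the session to ACTIVE,
    but a forced RECONNECT must not be undone by late traffic on the
    old session.
    """
    if state == "RECONNECT":
        return "RECONNECT"
    return "ACTIVE"
```

Before the fix, the model would have returned "ACTIVE" unconditionally,
which is how a forced reconnect could be silently cancelled.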

Signed-off-by: Han Zhou <hzhou8@ebay.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years ago ifupdown.sh: Add missing "--may-exist" option
George Diamantopoulos [Thu, 21 Mar 2019 18:48:49 +0000 (20:48 +0200)]
ifupdown.sh: Add missing "--may-exist" option

The ifupdown.sh script passes the --may-exist option
to ovs-vsctl invocations in order for it to exit without failing
if the device to be added already exists. This holds true for
all cases of adding objects to ovs-vswitchd except for when
configuring a bond interface.

This patch adds the --may-exist option to the missing
statement, which suppresses the logging of such errors in
syslog.

Additionally, running the unpatched version of this script when
the bond interface already exists appears to break
networking with some versions of ifupdown found in debian
testing (0.8.35), where the service won't start up properly
because of the aforementioned errors.

Signed-off-by: George Diamantopoulos <georgediam@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years ago python: Monitor Database table to manage lifecycle of IDL client.
Ted Elhourani [Fri, 25 Jan 2019 19:10:01 +0000 (19:10 +0000)]
python: Monitor Database table to manage lifecycle of IDL client.

The Python IDL implementation supports ovsdb cluster connections.
This patch is a follow-up to commit 31e434fc98; it adds the option of
connecting to the leader (the default) in the Raft-based cluster. It mimics
the existing C IDL support for clusters introduced in commit 1b1d2e6daa.

The _Server database schema is first requested, then a monitor of the
Database table in the _Server Database. Method __check_server_db verifies
the eligibility of the server. If the attempt to obtain a monitor of the
_Server database fails and a cluster id was not provided this implementation
proceeds to request the data monitor. If a cluster id was provided via the
set_cluster_id method then the connection is aborted and a connection to a
different node is instead attempted, until a valid cluster node is found.
Thus, when supplied, cluster id is interpreted as the intention to only
allow connections to a clustered database. If not supplied, connections to
standalone nodes, or nodes that do not have the _Server database are
allowed. change_seqno is not incremented in the case of Database table
updates.
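
The fallback rules described above can be summarized in a small decision
function. This is a sketch of the described behavior, not the actual
ovs.db.idl code; the function name and the returned labels are
hypothetical.

```python
def next_step(server_monitor_ok, cluster_id):
    """Decide what the client does after trying to monitor _Server.

    server_monitor_ok: whether the monitor of the _Server database succeeded.
    cluster_id: the id given via set_cluster_id, or None if not provided.
    """
    if server_monitor_ok:
        # Eligibility of the server is verified from the Database table.
        return "check_server_db"
    if cluster_id is None:
        # Standalone nodes, or nodes without _Server, are acceptable.
        return "request_data_monitor"
    # A cluster id signals that only clustered databases are acceptable.
    return "try_next_remote"
```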

Acked-by: Numan Siddique <nusiddiq@redhat.com>
Signed-off-by: Ted Elhourani <ted.elhourani@nutanix.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years ago python: Fix package requirements with old setuptools
Timothy Redaelli [Fri, 22 Mar 2019 14:02:14 +0000 (15:02 +0100)]
python: Fix package requirements with old setuptools

Commit 00fcc832d598 ("Update Python package requirements") added a
PEP 508 environment marker to install pywin32 on Windows systems.

This requires a new setuptools version (>= 20.5), but (at least)
RHEL/CentOS7 and Debian Jessie are using an older version of
setuptools, so the Python extension failed to build.

This commit uses "extras_require" instead of the PEP 508 environment
markers in order to keep the conditional dependency on pywin32 while
remaining compatible with the old setuptools versions.
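
The two ways of expressing the conditional dependency can be compared in a
short sketch. The helper name is hypothetical and the exact requirement
strings are assumptions; the real setup.py may pin a version or word the
marker differently.

```python
def pywin32_setup_kwargs(use_extras_require=True):
    """Build the setup() keyword arguments for the Windows-only dependency."""
    if use_extras_require:
        # An extras_require key of the form ':<marker>' (empty extra name)
        # makes the listed packages conditional on the marker.
        return {"extras_require": {':sys_platform == "win32"': ["pywin32"]}}
    # PEP 508 style: the marker is embedded in the requirement string.
    return {"install_requires": ['pywin32; sys_platform == "win32"']}
```

The returned dictionary would be passed to `setup(**kwargs)`; only the
first form avoids the environment-marker syntax inside the requirement
string.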

CC: Alin Gabriel Serdean <aserdean@cloudbasesolutions.com>
CC: Lucian Petrut <lpetrut@cloudbasesolutions.com>
Fixes: 00fcc832d598 ("Update Python package requirements")
Signed-off-by: Timothy Redaelli <tredaelli@redhat.com>
Acked-by: Alin Gabriel Serdean <aserdean@ovn.org>
Signed-off-by: Alin Gabriel Serdean <aserdean@ovn.org>
5 years ago netdev-rte-offloads: Add thread-safety notes.
Ilya Maximets [Wed, 20 Mar 2019 11:15:19 +0000 (14:15 +0300)]
netdev-rte-offloads: Add thread-safety notes.

DPDK_FLOW_OFFLOAD_API is not safe in a variety of ways.
This should be documented.

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Roni Bar Yanai <roniba@mellanox.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
5 years ago dpif-netlink: make offload failed EOPNOTSUPP and ENOSPC cases lower priority level log
wenxu [Tue, 19 Mar 2019 12:47:31 +0000 (20:47 +0800)]
dpif-netlink: make offload failed EOPNOTSUPP and ENOSPC cases lower priority level log

Flow offload failures with EOPNOTSUPP and ENOSPC should not be
logged as errors. Log these two failure cases at a lower priority
level.
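
The idea can be sketched as a mapping from errno to log level. This is not
the dpif-netlink code; the function name is hypothetical, and the choice of
DEBUG as the "lower priority" level is an assumption, since the message
does not name the level.

```python
import errno
import logging

# Failures that are expected in normal operation (assumed set, taken
# from the two errno values named in the commit message).
EXPECTED_OFFLOAD_ERRNOS = {errno.EOPNOTSUPP, errno.ENOSPC}


def offload_failure_level(err):
    """Pick a log level for a failed flow offload attempt."""
    if err in EXPECTED_OFFLOAD_ERRNOS:
        return logging.DEBUG  # assumed lower-priority level
    return logging.ERROR
```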

Signed-off-by: wenxu <wenxu@ucloud.cn>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
5 years ago AUTHORS: Add Roni Bar Yanai.
Ian Stokes [Tue, 19 Mar 2019 15:00:16 +0000 (15:00 +0000)]
AUTHORS: Add Roni Bar Yanai.

Signed-off-by: Ian Stokes <ian.stokes@intel.com>
5 years ago netdev-rte-offloads: Rename netdev_dpdk_* functions
Ophir Munk [Tue, 5 Mar 2019 16:49:32 +0000 (16:49 +0000)]
netdev-rte-offloads: Rename netdev_dpdk_* functions

Rename all the netdev_dpdk_* functions (which originated in the file
netdev-dpdk.c) to netdev_rte_offloads_*.

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
5 years ago netdev-dpdk: Move offloading code to a new file
Roni Bar Yanai [Tue, 5 Mar 2019 16:49:31 +0000 (16:49 +0000)]
netdev-dpdk: Move offloading code to a new file

Hardware offloading code is moved to a new file called
netdev-rte-offloads.c. The original offloading code is copied
from the netdev-dpdk.c file to the new file, where future
offloading code should be added as well.
The copied code was refactored based on coding style.
The netdev-dpdk.c file will remain unchanged as new offloading
code is added.

Co-authored-by: Ophir Munk <ophirmu@mellanox.com>
Reviewed-by: Asaf Penso <asafp@mellanox.com>
Signed-off-by: Roni Bar Yanai <roniba@mellanox.com>
Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
5 years ago netdev-dpdk: Expose flow creation/destruction calls
Roni Bar Yanai [Tue, 5 Mar 2019 16:49:29 +0000 (16:49 +0000)]
netdev-dpdk: Expose flow creation/destruction calls

Before offloading code was added to the netdev-dpdk.c file (MARK and
RSS actions) the only DPDK RTE calls in use were rte_flow_create() and
rte_flow_destroy(). In preparation for splitting the offloading code
from the netdev-dpdk.c file to a separate file, it is required
to embed these RTE calls into a global netdev-dpdk-* API so that
they can be called from the new file. An example for this requirement
can be seen in the handling of dev->mutex, which should be encapsulated
inside netdev-dpdk class (netdev-dpdk.c file), and should be unknown
to the outside callers. This commit embeds the rte_flow_create() call
inside the netdev_dpdk_flow_create() API and the rte_flow_destroy()
call inside the netdev_dpdk_rte_flow_destroy() API.

Reviewed-by: Asaf Penso <asafp@mellanox.com>
Signed-off-by: Roni Bar Yanai <roniba@mellanox.com>
Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Co-authored-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
5 years ago dpif-netdev-perf: Fix double update of perf histograms.
Ilya Maximets [Mon, 18 Mar 2019 13:01:13 +0000 (16:01 +0300)]
dpif-netdev-perf: Fix double update of perf histograms.

Real values of 'packets per batch' and 'cycles per upcall' are already
added to the histograms in 'dpif-netdev' on receive. Adding the averages
makes the statistics wrong. We should not add to the histograms values
that never really appeared.

For example, in the current code the following situation is possible:

  pmd thread numa_id 0 core_id 5:
  ...
    Rx packets:                  83  (0 Kpps, 13873 cycles/pkt)
    ...
    - Upcalls:                    3  (  3.6 %, 248.6 us/upcall)

  Histograms
    packets/it      pkts/batch       upcalls/it     cycles/upcall
    1         83    1         166    1         3    ...
                                                    15848     2
                                                    19952     2
                                                    ...
                                                    50118     2

i.e. all the packets are counted twice in the 'pkts/batch' column and
all the upcalls are counted twice in the 'cycles/upcall' column.
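
A simplified model reproduces the doubled 'pkts/batch' count. It assumes
one batch per iteration, so the per-iteration average equals the batch
size; the function name and structure are hypothetical and do not mirror
the C implementation.

```python
from collections import Counter


def pkts_per_batch_histogram(batch_sizes, also_add_average):
    """Build a 'pkts/batch' histogram from per-iteration batch sizes."""
    hist = Counter()
    for size in batch_sizes:
        hist[size] += 1  # real value, added on receive
        if also_add_average:
            # With one batch per iteration the average equals 'size',
            # so the same bucket is incremented a second time.
            hist[size] += 1
    return hist
```

For 83 batches of one packet each, the model yields 166 entries in bucket 1
when the average is also added, matching the output shown above, and 83
when it is not.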

CC: Jan Scheurich <jan.scheurich@ericsson.com>
Fixes: 79f368756ce8 ("dpif-netdev: Detailed performance stats for PMDs")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
5 years ago dpdk: Stop dumping memzones to stdout.
Ilya Maximets [Thu, 14 Mar 2019 14:43:48 +0000 (17:43 +0300)]
dpdk: Stop dumping memzones to stdout.

Information about memzones reserved on init is not very useful.
In any case, it should be logged in a more civilized manner, i.e.
through the OVS logging subsystem.

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
5 years ago dpctl: Drop parser debug information.
Ilya Maximets [Mon, 18 Mar 2019 11:02:30 +0000 (14:02 +0300)]
dpctl: Drop parser debug information.

This information is not that useful.
Anyway, no need to print it each time to the logs.

CC: Ben Pfaff <blp@ovn.org>
Fixes: d1fd1ea91242 ("ovs-dpctl: New --names option to use port names in flow dumps.")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years ago odp-util: added NULL check for error pointer argument
Toms Atteka [Mon, 18 Mar 2019 19:11:48 +0000 (12:11 -0700)]
odp-util: added NULL check for error pointer argument

If NULL was passed as the errorp argument of odp_flow_from_string,
a segmentation fault occurred.

This patch fixes it by skipping error formatting if the error pointer
is not provided.

Reported-at: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=12972
Signed-off-by: Toms Atteka <cpp.code.lv@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years ago netdev-tc-offloads: Properly get the block id on flow del/get
Or Gerlitz [Sun, 17 Mar 2019 14:13:25 +0000 (16:13 +0200)]
netdev-tc-offloads: Properly get the block id on flow del/get

Currently, when a tc flow is installed on a bond port using shared blocks,
we get these failures from the revalidator threads:

2019-03-17T10:02:58.919Z|13369|dpif(revalidator93)|WARN|system@ovs-system: failed to flow_del \
(No such file or directory) ufid:ebe2888b-9886-4835-a42e-c2911f6af6e8 skb_priority(0),skb_mark(0),in_port(2), \
packet_type(ns=0,id=0),eth(src=e4:11:22:33:44:71,dst=24:8a:07:88:28:12),eth_type(0x0806), [..]

The block id must be retrieved from the device we got by ufid lookup and
not from the input to the related function, fix that for flow del and get.

While here, add the block id to existing debug print.

Fixes: 88dcf2aa8234 ('netdev-provider: add class op to get block_id')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
5 years ago netdev-tc-offloads: Improve log message for icmpv6 offload not supported
Moshe Levi [Thu, 28 Feb 2019 19:29:10 +0000 (21:29 +0200)]
netdev-tc-offloads: Improve log message for icmpv6 offload not supported

Signed-off-by: Moshe Levi <moshele@mellanox.com>
Reviewed-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
5 years ago manpages: Highlight --ct-next option.
Ilya Maximets [Tue, 12 Mar 2019 12:52:52 +0000 (15:52 +0300)]
manpages: Highlight --ct-next option.

This makes it look like other options.

Acked-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years ago ofp-protocol: Changed the number of bits in OFPUTIL_P_ANY from 10 to 9.
Ashish Varma [Wed, 13 Mar 2019 18:31:05 +0000 (11:31 -0700)]
ofp-protocol: Changed the number of bits in OFPUTIL_P_ANY from 10 to 9.

The removal of support for OpenFlow 1.6 (draft) resulted in the removal of
"OFPUTIL_P_OF16_OXM 1 << 9". OFPUTIL_P_ANY, which represents all protocols,
will now have only 9 valid bits.
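
The bit arithmetic is easy to check with a worked example. The Python
constants below only mirror the names used in this message; they are an
illustration, not the C definitions themselves.

```python
N_PROTOCOL_BITS = 9

# Mask with the low 9 bits set (bits 0..8).
OFPUTIL_P_ANY = (1 << N_PROTOCOL_BITS) - 1

# The bit that used to stand for the removed OpenFlow 1.6 protocol.
REMOVED_OF16_OXM = 1 << 9

# The previous 10-bit mask, kept here only for comparison.
OLD_10_BIT_MASK = (1 << 10) - 1
```

A 9-bit mask is 0x1ff and no longer covers bit 9, whereas the old 10-bit
mask (0x3ff) still included the removed protocol bit.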

Fixes: 29718ad49d61 ("Remove support for OpenFlow 1.6 (draft).")
Signed-off-by: Ashish Varma <ashishvarma.ovs@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years ago conntrack: Replace structure copy by memcpy().
Darrell Ball [Fri, 15 Mar 2019 22:01:20 +0000 (15:01 -0700)]
conntrack: Replace structure copy by memcpy().

There are a few cases where structure copy can be replaced by
memcpy(), for possible portability benefit.  This is because
the structures involved have padding and elements of the
structure are used to generate hashes.

Signed-off-by: Darrell Ball <dlu998@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years ago conntrack: Lookup only 'UNNAT conns' in 'nat_clean()'.
Darrell Ball [Fri, 15 Mar 2019 22:01:19 +0000 (15:01 -0700)]
conntrack: Lookup only 'UNNAT conns' in 'nat_clean()'.

When freeing 'UNNAT conns', lookup only 'UNNAT conns' to
protect against possible address overlap with 'default
conns' during a DOS attempt.  This is very unlikely, but
protection is simple.

Fixes: 286de2729955 ("dpdk: Userspace Datapath: Introduce NAT Support.")
Signed-off-by: Darrell Ball <dlu998@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years ago conntrack: Fix race for NAT cleanup.
Darrell Ball [Fri, 15 Mar 2019 22:01:18 +0000 (15:01 -0700)]
conntrack: Fix race for NAT cleanup.

Reference lists are not fully protected during cleanup of
NAT connections where the bucket lock is transiently not held during
list traversal.  This can lead to referencing freed memory during
cleaning from multiple contexts.  Fix this by protecting with
the existing 'cleanup' mutex in the missed cases where 'conn_clean()'
is called.  'conntrack_flush()' is converted to expiry list traversal
to support the proper bucket level protection with the 'cleanup' mutex.

The NAT exhaustion case cleanup in 'conn_not_found()' is also modified
to avoid the same issue.

Fixes: 286de2729955 ("dpdk: Userspace Datapath: Introduce NAT Support.")
Reported-by: solomon <liwei.solomon@gmail.com>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2019-March/357056.html
Tested-by: solomon <liwei.solomon@gmail.com>
Signed-off-by: Darrell Ball <dlu998@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years ago dpctl: Stop showing the dpctl/help command.
Ilya Maximets [Fri, 15 Mar 2019 14:06:01 +0000 (17:06 +0300)]
dpctl: Stop showing the dpctl/help command.

'dpctl/help' command is not registered and could not be called.
However, 'dpctl/list-commands' prints it as available.

CC: Ben Pfaff <blp@ovn.org>
Fixes: 337c45285445 ("dpctl: Fix jump through wild pointer in "dpctl/help".")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years ago AUTHORS: Add Sharon Krendel.
Ben Pfaff [Fri, 15 Mar 2019 02:21:19 +0000 (19:21 -0700)]
AUTHORS: Add Sharon Krendel.

Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years ago netdev-linux: netem QoS support
Sharon K [Thu, 14 Mar 2019 23:02:24 +0000 (01:02 +0200)]
netdev-linux: netem QoS support

Signed-off-by: Sharon Krendel <thekafkaf@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years ago treewide: Clean up inclusions of netdev-dpdk header.
Ilya Maximets [Mon, 4 Mar 2019 10:35:30 +0000 (13:35 +0300)]
treewide: Clean up inclusions of netdev-dpdk header.

'netdev-dpdk.h' provides only 'netdev_dpdk_register' and
'free_dpdk_buf' which are not used in these files and should
not be used.
Leftovers from the already removed code.

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
5 years ago ovn-nbctl: Don't segfault when ovn-northd doesn't configure dynamic addresses.
Justin Pettit [Mon, 4 Mar 2019 22:28:58 +0000 (14:28 -0800)]
ovn-nbctl: Don't segfault when ovn-northd doesn't configure dynamic addresses.

When ovn-nbctl is used to configure a logical switch port's addresses, it
does a sanity-check to make sure that a duplicate address isn't being
used.  If a port is configured as "dynamic", ovn-northd is supposed to
populate the "dynamic_addresses" column in the Logical_Switch_Port
table.  If it isn't, ovn-nbctl would dereference a null pointer as part
of the duplicate address check.  This patch checks that "dynamic_addresses"
is actually set first.

Signed-off-by: Justin Pettit <jpettit@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
5 years ago dpif-netdev.at: Add basic test for partial HW offloading.
Ilya Maximets [Tue, 26 Feb 2019 10:38:43 +0000 (13:38 +0300)]
dpif-netdev.at: Add basic test for partial HW offloading.

Simple test for basic partial HWOL functionality.

Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
5 years ago netdev-dummy: Add flow offloading related logs.
Ilya Maximets [Tue, 26 Feb 2019 10:38:42 +0000 (13:38 +0300)]
netdev-dummy: Add flow offloading related logs.

Add debug logging for partial HWOL on dummy interfaces for
future use in tests.

Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
5 years ago netdev-dummy: Set flow mark for offloaded flows.
Ilya Maximets [Tue, 26 Feb 2019 10:38:41 +0000 (13:38 +0300)]
netdev-dummy: Set flow mark for offloaded flows.

Match packets received on dummy interfaces with offloaded flows and
set up corresponding marks in dp-packet.

Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
5 years ago netdev-dummy: Implement dummy put/del flow offload API.
Ilya Maximets [Tue, 26 Feb 2019 10:38:40 +0000 (13:38 +0300)]
netdev-dummy: Implement dummy put/del flow offload API.

Basic partial HWOL API for dummy interfaces.

Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
5 years ago dp-packet: Copy flow mark on packet clone.
Ilya Maximets [Tue, 26 Feb 2019 10:38:39 +0000 (13:38 +0300)]
dp-packet: Copy flow mark on packet clone.

Dummy interfaces clone the dp-packet during 'receive' appctl processing.
In general, we should do this anyway to avoid any possible issues in
the future with real interfaces.

Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
5 years ago dp-packet: Add flow_mark support for non-DPDK case.
Ilya Maximets [Tue, 26 Feb 2019 10:38:38 +0000 (13:38 +0300)]
dp-packet: Add flow_mark support for non-DPDK case.

Additionally, new API call 'dp_packet_set_flow_mark' is needed
for packet clone. Mostly for dummy HWOL implementation.

Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
5 years ago datapath-windows: Add annotations to find vport functions
Alin Gabriel Serdean [Wed, 27 Feb 2019 17:26:46 +0000 (19:26 +0200)]
datapath-windows: Add annotations to find vport functions

Add annotations to find vport functions to check if the dispatch lock is
held.

Signed-off-by: Alin Gabriel Serdean <aserdean@ovn.org>
Acked-by: Anand Kumar <kumaranand@vmware.com>
Acked-by: Sairam Venugopal <vsairam@vmware.com>
5 years ago datapath-windows: Guard vport usage in user.c
Alin Gabriel Serdean [Wed, 27 Feb 2019 14:03:03 +0000 (16:03 +0200)]
datapath-windows: Guard vport usage in user.c

When using a vport we need to guard its usage with the dispatch lock.

Signed-off-by: Alin Gabriel Serdean <aserdean@ovn.org>
Acked-by: Anand Kumar <kumaranand@vmware.com>
Acked-by: Sairam Venugopal <vsairam@vmware.com>
5 years ago faq: Update features supported on Hyper-V
Anand Kumar [Mon, 11 Mar 2019 20:47:00 +0000 (13:47 -0700)]
faq: Update features supported on Hyper-V

These features were added a while back, so updating
the documentation.

Signed-off-by: Anand Kumar <kumaranand@vmware.com>
Acked-by: Sairam Venugopal <vsairam@vmware.com>
Signed-off-by: Alin Gabriel Serdean <aserdean@ovn.org>
5 years ago datapath-windows: Fix race condition during port creation
Sairam Venugopal [Tue, 26 Feb 2019 22:53:35 +0000 (14:53 -0800)]
datapath-windows: Fix race condition during port creation

Hold the dispatch lock until port-add operations are completed.

Found by inspection.

Signed-off-by: Sairam Venugopal <vsairam@vmware.com>
Acked-by: Alin Gabriel Serdean <aserdean@ovn.org>
Signed-off-by: Alin Gabriel Serdean <aserdean@ovn.org>
5 years ago datapath-windows: Fix potential deadlock in event subscription
Sairam Venugopal [Wed, 27 Feb 2019 00:45:10 +0000 (16:45 -0800)]
datapath-windows: Fix potential deadlock in event subscription

Move the EventQueue lock acquisition after the dispatchLock to prevent a
potential deadlock in port creation pipeline. There could be a case where
a port event could try to take up the Dispatch Lock before the Event Queue
lock and the subscription queue event could take up the event queue lock
before the dispatch lock.
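
The fix follows the usual rule that every code path takes the locks in the
same order. The sketch below models that rule with ordinary Python locks;
it does not represent the Windows kernel locking primitives used by the
datapath, and all names in it are hypothetical.

```python
import threading

dispatch_lock = threading.Lock()
event_queue_lock = threading.Lock()


def port_event(log):
    # Dispatch lock first, event queue lock second.
    with dispatch_lock:
        with event_queue_lock:
            log.append("port")


def subscribe_event(log):
    # Same order as port_event(); taking the locks in the opposite
    # order here is what would allow the two paths to deadlock.
    with dispatch_lock:
        with event_queue_lock:
            log.append("subscribe")


def run_concurrently(rounds):
    """Run both paths from many threads and return the recorded events."""
    log = []
    threads = []
    for _ in range(rounds):
        threads.append(threading.Thread(target=port_event, args=(log,)))
        threads.append(threading.Thread(target=subscribe_event, args=(log,)))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```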

Found while testing with Driver Verifier enabled.

Signed-off-by: Sairam Venugopal <vsairam@vmware.com>
Acked-by: Alin Gabriel Serdean <aserdean@ovn.org>
Signed-off-by: Alin Gabriel Serdean <aserdean@ovn.org>
5 years ago datapath-windows: Fix nbl cleanup when memory allocation fails
Sairam Venugopal [Fri, 8 Mar 2019 21:22:33 +0000 (13:22 -0800)]
datapath-windows: Fix nbl cleanup when memory allocation fails

StartNblIngressError should be called only when an NBL hasn't been
modified. In this case the nbl context was initialized. Rely on existing
packet completion mechanism to cleanup the NBL.

Found while testing with DriverVerifier with limited memory setting
enabled.

Signed-off-by: Sairam Venugopal <vsairam@vmware.com>
Acked-by: Anand Kumar <kumaranand@vmware.com>
Signed-off-by: Alin Gabriel Serdean <aserdean@ovn.org>
5 years ago dp-packet: Refactor offloading API.
Ilya Maximets [Tue, 26 Feb 2019 10:38:37 +0000 (13:38 +0300)]
dp-packet: Refactor offloading API.

1. No reason to have mbuf related APIs in a generic code.
2. Not only RSS/checksums should be invalidated in case of tunnel
   decapsulation or sending to 'ring' ports.

To fix the two issues above, a new function
'dp_packet_reset_offload' is introduced. To clean up and unify
the code and simplify the addition of new offloading features to the
non-DPDK version of dp_packet, an 'ol_flags' bitmask is introduced.
Additionally, code complexity in 'dp_packet_clone_with_headroom' is
reduced by using already existing generic APIs.

Unfortunately, we still need a special case for mbuf
initialization inside 'dp_packet_init__()'.
'dp_packet_init_specific()' is introduced for this purpose as a generic
API for initialization of the implementation-specific fields.

Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
5 years ago netdev-linux: Remove ingress qdisc before trying to add shared block
Roi Dayan [Mon, 11 Mar 2019 12:47:08 +0000 (14:47 +0200)]
netdev-linux: Remove ingress qdisc before trying to add shared block

Adding a shared ingress block when an ingress qdisc already exists
results in a failure. So remove the ingress qdisc first.
Also, while at it, log the slave name.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Acked-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
5 years ago netdev-tc-offloads: Remove ingress qdisc on tc init flow api
Roi Dayan [Mon, 11 Mar 2019 14:34:05 +0000 (16:34 +0200)]
netdev-tc-offloads: Remove ingress qdisc on tc init flow api

A port added to an OVS bridge might already have an ingress qdisc,
which will make the block probe fail.
The probes should start clean, and ingress is added later,
so just remove ingress in case it exists.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Acked-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
5 years ago ovsdb-idl: Fix memory leak of ovsdb_idl_db_clear.
Han Zhou [Wed, 6 Mar 2019 02:16:50 +0000 (18:16 -0800)]
ovsdb-idl: Fix memory leak of ovsdb_idl_db_clear.

ovsdb_idl_row_destroy() doesn't free the memory of row structure itself.
This is because of the ovsdb change tracking feature: the deleted row
may be accessed in the current iteration of main loop. The function
ovsdb_idl_row_destroy_postprocess() is called at the end of
ovsdb_idl_run() to free the deleted rows that are not tracked; the
function ovsdb_idl_db_track_clear() is called (indirectly) by user
at the end of each main loop iteration to free the deleted rows that
are tracked. However, ovsdb_idl_db_clear(), which may be called when
a session is reset or when the idl is destroyed, didn't call
ovsdb_idl_row_destroy_postprocess(), which would leak all the
untracked rows. This patch fixes that.

Signed-off-by: Han Zhou <hzhou8@ebay.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years ago ovsdb raft: Precheck prereq before proposing commit.
Han Zhou [Fri, 1 Mar 2019 18:56:37 +0000 (10:56 -0800)]
ovsdb raft: Precheck prereq before proposing commit.

In the current OVSDB Raft design, when there are multiple transactions
pending, either from the same server node or from different nodes in the
cluster, only the first one can succeed at a time. The following
ones fail the prerequisite check on the leader node, because
the first one updates the expected prerequisite eid on the leader
node, and the prerequisite used for proposing a commit has to be
a committed eid. So a node cannot use the latest
prerequisite expected by the leader to propose a commit until the
latest transaction is committed by the leader and the
committed_index is updated on the node.

The current implementation proposes the commit as soon as the transaction
is requested by the client, which results in continuous retries that
cause high CPU load and waste.

Particularly, even if all clients are using leader_only to connect to
only the leader, the prereq check failure still happens a lot when
a batch of transactions are pending on the leader node - the leader
node proposes a batch of commits using the same committed eid as
prerequisite and it updates the expected prereq as soon as the first
one is in progress, but it needs time to append to followers and wait
until a majority replies to update the committed_index, which results in
continuous useless retries of the following transactions proposed by
the leader itself.

This patch doesn't change the design but simply pre-checks whether the
current eid is the same as the prereq before proposing the commit, to avoid waste of
CPU cycles, for both leader and followers. When clients use leader_only
mode, this patch completely eliminates the prereq check failures.
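
The pre-check can be pictured with a minimal Python sketch. It is only an
illustration of the idea, not the OVSDB Raft code: the Node class, its
eid field, and try_propose() are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical stand-in for a Raft node; not the OVSDB data structure."""
    eid: str                                  # entry id of the local data
    proposed: List[str] = field(default_factory=list)

    def try_propose(self, prereq: str, txn: str) -> bool:
        # Pre-check: propose only when the transaction's prerequisite matches
        # the current eid. A mismatching proposal would be rejected anyway.
        if prereq != self.eid:
            return False                      # caller retries once eid moves
        self.proposed.append(txn)
        return True


node = Node(eid="e1")
print(node.try_propose("e1", "txn-a"))  # prerequisite matches
print(node.try_propose("e0", "txn-b"))  # stale prerequisite, skipped locally
```

The point of the sketch is that a stale prerequisite is detected locally,
before any message is sent, instead of failing later on the leader.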

In scale test of OVN with 1k HVs and creating and binding 10k lports,
the patch resulted in 90% CPU cost reduction on leader and >80% CPU cost
reduction on followers. (The test was with leader election base time
set to 10000ms, because otherwise the test couldn't complete because
of the frequent leader re-election.)

This is just one of the related performance problems of the prereq
checking mechanism discussed at:

https://mail.openvswitch.org/pipermail/ovs-discuss/2019-February/048243.html
Signed-off-by: Han Zhou <hzhou8@ebay.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agoOVN: Add support for DHCP option 150 - TFTP server address
Lucas Alvares Gomes [Thu, 7 Mar 2019 16:28:35 +0000 (16:28 +0000)]
OVN: Add support for DHCP option 150 - TFTP server address

OpenStack Ironic relies on a few DHCP options [0] that were not
supported in OVN yet. This patch is adding the last one which is the
option 150 (TFTP server address, RFC5859 [1]).

Note that this option is Cisco proprietary; the IETF standard option that
matches this requirement is Option 66. The difference is that 150
allows multiple IPs to be specified and 66 allows only one.
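
The difference between the two options shows up in their wire format. The
sketch below is an illustration in Python, not OVN code; the function
names are hypothetical. It assumes the usual DHCP option layout of a code
byte, a length byte, and a payload, with option 150 carrying 4 bytes per
IPv4 address and option 66 carrying a single server name string.

```python
import ipaddress
import struct


def encode_option_150(servers):
    """Encode option 150: code, length, then 4 bytes per IPv4 address."""
    if not servers:
        raise ValueError("option 150 needs at least one address")
    payload = b"".join(ipaddress.IPv4Address(s).packed for s in servers)
    return struct.pack("!BB", 150, len(payload)) + payload


def encode_option_66(server_name):
    """Encode option 66: code, length, then one server name string."""
    payload = server_name.encode("ascii")
    return struct.pack("!BB", 66, len(payload)) + payload


print(encode_option_150(["10.0.0.1", "10.0.0.2"]).hex())
print(encode_option_66("10.0.0.1").hex())
```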

[0]
https://github.com/openstack/ironic/blob/3f6d4c6a789b12512d6cc67cdbc93ba5fbf29848/ironic/common/pxe_utils.py#L44-L54
[1] https://tools.ietf.org/html/rfc5859

Signed-off-by: Lucas Alvares Gomes <lucasagomes@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agoOVN: Add support for DHCP option 210 - path prefix
Lucas Alvares Gomes [Thu, 7 Mar 2019 16:28:34 +0000 (16:28 +0000)]
OVN: Add support for DHCP option 210 - path prefix

OpenStack Ironic relies on a few DHCP options [0] that are not yet supported
in OVN; one of them is option 210 (PATH PREFIX, RFC5071 [1]).

[0]
https://github.com/openstack/ironic/blob/3f6d4c6a789b12512d6cc67cdbc93ba5fbf29848/ironic/common/pxe_utils.py#L44-L54
[1] https://tools.ietf.org/html/rfc5071#section-5

Signed-off-by: Lucas Alvares Gomes <lucasagomes@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agoovsdb-idl: Fix memory leak of idl->remote.
Han Zhou [Wed, 6 Mar 2019 17:01:21 +0000 (09:01 -0800)]
ovsdb-idl: Fix memory leak of idl->remote.

Reported by Address Sanitizer.

Fixes: 5e07b8f93f03 ("ovsdb-idl: New function ovsdb_idl_create_unconnected().")
Signed-off-by: Han Zhou <hzhou8@ebay.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agometa-flow.xml: Fix typos of flow-based tunnel command examples.
Han Zhou [Thu, 7 Mar 2019 06:04:50 +0000 (22:04 -0800)]
meta-flow.xml: Fix typos of flow-based tunnel command examples.

Signed-off-by: Han Zhou <hzhou8@ebay.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agotravis: Remove 'sudo' configuration.
Ilya Maximets [Wed, 6 Mar 2019 15:41:00 +0000 (18:41 +0300)]
travis: Remove 'sudo' configuration.

Since TravisCI migrated jobs from containers to VMs, 'sudo' is always
available. Setting 'sudo: false' is misleading because it has no
effect.

https://docs.travis-ci.com/user/reference/trusty/#container-based-infrastructure

 "Container-based infrastructure is currently being deprecated.
  Please remove any sudo: false keys in your .travis.yml file to use
  the default fully-virtualized Linux infrastructure instead."

Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agoOVN: Add port addresses to IPAM after all ports are joined.
Mark Michelson [Wed, 6 Mar 2019 14:33:04 +0000 (09:33 -0500)]
OVN: Add port addresses to IPAM after all ports are joined.

Joining ports involves setting the peer field on ovn_ports. If a switch
port is visited, and it is connected to a router port, then the switch
port's peer is set to the router port and the router port's peer is set
to the switch port.

A router port's addresses are added to IPAM if it is peered with a
switch that has dynamic addressing enabled.

When visiting ports, if a router port is visited before its connected
switch port, then the router port's peer is not set yet. Therefore the
router's port addresses cannot be added to IPAM. The result is that
duplicate addresses can be assigned by a logical switch.

The fix for this is to wait until all ports have been joined and then
add port addresses to IPAM. This way, we guarantee that all peer
assignments have been set, and no duplicate IP addresses may be assigned
by a switch.
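
The two-pass ordering can be sketched as follows. This is a simplified
Python illustration, not the ovn-northd code: ports are plain dicts, and
the keys "kind", "router_port", "dynamic", and "addresses" are
hypothetical.

```python
def join_ports(ports):
    """Pass 1: link each switch port to the router port it names, both ways."""
    by_name = {p["name"]: p for p in ports}
    for port in ports:
        target = by_name.get(port.get("router_port"))
        if port["kind"] == "switch" and target is not None:
            port["peer"] = target
            target["peer"] = port


def reserve_router_addresses(ports):
    """Pass 2: run only after join_ports(), so every peer is already set."""
    reserved = set()
    for port in ports:
        peer = port.get("peer")
        if port["kind"] == "router" and peer is not None and peer["dynamic"]:
            reserved.update(port["addresses"])
    return reserved


# The router port is listed first on purpose: visiting order no longer matters.
ports = [
    {"name": "lrp0", "kind": "router", "addresses": ["10.0.0.1"]},
    {"name": "lsp0", "kind": "switch", "router_port": "lrp0", "dynamic": True},
]
join_ports(ports)
print(reserve_router_addresses(ports))
```

If the reservation ran inside the first loop instead, "lrp0" would be
visited before its peer was set and its address would be skipped.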

Reported-by: James Page <james.page@canonical.com>
Signed-off-by: Mark Michelson <mmichels@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agodpif-netlink: Free leaked ofpbuf by using ofpbuf_delete
Yifeng Sun [Tue, 5 Mar 2019 23:27:01 +0000 (15:27 -0800)]
dpif-netlink: Free leaked ofpbuf by using ofpbuf_delete

Found by valgrind.

256 bytes in 4 blocks are definitely lost in loss record 319 of 348
    by 0x52E204: xmalloc (util.c:123)
    by 0x4F6172: ofpbuf_new (ofpbuf.c:151)
    by 0x53DEF2: dpif_netlink_ct_get_limits (dpif-netlink.c:2951)
    by 0x587881: dpctl_ct_get_limits (dpctl.c:1904)
    by 0x58566F: dpctl_unixctl_handler (dpctl.c:2589)
    by 0x52D660: process_command (unixctl.c:308)
    by 0x52D660: run_connection (unixctl.c:342)
    by 0x52D660: unixctl_server_run (unixctl.c:393)
    by 0x407366: main (ovs-vswitchd.c:126)

Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agoOVN: select a random mac_prefix if not provided
Lorenzo Bianconi [Tue, 5 Mar 2019 13:22:50 +0000 (14:22 +0100)]
OVN: select a random mac_prefix if not provided

Select a random IPAM mac_prefix if it has not been provided by the user.
With this patch the admin no longer needs to configure mac_prefix in order to
avoid L2 address collisions if multiple OVN deployments share the same
broadcast domain.
Remove MAC_ADDR_PREFIX definitions/occurrences since mac_prefix is now
always provided to ovn-northd.

Acked-by: Numan Siddique <nusiddiq@redhat.com>
Tested-by: Miguel Duarte de Mora Barroso <mdbarroso@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agoAUTHORS: Add parameswaran krishnamurthy.
Ben Pfaff [Tue, 5 Mar 2019 23:18:23 +0000 (15:18 -0800)]
AUTHORS: Add parameswaran krishnamurthy.

Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agoofproto: Fix for ovs-vswitchd crash on flow-mod with unsupported action
parameswaran krishnamurthy [Tue, 5 Mar 2019 12:45:54 +0000 (18:15 +0530)]
ofproto: Fix for ovs-vswitchd crash on flow-mod with unsupported action

Problem Description:
ovs-vswitchd crashes when flow-mod is invoked with an unsupported
action (tested with OVS 2.10.1).

Steps to recreate:
Step 1) Create a flow
ovs-ofctl add-flow switch1
priority=228,dl_type=0x0800,dl_vlan="600",in_port=25,actions=output:ALL
This step is successful.

Step 2) Invoke flow-mod with incorrect contents.
ovs-ofctl mod-flows switch1
priority=228,dl_type=0x0800,dl_vlan="600",in_port=25,actions=output:ALL,mod_vlan_vid:50,mod_vlan_pcp=6,mod_nw_tos=16

In the above example, the ofproto provider I have will return an error from
rule_construct because the set_field actions come after the output action.

However, OVS ignores the error (the return value of add_flow_init
is ignored in modify_flow_init_strict) and eventually ovs-vswitchd
crashes.

Crash backtrace:
-----------------------

Thread 1 "ovs-vswitchd" received signal SIGSEGV, Segmentation fault.

 0x00007f6a06e785fb in modify_flows_start__ (
   ofproto=ofproto@entry=0x55b289cecc28, ofm=ofm@entry=0x7ffdf7d57b70)
     at ofproto/ofproto.c:5402
 5402    in ofproto/ofproto.c
 (gdb) bt
 #0  0x00007f6a06e785fb in modify_flows_start__ (
   ofproto=ofproto@entry=0x55b289cecc28, ofm=ofm@entry=0x7ffdf7d57b70)
    at ofproto/ofproto.c:5402
 #1  0x00007f6a06e790db in modify_flows_start_loose (ofm=0x7ffdf7d57b70,
    ofproto=0x55b289cecc28) at ofproto/ofproto.c:5443
 #2  ofproto_flow_mod_start (ofproto=ofproto@entry=0x55b289cecc28,
     ofm=ofm@entry=0x7ffdf7d57b70) at ofproto/ofproto.c:7672
 #3  0x00007f6a06e79164 in handle_flow_mod__ (
    ofproto=ofproto@entry=0x55b289cecc28, fm=fm@entry=0x7ffdf7d57d20,
    req=req@entry=0x7ffdf7d57cd0) at ofproto/ofproto.c:5858
 #4  0x00007f6a06e792c2 in handle_flow_mod (
     ofconn=ofconn@entry=0x55b289d528c0, oh=oh@entry=0x55b289d5a410)
     at ofproto/ofproto.c:5835
 #5  0x00007f6a06e7a173 in handle_openflow__ (msg=0x55b289d351d0,
    ofconn=0x55b289d528c0) at ofproto/ofproto.c:8127
 #6  handle_openflow (ofconn=0x55b289d528c0, ofp_msg=0x55b289d351d0)
  at ofproto/ofproto.c:8296
 #7  0x00007f6a06e6a013 in ofconn_run (
     handle_openflow=0x7f6a06e796f0 <handle_openflow>,
     ofconn=0x55b289d528c0) at ofproto/connmgr.c:1446
 #8  connmgr_run (mgr=0x55b289d14fe0,
     handle_openflow=handle_openflow@entry=0x7f6a06e796f0 <handle_openflow>)
     at ofproto/connmgr.c:365

With this fix, OVS does not crash.

Signed-off-by: Parameswaran Krishnamurthy <parkrish@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agoovs-actions.xml: Fix inconsistency in documentation of controller action.
Ben Pfaff [Mon, 4 Mar 2019 23:43:17 +0000 (15:43 -0800)]
ovs-actions.xml: Fix inconsistency in documentation of controller action.

"controller_id" was written as "controller-id" in one place.  This spells
it consistently.

Acked-by: Numan Siddique <nusiddiq@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agoovsdb: Update NEWS for fast-resync feature.
Han Zhou [Mon, 4 Mar 2019 20:10:22 +0000 (12:10 -0800)]
ovsdb: Update NEWS for fast-resync feature.

This patch updates text in NEWS committed by 5832e6a, so that it is
easier to understand for end users.

Signed-off-by: Han Zhou <hzhou8@ebay.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agoovsdb: Move trigger_run after storage_run and read_db.
Han Zhou [Fri, 1 Mar 2019 18:56:36 +0000 (10:56 -0800)]
ovsdb: Move trigger_run after storage_run and read_db.

Run triggers after storage_run and read_db to make sure new raft
updates are utilized in current iteration.

Signed-off-by: Han Zhou <hzhou8@ebay.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agoOVN: update RA next_announce according to {min, max}_interval
Lorenzo Bianconi [Mon, 4 Mar 2019 16:14:14 +0000 (17:14 +0100)]
OVN: update RA next_announce according to {min, max}_interval

Update RA next_announce whenever min_interval and/or max_interval are
updated in sbrec_port_binding option. In the current implementation
if ipv6_ra_configs:send_periodic is set to true before setting
ipv6_ra_configs:{min,max}_interval, next_announce will be set using
default values and it will not be updated until we send the first IPv6
router advertisement.

Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agolib/tc: add ingress ratelimiting support for tc-offload
Pieter Jansen van Vuuren [Fri, 1 Feb 2019 10:19:32 +0000 (10:19 +0000)]
lib/tc: add ingress ratelimiting support for tc-offload

Firstly this patch introduces the notion of reserved priority, as the
filter implementing ingress policing would require the highest priority.
Secondly it allows setting rate limiters while tc-offloads has been
enabled. Lastly it installs a matchall filter that matches all traffic
and then applies a police action, when configuring an ingress rate
limiter.

An example of what to expect:

OvS CLI:
ovs-vsctl set interface <netdev_name> ingress_policing_rate=5000
ovs-vsctl set interface <netdev_name> ingress_policing_burst=100

Resulting TC filter:
filter protocol ip pref 1 matchall chain 0
filter protocol ip pref 1 matchall chain 0 handle 0x1
  not_in_hw
action order 1:  police 0x1 rate 5Mbit burst 125Kb mtu 64Kb
action drop/continue overhead 0b
        ref 1 bind 1 installed 3 sec used 3 sec
        Action statistics:
        Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
        backlog 0b 0p requeues 0

MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
10.0.0.200 () port 0 AF_INET : demo
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

131072  16384  16384    60.13       4.49

ovs-vsctl list interface <netdev_name>
_uuid               : 2ca774e8-8b95-430f-a2c2-f8f742613ab1
admin_state         : up
...
ingress_policing_burst: 100
ingress_policing_rate: 5000
...
type                : ""

Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
5 years agodpdk: Fix case-sensitivity of dpdk-init knob.
Ilya Maximets [Fri, 1 Mar 2019 11:59:33 +0000 (14:59 +0300)]
dpdk: Fix case-sensitivity of dpdk-init knob.

Before supporting the DPDK initialization status in DB 'dpdk-init' was
just a boolean and 'smap_get_bool', which is case-insensitive, was used
to get the value.

Current code uses simple 'strcmp' that fails to recognize values like
"True". As a result this breaks different OVS configuration tools.
For example, kolla-ansible uses 'other_config:dpdk-init=True' but OVS
is not able to recognize it leading to broken installations.

'strcasecmp' should be used instead to fix the issue.
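
The behavioral difference is easy to see in a small Python sketch. It only
mirrors the two comparison styles; the function names are hypothetical and
"true" is used as the sole accepted value for illustration.

```python
def dpdk_init_enabled_strict(value):
    # Mirrors a case-sensitive comparison, like strcmp(value, "true") == 0.
    return value == "true"


def dpdk_init_enabled(value):
    # Mirrors a case-insensitive comparison, like strcasecmp().
    return value.lower() == "true"


for value in ("true", "True", "TRUE", "false"):
    print(value, dpdk_init_enabled_strict(value), dpdk_init_enabled(value))
```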

CC: Aaron Conole <aconole@redhat.com>
Fixes: 3e52fa5644cd ("dpdk: reflect status and version in the database")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
5 years agoovsdb: Add NEWS for fast-resync feature.
Han Zhou [Sat, 2 Mar 2019 03:46:42 +0000 (19:46 -0800)]
ovsdb: Add NEWS for fast-resync feature.

Signed-off-by: Han Zhou <hzhou8@ebay.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agorconn: Avoid occasional immediate connection failures.
Ben Pfaff [Fri, 1 Mar 2019 18:51:16 +0000 (10:51 -0800)]
rconn: Avoid occasional immediate connection failures.

The rconn connection timer measures time on the granularity of seconds,
which means that sometimes the actual timeout can be just a millisecond or
so, which led to occasional immediate connection failures from ovs-ofctl.
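
The effect of second-granularity bookkeeping can be shown with a little
arithmetic. The Python sketch below is an illustration only, not the rconn
code; remaining_ms() and its parameters are hypothetical.

```python
def remaining_ms(entered_ms, now_ms, timeout_s, coarse):
    """Milliseconds left before the timeout fires."""
    if coarse:
        # Whole-second bookkeeping: the start time loses its sub-second part.
        deadline_ms = (entered_ms // 1000 + timeout_s) * 1000
    else:
        deadline_ms = entered_ms + timeout_s * 1000
    return max(0, deadline_ms - now_ms)


# A 1-second timeout started 999 ms into a second.
print(remaining_ms(10_999, 10_999, 1, coarse=True))
print(remaining_ms(10_999, 10_999, 1, coarse=False))
```

With coarse bookkeeping the nominal 1-second timeout leaves only 1 ms in
this case, while millisecond bookkeeping leaves the full 1000 ms.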

VMware-BZ: #2295760
Fixes: 476d2551abd2 ("rconn: Introduce new invariant to fix assertion failure in corner case.")
Reported-by: Ken Ajiro <ken-ajiro@xr.jp.nec.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agorhel: Fix tests on mock and koji
Timothy Redaelli [Thu, 28 Feb 2019 15:55:11 +0000 (16:55 +0100)]
rhel: Fix tests on mock and koji

Currently many tests fail on mock/koji since /etc/resolv.conf is not
present. The unexpected warning causes them to abort.

After this patch an empty resolv.conf is created and used before issuing
"make check".

Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Timothy Redaelli <tredaelli@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agorhel: Use PIDFile on forking systemd service files
Timothy Redaelli [Thu, 28 Feb 2019 17:27:46 +0000 (18:27 +0100)]
rhel: Use PIDFile on forking systemd service files

Currently, PIDFile is not used in systemd service files with
Type=forking. This means sometimes systemd fails to restart a daemon
that is killed (with SIGKILL) or that has crashed.

This commit adds PIDFile to all systemd service files with Type=forking
in order to always have the correct PID to monitor.

Reported-at: https://bugzilla.redhat.com/1653717
Reported-by: Candido Campos <ccamposr@redhat.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Timothy Redaelli <tredaelli@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agoNEWS: Clean up the 2.11.0 release notes a bit.
Justin Pettit [Thu, 28 Feb 2019 18:38:29 +0000 (10:38 -0800)]
NEWS: Clean up the 2.11.0 release notes a bit.

Signed-off-by: Justin Pettit <jpettit@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
5 years agorhel: limit stack size to 2M.
Flavio Leitner [Thu, 28 Feb 2019 16:13:57 +0000 (13:13 -0300)]
rhel: limit stack size to 2M.

The default stack size in Fedora/RHEL is 8M, which means that when the
ovs-vswitchd daemon starts and uses --mlockall (default), it will dirty all
memory regions for all threads, whose total size is proportional to the
number of CPUs.

On a big host this increases memory usage to many hundreds of megabytes
while OVS actually requires much less.

This patch relies on systemd to limit the stack to 2M/thread. That is much
more than the minimum documented at function ovs_thread_create():

    /* Some small systems use a default stack size as small as 80 kB, but OVS
     * requires approximately 384 kB according to the following analysis:
     * https://mail.openvswitch.org/pipermail/ovs-dev/2016-January/308592.html
     *
     * We use 512 kB to give us some margin of error. */
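
A rough estimate of the saving, as a Python sketch. It assumes that with
--mlockall every thread's whole stack becomes resident; the thread counts
are hypothetical examples, not measurements.

```python
def locked_stack_mib(threads, stack_mib):
    """Stack memory (MiB) made resident when every whole stack is locked."""
    return threads * stack_mib


for threads in (16, 64, 128):  # hypothetical thread counts
    before = locked_stack_mib(threads, 8)   # 8M default stack
    after = locked_stack_mib(threads, 2)    # 2M limit from this patch
    print(threads, before, after)
```

For 64 threads this gives 512 MiB with the 8M default against 128 MiB with
the 2M limit.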

Acked-By: Timothy Redaelli <tredaelli@redhat.com>
Tested-By: Timothy Redaelli <tredaelli@redhat.com>
Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agoAUTHORS: Add Brian Haley.
Ben Pfaff [Thu, 28 Feb 2019 19:23:56 +0000 (11:23 -0800)]
AUTHORS: Add Brian Haley.

Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agoovn: Make DHCP log messages unique
Brian Haley [Thu, 28 Feb 2019 18:06:50 +0000 (13:06 -0500)]
ovn: Make DHCP log messages unique

Two messages were using the same string; add info to one
to make it unique.  Also cleaned up some of the others
to make them consistent throughout.

Signed-off-by: Brian Haley <haleyb.dev@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agoovsdb-idl: Fast resync from server when connection reset.
Han Zhou [Thu, 28 Feb 2019 17:15:20 +0000 (09:15 -0800)]
ovsdb-idl: Fast resync from server when connection reset.

Use monitor_cond_since to request changes after last version of local
data when connection to server is reset, without clearing the local
data. It falls back to clearing and repopulating all the data when
the requested id cannot be fulfilled by the server.

Test result at ovn-scale-test environment using clustered mode:
- 1K HVs (ovsdb clients)
- 10K lports

Without the patch it took 30+ min for the SB ovsdb-server to calm down
and HVs to stabilize the connection and finish syncing data.

With the patch there was no noticeable CPU spike of SB ovsdb-server,
and all HVs were in sync with SB within 1 min, which is the probe
interval set in this test (so it took at most 1 min for HVs to notice
the TCP connection reset and reconnect and resync finished immediately
after that).

Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2018-September/047457.html
Signed-off-by: Han Zhou <hzhou8@ebay.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agoovsdb-idl: Support monitor_cond_since method in C IDL.
Han Zhou [Thu, 28 Feb 2019 17:15:19 +0000 (09:15 -0800)]
ovsdb-idl: Support monitor_cond_since method in C IDL.

Use monitor_cond_since in C IDL. If it is not supported by server,
fall back to old method (monitor_cond, or monitor).

Signed-off-by: Han Zhou <hzhou8@ebay.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agoovsdb-monitor: Support monitor_cond_since.
Han Zhou [Thu, 28 Feb 2019 17:15:18 +0000 (09:15 -0800)]
ovsdb-monitor: Support monitor_cond_since.

Support the new monitor method monitor_cond_since so that a client
can request that monitoring start from a specific point instead of always
from the beginning. This will reduce the cost in scenarios where the server
is restarted/failed-over but the client still has all existing data. In
these scenarios only new changes (and in most cases no change) need
to be transferred to the client. When ovsdb-server is restarted, history
transactions are read from the disk file; when ovsdb-server fails over,
history transactions already exist in the memory of the new server.

There are situations in which the requested transaction may not be found.
For example, a transaction may be too old and have been discarded
from the maintained history list in memory, or the transactions on
disk may have been compacted during ovsdb compaction. In those situations
the server falls back to transferring all data from the beginning.

For more details of the protocol change, see
Documentation/ref/ovsdb-server.7.rst.

This change includes both server side and ovsdb-client side changes
with the new protocol. IDLs using this capability will be added in
future patches.

Now the feature takes effect only for cluster mode of ovsdb-server,
because cluster mode is the only mode that supports unique transaction
uuids today. For other modes, monitor_cond_since always falls back
to transferring all data with found = false. Support for those modes can
be added in the future.

Signed-off-by: Han Zhou <hzhou8@ebay.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agoovsdb-server: Transaction history tracking.
Han Zhou [Thu, 28 Feb 2019 17:15:17 +0000 (09:15 -0800)]
ovsdb-server: Transaction history tracking.

Maintain the last N (N = 100) transactions in memory, which will be
used by future patches to generate monitor data from any point
in these N transactions.
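
A bounded history of this kind can be sketched in Python. This is an
illustration only, not the ovsdb-server implementation: the TxnHistory
class, its methods, and the (found, changes) return shape are hypothetical.

```python
from collections import deque


class TxnHistory:
    """Keep the last `limit` transactions; older ones are dropped."""

    def __init__(self, limit=100):
        self._txns = deque(maxlen=limit)

    def add(self, txn_id, changes):
        self._txns.append((txn_id, changes))

    def changes_since(self, last_txn_id):
        """Return (found, changes recorded after last_txn_id)."""
        ids = [txn_id for txn_id, _ in self._txns]
        if last_txn_id not in ids:
            return False, None    # caller must fall back to a full resync
        start = ids.index(last_txn_id) + 1
        return True, [changes for _, changes in list(self._txns)[start:]]


history = TxnHistory(limit=3)
for i in range(1, 6):
    history.add(i, "change-%d" % i)
print(history.changes_since(4))   # still in the history
print(history.changes_since(1))   # already evicted
```

When the requested transaction has been evicted, the caller gets
found = False and has to send the complete data set instead.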

Signed-off-by: Han Zhou <hzhou8@ebay.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agoovsdb-monitor: Refactor ovsdb monitor implementation.
Han Zhou [Thu, 28 Feb 2019 17:15:16 +0000 (09:15 -0800)]
ovsdb-monitor: Refactor ovsdb monitor implementation.

The current ovsdb monitor maintains its own transaction version through an
incremental integer and uses it to identify changes starting from different
versions, and also uses it to figure out if each set of changes should be
flushed. In particular, it uses number 0 to represent that the change set
contains all data for initial client population.  It is a smart way but it
prevents further extension of the monitoring mechanism to support the future
use case of clients requesting changes starting from a given history point.
This patch refactors the structures so that change sets are referenced
directly through the pointer. It uses additional members such as
init_change_set and new_change_set to indicate the specific change set
explicitly, instead of through calculated version numbers based on implicit
rules.

At the same time, this patch provides better encapsulation for change set
(composed of data in a list of tables), while still allowing traversing
across change sets for a given table.

Signed-off-by: Han Zhou <hzhou8@ebay.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agooss-fuzz: Fix oss build errors because of ovs API change
Yifeng Sun [Wed, 27 Feb 2019 22:21:00 +0000 (14:21 -0800)]
oss-fuzz: Fix oss build errors because of ovs API change

Reported-at: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=13432
Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agodp-packet: Constantify offloading APIs.
Ilya Maximets [Tue, 26 Feb 2019 10:38:36 +0000 (13:38 +0300)]
dp-packet: Constantify offloading APIs.

Getters should have const arguments.

Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
5 years agodpif-netdev: Reduce log level for not found mark id.
Ilya Maximets [Tue, 26 Feb 2019 10:38:35 +0000 (13:38 +0300)]
dpif-netdev: Reduce log level for not found mark id.

It's a normal case for 'find' function, especially because this
happens for every first packet of flow that was not offloaded yet.
Should not warn about this. Dropped to DBG to avoid log trashing in
case of big number of new flows.

CC: Yuanhan Liu <yliu@fridaylinux.org>
Fixes: 241bad15d99a ("dpif-netdev: associate flow with a mark id")
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
5 years agonetdev-dpdk: Use single struct/union for flow offload items.
Ilya Maximets [Wed, 6 Feb 2019 15:40:36 +0000 (18:40 +0300)]
netdev-dpdk: Use single struct/union for flow offload items.

Having a single structure allows simplifying the code path and
clearing all the items at once (probably faster). This does not
increase stack memory usage because all the L4 related items are
grouped in a union.

Changes:
  - Memsets combined.
  - 'ipv4_next_proto_mask' dropped as we already know the address
    and are able to use 'mask.ipv4.hdr.next_proto_id' directly.
  - Group of 'if' statements for L4 protocols turned to a 'switch'.
    We can do that, because we don't have semi-local variables anymore.
  - Eliminated 'end_proto_check' label. Not needed with 'switch'.

Additionally, 'rte_memcpy' is replaced with simple 'memcpy' as it makes no
sense to use 'rte_memcpy' for 6 bytes.

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Asaf Penso <asafp@mellanox.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
5 years agohash: Enable hash_bytes128 optimization for aarch64.
Yanqin Wei [Wed, 27 Feb 2019 09:44:06 +0000 (17:44 +0800)]
hash: Enable hash_bytes128 optimization for aarch64.

"hash_bytes128" has two versions for 64 bits and 32 bits system. This
should be a common optimization for their respective platforms, but the
64-bit version was only enabled on x86-64. This patch enables it for the
aarch64 platform.
Micro-benchmarks were run on two kinds of Arm platform. A 50% performance
improvement was observed on ThunderX2 and a 40% improvement
on TaiShan (Cortex-A72).

Signed-off-by: Yanqin Wei <Yanqin.Wei@arm.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agoovn-nbctl: Add lsp-get-ls command
Lucas Alvares Gomes [Wed, 27 Feb 2019 17:28:53 +0000 (17:28 +0000)]
ovn-nbctl: Add lsp-get-ls command

This commit adds the following command:

lsp-get-ls: Get the logical switch which the port belongs to.

This command is handy for scripting since there's no logical switch id
in the Logical_Switch_Port table.

Signed-off-by: Lucas Alvares Gomes <lucasagomes@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agoAUTHORS: Add Moshe Levi.
Ben Pfaff [Tue, 26 Feb 2019 13:51:37 +0000 (05:51 -0800)]
AUTHORS: Add Moshe Levi.

Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agoovn: update ovn-ctl usage with status, promote and demote commands
Moshe Levi [Tue, 26 Feb 2019 07:13:16 +0000 (09:13 +0200)]
ovn: update ovn-ctl usage with status, promote and demote commands

Signed-off-by: Moshe Levi <moshele@mellanox.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agoconntrack: Consolidate 2 selection statements.
Darrell Ball [Tue, 26 Feb 2019 00:37:50 +0000 (16:37 -0800)]
conntrack: Consolidate 2 selection statements.

No functional change.

Signed-off-by: Darrell Ball <dlu998@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agoconntrack: Skip ephemeral ports with specified port range.
Darrell Ball [Mon, 25 Feb 2019 23:36:32 +0000 (15:36 -0800)]
conntrack: Skip ephemeral ports with specified port range.

This patch removes the fallback to ephemeral ports when a SNAT port
range is specified;  DNAT already does not fall back to ephemeral ports,
in general.  This is not restrictive to the user and makes it easier to
limit NAT L4 port selection.
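
The new selection rule can be sketched in Python. This is an illustration
only, not the conntrack code: pick_snat_port() is a hypothetical helper and
the ephemeral range 1024-65535 is an assumption made for the example.

```python
def pick_snat_port(in_use, port_range=None, ephemeral=range(1024, 65536)):
    """Return a free port, or None when the allowed ports are exhausted."""
    if port_range is not None:
        low, high = port_range
        candidates = range(low, high + 1)   # no fallback to ephemeral ports
    else:
        candidates = ephemeral
    for port in candidates:
        if port not in in_use:
            return port
    return None


print(pick_snat_port({5000}, port_range=(5000, 5001)))
print(pick_snat_port({5000, 5001}, port_range=(5000, 5001)))
```

When the specified range is exhausted the lookup fails instead of silently
choosing a port outside the range the user asked for.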

The documentation is updated and a new test is added to enforce the
behavior.

Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2019-February/356607.html
Fixes: 286de2729955 ("dpdk: Userspace Datapath: Introduce NAT Support.")
Signed-off-by: Darrell Ball <dlu998@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agoconntrack: Fix wasted work for ICMP NAT.
Darrell Ball [Mon, 25 Feb 2019 23:36:31 +0000 (15:36 -0800)]
conntrack: Fix wasted work for ICMP NAT.

ICMPv4 and ICMPv6 are not subject to port address translation (PAT),
however, a loop increments a local variable unnecessarily for
ephemeral ports, resulting in wasted work for ICMPv4 and ICMPv6 packets
subject to NAT.  Fix this by checking for PAT being enabled before
incrementing the local port variable and bail out otherwise.

Signed-off-by: Darrell Ball <dlu998@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agoodp-util: Improve log messages and error reporting for Netlink parsing.
Ben Pfaff [Sat, 15 Dec 2018 02:16:55 +0000 (18:16 -0800)]
odp-util: Improve log messages and error reporting for Netlink parsing.

As a side effect, this also reduces a lot of log messages' severities from
ERR to WARN.  They just didn't seem like messages that in general reported
anything that would prevent functioning.

Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agovlog: Better handle syslog handler exceptions.
Ilya Maximets [Mon, 25 Feb 2019 17:43:36 +0000 (20:43 +0300)]
vlog: Better handle syslog handler exceptions.

'set_levels_from_string' doesn't check for exceptions that could
happen while opening syslog files or connecting to syslog sockets.

For example, if rsyslog is stopped on a system:

  $ test-unixctl.py -vFACILITY:daemon --detach
  Traceback (most recent call last):
    File "../../../../tests/test-unixctl.py", line 90, in <module>
      main()
    File "../../../../tests/test-unixctl.py", line 61, in main
      ovs.vlog.handle_args(args)
    File "python/ovs/vlog.py", line 463, in handle_args
      msg = Vlog.set_levels_from_string(verbose)
    File "python/ovs/vlog.py", line 345, in set_levels_from_string
      Vlog.add_syslog_handler(words[1])
    File "python/ovs/vlog.py", line 321, in add_syslog_handler
      facility=syslog_facility)
    File "/python2.7/logging/handlers.py", line 759, in __init__
      self._connect_unixsocket(address)
    File "/python2.7/logging/handlers.py", line 787, in _connect_unixsocket
      self.socket.connect(address)
    File "/python2.7/socket.py", line 224, in meth
      return getattr(self._sock,name)(*args)
  socket.error: [Errno 111] Connection refused

In this case "/dev/log" file exists, so the check inside
'add_syslog_handler' doesn't help.

We need to catch the exceptions in 'set_levels_from_string' the same way
as it is done in the 'init' function.
Also, we don't really need to check for '/dev/log' existence, because
the exception will be caught on the upper layer and properly handled by
disabling the corresponding logger.
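
The pattern of catching the failure and disabling the logger can be
sketched as below. This is a minimal illustration, not the code in
python/ovs/vlog.py: add_syslog_handler_safely() and refuse_connection()
are hypothetical names, and the handler is created through a caller
supplied factory so that the failure path can be exercised without a
syslog daemon.

```python
import logging


def add_syslog_handler_safely(logger, make_handler):
    """Attach the handler built by make_handler(), or return None on error."""
    try:
        handler = make_handler()
    except OSError:
        # Covers socket errors such as "Connection refused"; the caller
        # simply runs without a syslog destination.
        return None
    logger.addHandler(handler)
    return handler


def refuse_connection():
    # Stand-in for a handler constructor failing because rsyslog is stopped.
    raise ConnectionRefusedError(111, "Connection refused")


log = logging.getLogger("vlog-sketch")
print(add_syslog_handler_safely(log, refuse_connection))
```

In real use the factory would build a logging.handlers.SysLogHandler. The
traceback above comes from Python 2.7; newer Python versions may already
suppress this connection error inside the handler constructor.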

Fixes: d69d61c7c175 ("vlog: Ability to override the default log facility.")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agoipf: More cleanup.
Darrell Ball [Sat, 23 Feb 2019 02:48:46 +0000 (18:48 -0800)]
ipf: More cleanup.

No functional changes here.

Signed-off-by: Darrell Ball <dlu998@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agoMerge pull request #276 from petrutlucian94/py_requirements
Alin Gabriel Serdean [Mon, 25 Feb 2019 13:50:26 +0000 (15:50 +0200)]
Merge pull request #276 from petrutlucian94/py_requirements

Update Python package requirements

5 years agoUpdate Python package requirements
Lucian Petrut [Fri, 22 Feb 2019 13:24:23 +0000 (15:24 +0200)]
Update Python package requirements

The Python ovs package relies on pywin32 for Windows support.
For this reason, pywin32 should be included in the requirements
list.

Signed-off-by: Lucian Petrut <lpetrut@cloudbasesolutions.com>
5 years agoconntrack: Fix L4 csum for V6 extension hdr pkts.
Darrell Ball [Sat, 23 Feb 2019 01:17:42 +0000 (17:17 -0800)]
conntrack: Fix L4 csum for V6 extension hdr pkts.

It is a day one issue that got copied to subsequent code.

Fixes: a489b16854b5 ("conntrack: New userspace connection tracker.")
Fixes: bd5e81a0e596 ("Userspace Datapath: Add ALG infra and FTP.")
CC: Daniele Di Proietto <diproiettod@ovn.org>
Signed-off-by: Darrell Ball <dlu998@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agopackets: Change return type for 'packet_csum_upperlayer6()'.
Darrell Ball [Sat, 23 Feb 2019 01:17:41 +0000 (17:17 -0800)]
packets: Change return type for 'packet_csum_upperlayer6()'.

Signed-off-by: Darrell Ball <dlu998@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agoovn-controller: Provide the option to set the datapath-type of br-int
Numan Siddique [Mon, 18 Feb 2019 04:42:22 +0000 (10:12 +0530)]
ovn-controller: Provide the option to set the datapath-type of br-int

If the integration bridge is deleted, ovn-controller recreates it
but the previous datapath-type value is lost if it was set. This
patch adds the code in ovn-controller to set the datapath-type
if it is configured by the user in the 'external_ids:ovn-bridge-datapath-type'
column of OpenvSwitch table.

Acked-by: Mark Michelson <mmichels@redhat.com>
Signed-off-by: Numan Siddique <nusiddiq@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agoAUTHORS: Add Matthias May.
Ben Pfaff [Fri, 22 Feb 2019 23:13:25 +0000 (15:13 -0800)]
AUTHORS: Add Matthias May.

Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agorstp: add ability to receive VLAN-tagged BPDUs
Matthias May [Thu, 14 Feb 2019 23:16:14 +0000 (00:16 +0100)]
rstp: add ability to receive VLAN-tagged BPDUs

Some switches can transmit their BPDUs VLAN-tagged.
With this change OVS is able to receive VLAN-tagged BPDUs, but still
transmits its own BPDUs untagged.
This was tested against Westermo RFI-207-F4G-T3G.

Signed-off-by: Matthias May <matthias.may@neratec.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>