git.proxmox.com Git - mirror_ovs.git/log
4 years ago AUTHORS: update email for Lance Richardson
Lance Richardson [Wed, 8 Jan 2020 19:51:09 +0000 (14:51 -0500)]
AUTHORS: update email for Lance Richardson

Update email address for Lance Richardson.

Signed-off-by: Lance Richardson <lance.richardson@broadcom.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years ago dpif-netdev: Get rid of broken dpif pointer in dp_netdev structure.
Ilya Maximets [Sun, 8 Dec 2019 17:51:09 +0000 (18:51 +0100)]
dpif-netdev: Get rid of broken dpif pointer in dp_netdev structure.

This pointer was introduced in July 2014 by commit
6b31e07347ad ("dpif-netdev: Polling threads directly call ofproto upcall functions.")
and it was broken from the very beginning because dpif_netdev_open()
updates it on each call with the pointer to a newly allocated
'dpif' structure that becomes invalid on the next dpif_netdev_close().
Since dpif_open/close() calls always happen asynchronously from different
threads and the pointer is not protected by RCU or a mutex (it's not even
atomic), it's not possible to use it safely.  Thankfully, the actual
usage was in the repository for less than 3 weeks and was removed by
commit 623540e4617e ("dpif-netdev: Streamline miss handling.").  Until
recently, this pointer was used only to pass it to dpif_flow_hash(),
and, luckily, dpif_flow_hash() didn't use the 'dpif' argument.

However, we tried to use it for netdev offloading in commit
30115809da2e ("dpif-netdev: Use netdev-offload API for port lookup while offloading.")
and that exposed the issue.

Now that all the code that used this pointer has been cleaned up, we can
just remove it from the structure to avoid possible misuse in the
future.

Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
4 years ago dpif: Turn dpif_flow_hash function into generic odp_flow_key_hash.
Ilya Maximets [Sun, 8 Dec 2019 17:09:53 +0000 (18:09 +0100)]
dpif: Turn dpif_flow_hash function into generic odp_flow_key_hash.

The current implementation of dpif_flow_hash() doesn't depend on the
datapath interface and only complicates the callers by forcing them to
figure out what their current 'dpif' is.  If we ever need different
hashing for different 'dpif's, we can implement an API for dpif-providers,
and each dpif implementation will be able to use its local function
directly without calling it via the dpif API.

This change allows us to stop storing the 'dpif' pointer in the userspace
datapath implementation; that pointer is broken and will be removed in
the next commits.

This patch moves dpif_flow_hash() to the odp-util module, where it
replaces the unused odp_flow_key_hash(), and removes the unused 'dpif'
argument.
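
To illustrate why the 'dpif' argument is unnecessary, here is a minimal
sketch of a flow key hash that depends only on the key bytes.  The name
sketch_flow_key_hash() and the use of FNV-1a are assumptions made for this
example; they are not the function or the hash that OVS uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a datapath-independent flow key hash.
 * FNV-1a is used only for illustration; it is not the hash OVS uses. */
uint32_t
sketch_flow_key_hash(const void *key, size_t key_len)
{
    const uint8_t *bytes = key;
    uint32_t hash = 2166136261u;          /* FNV-1a 32-bit offset basis. */

    assert(key != NULL || key_len == 0);
    for (size_t i = 0; i < key_len; i++) {
        hash ^= bytes[i];
        hash *= 16777619u;                /* FNV-1a 32-bit prime. */
    }
    return hash;
}
```

Because nothing in the signature refers to a datapath, a caller needs only
the key and its length.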

Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
4 years ago ovsdb replication: Provide option to configure probe interval.
Numan Siddique [Tue, 7 Jan 2020 04:54:48 +0000 (10:24 +0530)]
ovsdb replication: Provide option to configure probe interval.

When ovsdb-server is in backup mode and connects to the active
ovsdb-server for replication, and it takes more than 5 seconds to
get the dump of the whole database, it drops the connection
soon afterwards because the default probe interval is 5 seconds.  This
results in a snowball effect of reconnections to the active
ovsdb-server.

This patch mitigates the issue by setting the default probe interval
to 60 seconds and providing an option to configure this value with a
unixctl command.

Another option would be to increase 'RECONNECT_DEFAULT_PROBE_INTERVAL'
to a higher value.

Acked-by: Mark Michelson <mmichels@redhat.com>
Acked-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Numan Siddique <numans@ovn.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years ago tests: introduced tests for adding/deleting logical routers in VTEP database
Damijan Skvarc [Mon, 23 Dec 2019 09:38:57 +0000 (10:38 +0100)]
tests: introduced tests for adding/deleting logical routers in VTEP database

New tests were introduced based on an lcov report, which revealed that
this code is apparently not covered by the OVS test suites.

Signed-off-by: Damijan Skvarc <damjan.skvarc@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years ago userspace: Improved packet drop statistics.
Anju Thomas [Wed, 18 Dec 2019 04:48:12 +0000 (05:48 +0100)]
userspace: Improved packet drop statistics.

Currently OVS maintains explicit packet drop/error counters only on port
level.  Packets that are dropped as part of normal OpenFlow processing
are counted in flow stats of “drop” flows or as table misses in table
stats. These can only be interpreted by controllers that know the
semantics of the configured OpenFlow pipeline.  Without that knowledge,
it is impossible for an OVS user to obtain e.g. the total number of
packets dropped due to OpenFlow rules.

Furthermore, there are numerous other reasons for which packets can be
dropped by OVS slow path that are not related to the OpenFlow pipeline.
The generated datapath flow entries include a drop action to avoid
further expensive upcalls to the slow path, but subsequent packets
dropped by the datapath are not accounted for anywhere.

Finally, the datapath itself drops packets in certain error situations.
These drops are not accounted for today either.  This makes it difficult
for OVS users to monitor packet drops in an OVS instance and to alert a
management system in case of an unexpected increase in such drops.
OVS troubleshooters also face difficulties in analysing packet drops.

With this patch we implement the following changes to address the issues
mentioned above.

1. Identify and account for all the silent packet drop scenarios
2. Display these drops in ovs-appctl coverage/show
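
The idea of accounting drops per reason can be sketched as follows.  This
is only an illustration: the reason names, the structure, and the helper
functions are hypothetical and are not the identifiers or the coverage
counters introduced by the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical drop reasons; the names are illustrative only. */
enum sketch_drop_reason {
    SKETCH_DROP_OPENFLOW_RULE,   /* Explicit drop in the OpenFlow pipeline. */
    SKETCH_DROP_SLOW_PATH,       /* Dropped for a slow-path reason. */
    SKETCH_DROP_DATAPATH_ERROR,  /* Datapath error situation. */
    SKETCH_DROP_N_REASONS
};

struct sketch_drop_stats {
    uint64_t count[SKETCH_DROP_N_REASONS];
};

/* Accounts 'n_packets' dropped packets to 'reason'. */
void
sketch_count_drop(struct sketch_drop_stats *stats,
                  enum sketch_drop_reason reason, uint64_t n_packets)
{
    assert(reason < SKETCH_DROP_N_REASONS);
    stats->count[reason] += n_packets;
}

/* Returns the total number of drops over all reasons. */
uint64_t
sketch_total_drops(const struct sketch_drop_stats *stats)
{
    uint64_t total = 0;

    for (int i = 0; i < SKETCH_DROP_N_REASONS; i++) {
        total += stats->count[i];
    }
    return total;
}
```

With one counter per reason, a user can obtain both the total and the
breakdown without knowing the semantics of the OpenFlow pipeline.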

Co-authored-by: Rohith Basavaraja <rohith.basavaraja@gmail.com>
Co-authored-by: Keshav Gupta <keshugupta1@gmail.com>
Signed-off-by: Anju Thomas <anju.thomas@ericsson.com>
Signed-off-by: Rohith Basavaraja <rohith.basavaraja@gmail.com>
Signed-off-by: Keshav Gupta <keshugupta1@gmail.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Ben Pfaff <blp@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
4 years ago ofproto-dpif-upcall: Fix using uninitialized upcall hash.
Ilya Maximets [Sat, 4 Jan 2020 00:07:36 +0000 (01:07 +0100)]
ofproto-dpif-upcall: Fix using uninitialized upcall hash.

Upcalls are allocated on the stack and the 'hash' field must be
initialized regardless of whether the attribute exists, because it will
be used later.

 Conditional jump or move depends on uninitialised value(s)
    at 0xFA74A7: dpif_netlink_encode_execute (dpif-netlink.c:1828)
    by 0xFA6DE8: dpif_netlink_operate__ (dpif-netlink.c:1906)
    by 0xFA612F: dpif_netlink_operate_chunks (dpif-netlink.c:2219)
    by 0xFA0E36: dpif_netlink_operate (dpif-netlink.c:2275)
    by 0xE5AFAC: dpif_operate (dpif.c:1376)
    by 0xDF3922: handle_upcalls (ofproto-dpif-upcall.c:1615)
    by 0xDF269B: recv_upcalls (ofproto-dpif-upcall.c:857)
    by 0xDF1C49: udpif_upcall_handler (ofproto-dpif-upcall.c:759)
    by 0xF3A3FE: ovsthread_wrapper (ovs-thread.c:383)
    by 0x565F6DA: start_thread (pthread_create.c:463)
    by 0x615988E: clone (clone.S:95)
  Uninitialised value was created by a stack allocation
    at 0xDF2258: recv_upcalls (ofproto-dpif-upcall.c:773)

Fixes: 0442bfb11d6c ("ofproto-dpif-upcall: Echo HASH attribute back to datapath.")
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Acked-by: William Tu <u9012063@gmail.com>
4 years ago ofproto-dpif: Fix using uninitialized execute hash.
Ilya Maximets [Fri, 3 Jan 2020 23:32:04 +0000 (00:32 +0100)]
ofproto-dpif: Fix using uninitialized execute hash.

Most callers don't initialize dpif_execute.hash, leaving a random
value from the stack.  This random value is used later while encoding
the netlink message and might produce unwanted kernel behavior.

Fix that by fully initializing the dpif_execute structure, using
designated initializers to avoid such issues in the future.
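
A minimal sketch of the technique, assuming a cut-down structure: the
type 'struct sketch_execute' and the helper sketch_make_execute() are
hypothetical and do not match the real dpif_execute layout.  In C, members
that are not named in a designated initializer are zero-initialized, so no
field is left with a stack value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, cut-down stand-in for an execute request; the real
 * dpif_execute structure has different members. */
struct sketch_execute {
    const void *actions;
    size_t actions_len;
    bool probe;
    uint32_t hash;
};

struct sketch_execute
sketch_make_execute(const void *actions, size_t actions_len)
{
    assert(actions != NULL || actions_len == 0);

    /* 'probe' and 'hash' are not named here, so the language rules
     * initialize them to false and 0 instead of leaving stack garbage. */
    struct sketch_execute execute = {
        .actions = actions,
        .actions_len = actions_len,
    };
    return execute;
}
```

Adding a new member to the structure later keeps every such call site
safe, because the new member is zeroed as well.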

Fixes: 0442bfb11d6c ("ofproto-dpif-upcall: Echo HASH attribute back to datapath.")
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: William Tu <u9012063@gmail.com>
Acked-by: Ben Pfaff <blp@ovn.org>
4 years ago tests: Allow running offloads testsuite under valgrind.
Ilya Maximets [Mon, 6 Jan 2020 10:54:07 +0000 (11:54 +0100)]
tests: Allow running offloads testsuite under valgrind.

This helps a lot with finding memory leaks and uninitialized
data usage.

Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
4 years ago dpif-netlink: Fix dumping uninitialized netdev flow stats.
Ilya Maximets [Mon, 6 Jan 2020 10:23:42 +0000 (11:23 +0100)]
dpif-netlink: Fix dumping uninitialized netdev flow stats.

The dpif logging functions expect to be called after the operation.
log_flow_del_message() dumps flow stats on success, but these are not
initialized before the actual call to netdev_flow_del():

 Conditional jump or move depends on uninitialised value(s)
    at 0x6090875: _itoa_word (_itoa.c:179)
    by 0x6093F0D: vfprintf (vfprintf.c:1642)
    by 0x60C090F: vsnprintf (vsnprintf.c:114)
    by 0xE5E7EC: ds_put_format_valist (dynamic-string.c:155)
    by 0xE5E755: ds_put_format (dynamic-string.c:142)
    by 0xE5A5E6: dpif_flow_stats_format (dpif.c:903)
    by 0xE5B708: log_flow_message (dpif.c:1763)
    by 0xE5BCA4: log_flow_del_message (dpif.c:1809)
    by 0xFA6076: try_send_to_netdev (dpif-netlink.c:2190)
    by 0xFA0D3C: dpif_netlink_operate (dpif-netlink.c:2248)
    by 0xE5AFAC: dpif_operate (dpif.c:1376)
    by 0xDF176E: push_dp_ops (ofproto-dpif-upcall.c:2367)
    by 0xDF04C8: push_ukey_ops (ofproto-dpif-upcall.c:2447)
    by 0xDF008F: revalidator_sweep__ (ofproto-dpif-upcall.c:2805)
    by 0xDF5DC6: revalidator_sweep (ofproto-dpif-upcall.c:2816)
    by 0xDF1E83: udpif_revalidator (ofproto-dpif-upcall.c:949)
    by 0xF3A3FE: ovsthread_wrapper (ovs-thread.c:383)
    by 0x565F6DA: start_thread (pthread_create.c:463)
    by 0x615988E: clone (clone.S:95)
  Uninitialised value was created by a stack allocation
    at 0xDEFC24: revalidator_sweep__ (ofproto-dpif-upcall.c:2733)

Fixes: 3cd99886191e ("dpif-netlink: Use dpif logging functions")
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
4 years ago tc: Fix using uninitialized id chain.
Ilya Maximets [Fri, 3 Jan 2020 18:48:18 +0000 (19:48 +0100)]
tc: Fix using uninitialized id chain.

tc_make_tcf_id() doesn't initialize the 'chain' field, leaving it with a
random value from the stack.  This makes request_from_tcf_id() create a
request with a random TCA_CHAIN included.  Such requests obviously
don't work as needed, leading to broken flow dumps and various other
issues.  Fix that by using a designated initializer instead.

Fixes: acdd544c4c9a ("tc: Introduce tcf_id to specify a tc filter")
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
4 years ago netdev-offload-tc: Fix using uninitialized recirc_act.
Ilya Maximets [Fri, 3 Jan 2020 19:06:32 +0000 (20:06 +0100)]
netdev-offload-tc: Fix using uninitialized recirc_act.

Fixes: b2ae40690ed7 ("netdev-offload-tc: Add recirculation support via tc chains")
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
4 years ago compat: Include confirm_neigh parameter if needed
Greg Rose [Mon, 6 Jan 2020 21:36:34 +0000 (13:36 -0800)]
compat: Include confirm_neigh parameter if needed

A change backported to the Linux 4.14.162 LTS kernel requires
a boolean parameter.  Check for the presence of the parameter
and adjust the caller in that case.

Passes check-kmod test with no regressions.

Passes Travis build here:
https://travis-ci.org/gvrose8192/ovs-experimental/builds/633461320

Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
4 years ago AUTHORS: Add Krishna Kolakaluri.
Ben Pfaff [Mon, 6 Jan 2020 22:28:26 +0000 (14:28 -0800)]
AUTHORS: Add Krishna Kolakaluri.

Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years ago bridge: Split the column updates of rstp statistics and status.
Krishna Kolakaluri [Tue, 31 Dec 2019 00:19:39 +0000 (16:19 -0800)]
bridge: Split the column updates of rstp statistics and status.

Split the update of the rstp_statistics column and the rstp_status column
in the Port table into two different functions.  This helps in controlling
the number of times the rstp_statistics column is updated with the key
"stats-update-interval" in the Open_vSwitch table.

Signed-off-by: Krishna Kolakaluri <kkolakaluri@plume.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years ago netdev-afxdp: Fix transmission freeze in native mode without zerocopy.
Ilya Maximets [Sun, 5 Jan 2020 00:51:19 +0000 (01:51 +0100)]
netdev-afxdp: Fix transmission freeze in native mode without zerocopy.

The kernel uses 'xsk_generic_xmit()' for all modes where zerocopy is
not enabled:

   net/xdp/xsk.c
   static int __xsk_sendmsg(struct sock *sk)
   {
       ...
       return xs->zc ? xsk_zc_xmit(xs) : xsk_generic_xmit(sk);
   }

'xsk_generic_xmit()' sends packets synchronously and no more than 16
packets at a time.  This means that we have to kick Tx with sendmsg()
for every 16 packets in simple native mode too; otherwise, the packets
may never be sent.
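
The arithmetic behind this can be sketched as follows.  The batch size of
16 is taken from the description above; the macro and function names are
hypothetical, and the function only counts kicks rather than calling
sendmsg().

```c
#include <assert.h>

/* Batch size taken from the commit message; the name is hypothetical. */
#define SKETCH_GENERIC_XMIT_BATCH 16

static_assert(SKETCH_GENERIC_XMIT_BATCH > 0, "batch size must be positive");

/* Returns how many Tx kicks are needed to send 'n_packets' queued packets
 * when each kick sends at most SKETCH_GENERIC_XMIT_BATCH of them. */
unsigned int
sketch_tx_kicks_needed(unsigned int n_packets)
{
    return n_packets / SKETCH_GENERIC_XMIT_BATCH
           + (n_packets % SKETCH_GENERIC_XMIT_BATCH != 0);
}
```

A single kick per burst is therefore enough only while the burst does not
exceed one batch; a 32-packet burst already needs two.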

Reported-by: William Tu <u9012063@gmail.com>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2019-November/365076.html
Fixes: e8f5634484e8 ("netdev-afxdp: Best-effort configuration of XDP mode.")
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: William Tu <u9012063@gmail.com>
4 years ago cirrus: Use python 3.7 packages on FreeBSD.
Ilya Maximets [Thu, 2 Jan 2020 10:57:53 +0000 (11:57 +0100)]
cirrus: Use python 3.7 packages on FreeBSD.

Python 3.6 versions of these packages are no longer available in
FreeBSD ports.

Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
4 years ago netdev-offload-tc: Add conntrack nat support
Paul Blakey [Sun, 22 Dec 2019 10:16:43 +0000 (12:16 +0200)]
netdev-offload-tc: Add conntrack nat support

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
4 years ago netdev-offload-tc: Add conntrack label and mark support
Paul Blakey [Sun, 22 Dec 2019 10:16:42 +0000 (12:16 +0200)]
netdev-offload-tc: Add conntrack label and mark support

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
4 years ago netdev-offload-tc: Add conntrack support
Paul Blakey [Sun, 22 Dec 2019 10:16:41 +0000 (12:16 +0200)]
netdev-offload-tc: Add conntrack support

Zone and ct_state first.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
4 years ago netdev-offload-tc: Add recirculation support via tc chains
Paul Blakey [Sun, 22 Dec 2019 10:16:40 +0000 (12:16 +0200)]
netdev-offload-tc: Add recirculation support via tc chains

Each recirculation id will create a tc chain, and we translate
the recirculation action to a tc goto chain action.

We check for kernel support for this by probing the OvS datapath for the
tc recirc id sharing feature.  If it is supported, we can safely offload
rules that match on recirc_id, as well as the recirculation action.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
4 years ago tc: Move tunnel_key unset action before output ports
Paul Blakey [Sun, 22 Dec 2019 10:16:39 +0000 (12:16 +0200)]
tc: Move tunnel_key unset action before output ports

Since OvS datapath gets packets already decapsulated from tunnel devices,
it doesn't explicitly decapsulate them. So in a recirculation setup,
the tunnel matching continues in the recirculation as the tunnel
metadata still exists on the SKB.

Tunnel key unset action unsets this metadata. Some drivers might rely
on this explicit tunnel key unset to know when to decapsulate the packet
instead of the device type. So instead of removing it completely,
we move it near the output actions.

This way, we also keep SKB metadata through recirculation, and for
non-recirculation rules, the resulting tc rules should remain the same.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
4 years ago dpif: Add support to set user features
Paul Blakey [Sun, 22 Dec 2019 10:16:38 +0000 (12:16 +0200)]
dpif: Add support to set user features

This enables user features on the kernel datapath via the DP_CMD_SET
command, and also retrieves them to check for actual support and
not just an older kernel ignoring the requested features.
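
The set-then-read-back check can be sketched with a simulated datapath.
Everything here is hypothetical: the structure, the feature bit, and the
function names are made up for the example and do not correspond to the
netlink messages or flags used by the kernel datapath.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_FEATURE_RECIRC_SHARING (UINT32_C(1) << 2)  /* Made-up bit. */

/* Simulated kernel datapath.  'known' models the feature bits that this
 * kernel version understands; 'enabled' is what it reports back. */
struct sketch_datapath {
    uint32_t known;
    uint32_t enabled;
};

static void
sketch_dp_set_features(struct sketch_datapath *dp, uint32_t requested)
{
    /* An older kernel silently ignores the bits it does not know. */
    dp->enabled = requested & dp->known;
}

/* Requests 'feature' and reads the enabled set back, so that a request
 * that was silently ignored is not mistaken for support. */
bool
sketch_probe_feature(struct sketch_datapath *dp, uint32_t feature)
{
    assert(feature != 0);
    sketch_dp_set_features(dp, feature);
    return (dp->enabled & feature) == feature;
}
```

The read-back is what distinguishes a kernel that accepted the feature
from one that merely did not reject the request.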

This will be used in the next patch to enable recirc_id sharing with tc.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
4 years ago netdev-offload-tc: Implement netdev tc flush via tc filter del
Paul Blakey [Sun, 22 Dec 2019 10:16:37 +0000 (12:16 +0200)]
netdev-offload-tc: Implement netdev tc flush via tc filter del

To be consistent with our tc-ufid mapping after flush, and to support tc
chains flushing in the next commit, implement the flush operation by
deleting all the filters we actually added, along with their mappings.

Unlike the old code, this does not delete the QoS policing configured
via matchall filters.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
4 years ago tc: Introduce tcf_id to specify a tc filter
Paul Blakey [Sun, 22 Dec 2019 10:16:36 +0000 (12:16 +0200)]
tc: Introduce tcf_id to specify a tc filter

Move all that is needed to identify a tc filter to a
new structure, tcf_id. This removes a lot of duplication
in accessing/creating tc filters.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
4 years ago compat: Add tc ct action and flower matches defines for older kernels
Paul Blakey [Sun, 22 Dec 2019 10:16:35 +0000 (12:16 +0200)]
compat: Add tc ct action and flower matches defines for older kernels

Update kernel UAPI to support conntrack matches, and the
tc actions ct and goto chain.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
4 years ago match: Add match_set_ct_zone_masked helper
Paul Blakey [Sun, 22 Dec 2019 10:16:34 +0000 (12:16 +0200)]
match: Add match_set_ct_zone_masked helper

Sets zone in match.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
4 years ago ovsdb raft: Fix the problem when cluster restarted after DB compaction.
Han Zhou [Wed, 4 Dec 2019 01:57:20 +0000 (17:57 -0800)]
ovsdb raft: Fix the problem when cluster restarted after DB compaction.

The cluster doesn't work after all nodes are restarted following a DB
compaction, unless there was a transaction between the DB compaction and
the restart.

The error log looks like:
raft|ERR|internal error: deferred vote_request message completed but not ready
to send because message index 9 is past last synced index 0: s2 vote_request:
term=6 last_log_index=9 last_log_term=4

The root cause is that the log_synced member is not initialized when
reading the raft header.  This patch fixes it and removes the XXX
from the test case.

Signed-off-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years ago ovsdb-cluster.at: Wait until leader is elected before DB compact.
Han Zhou [Wed, 4 Dec 2019 01:57:19 +0000 (17:57 -0800)]
ovsdb-cluster.at: Wait until leader is elected before DB compact.

In the test case "election timer change", we had to insert some data
before testing DB compact.  Otherwise, inserting data after DB
compact would cause a busy loop, as mentioned in the XXX comment.

The root cause of the busy loop is still not clear, but the test
itself didn't wait until the leader election finished before initiating
DB compact.  This patch adds the wait to make sure the test continues
only after a leader is elected, so that the following tests are based on
a clean state.  With this wait added, the busy loop problem is
gone even without inserting the data, so the additional data insertion
is also removed by this patch.

A separate patch will address the busy loop problem in the scenario:
1. Restart cluster
2. DB compact before the cluster is ready
3. Insert data

Signed-off-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years ago ovs container build: Make kernel module configurable
Aliasgar Ginwala [Fri, 20 Dec 2019 00:50:47 +0000 (16:50 -0800)]
ovs container build: Make kernel module configurable

--with-linux can be made configurable while building containers
in order to leverage kernel modules installed on the host.
Set KERNEL_VERSION=host as an environment variable for this.

Signed-off-by: Aliasgar Ginwala <aginwala@ebay.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years ago Revert "docs: To build OVS on RHEL7 EPEL is needed"
Timothy Redaelli [Fri, 20 Dec 2019 17:35:09 +0000 (18:35 +0100)]
Revert "docs: To build OVS on RHEL7 EPEL is needed"

This reverts commit 9e334d91b3ea95e2b96f7b3edcb2ba9c3353288a.

This commit is not needed since OVS doesn't use six anymore.

Signed-off-by: Timothy Redaelli <tredaelli@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years ago Remove dependency on python3-six
Timothy Redaelli [Fri, 20 Dec 2019 17:35:08 +0000 (18:35 +0100)]
Remove dependency on python3-six

Since Python 2 support was removed in 1ca0323e7c29 ("Require Python 3 and
remove support for Python 2."), python3-six is not needed anymore.

Moreover python3-six is not available on RHEL/CentOS7 without using EPEL
and so this patch is needed in order to release OVS 2.13 on RHEL7.

Signed-off-by: Timothy Redaelli <tredaelli@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years ago travis: Use pip3 to install the python packages on linux
Timothy Redaelli [Wed, 18 Dec 2019 15:53:10 +0000 (16:53 +0100)]
travis: Use pip3 to install the python packages on linux

Currently pip is used to install the python packages on linux by travis,
but pip3 should be used since pip is a symlink to pip2.

Fixes: 1ca0323e7c29 ("Require Python 3 and remove support for Python 2.")
Cc: blp@ovn.org
Signed-off-by: Timothy Redaelli <tredaelli@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years ago dpctl: Fix referencing DPDK in a flow dump.
Ilya Maximets [Wed, 18 Dec 2019 14:38:29 +0000 (15:38 +0100)]
dpctl: Fix referencing DPDK in a flow dump.

A few reasons to replace 'non-dpdk interfaces' with 'the main thread':

* Flows are dumped from threads (from per thread flow tables) not from
  the interfaces.

* 'non-dpdk' here sounds like all other flows (dumped from PMDs) have
  some relation to DPDK, which is not true, if only because we have
  afxdp and dummy ports that could be polled by PMD threads.

* 'main thread' is the same term as we're using in the output of
  ovs-appctl dpif-netdev/pmd-stats-show.

Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
4 years ago ovs-thread: Avoid huge alignment on a base spinlock structure.
Ilya Maximets [Mon, 16 Dec 2019 12:54:38 +0000 (13:54 +0100)]
ovs-thread: Avoid huge alignment on a base spinlock structure.

Marking the structure as 64 bytes aligned forces the compiler to produce
big holes in the containing structures in order to fulfill this
requirement.  Also, any structure that contains this one as a member
automatically inherits this huge alignment, making the resulting memory
layout inefficient.  For example, 'struct umem_pool' currently
uses 3 full cache lines (192 bytes) with only 32 bytes of actual data:

  struct umem_pool {
    int                        index;                /*  0   4 */
    unsigned int               size;                 /*  4   4 */

    /* XXX 56 bytes hole, try to pack */

    /* --- cacheline 1 boundary (64 bytes) --- */
    struct ovs_spin lock __attribute__((__aligned__(64))); /* 64  64 */

    /* XXX last struct has 48 bytes of padding */

    /* --- cacheline 2 boundary (128 bytes) --- */
    void * *                   array;                /* 128  8 */

    /* size: 192, cachelines: 3, members: 4 */
    /* sum members: 80, holes: 1, sum holes: 56 */
    /* padding: 56 */
    /* paddings: 1, sum paddings: 48 */
    /* forced alignments: 1, forced holes: 1, sum forced holes: 56 */
  } __attribute__((__aligned__(64)));

Actual alignment of a spin lock is required only for Tx queue locks
inside netdev-afxdp to avoid false sharing, in all other cases
alignment only produces inefficient memory usage.

Also, CACHE_LINE_SIZE macro should be used instead of 64 as different
platforms may have different cache line sizes.

Using PADDED_MEMBERS to avoid alignment inheritance.
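
The alignment inheritance described above can be reproduced with a small
C11 sketch.  All type names are hypothetical, the cache line size of 64
is an assumption, and the last structure uses a plain alignas() at the
point of use as a stand-in; it is not the PADDED_MEMBERS macro.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

#define SKETCH_CACHE_LINE_SIZE 64   /* Assumed; platforms differ. */

/* Lock type that carries cache-line alignment itself. */
struct sketch_spin_aligned {
    alignas(SKETCH_CACHE_LINE_SIZE) int lock;
};

/* Lock type without any forced alignment. */
struct sketch_spin_plain {
    int lock;
};

/* Inherits the 64-byte alignment from its 'lock' member. */
struct sketch_pool_aligned {
    int index;
    unsigned int size;
    struct sketch_spin_aligned lock;
    void **array;
};

/* Same members, but with the plain lock: no holes are forced. */
struct sketch_pool_plain {
    int index;
    unsigned int size;
    struct sketch_spin_plain lock;
    void **array;
};

/* A user that really needs the lock on its own cache line requests the
 * alignment locally, so other users of the lock type are unaffected. */
struct sketch_txq_lock {
    alignas(SKETCH_CACHE_LINE_SIZE) struct sketch_spin_plain lock;
};

static_assert(alignof(struct sketch_pool_aligned) == SKETCH_CACHE_LINE_SIZE,
              "alignment is inherited from the aligned lock member");
static_assert(sizeof(struct sketch_pool_aligned)
              == 3 * SKETCH_CACHE_LINE_SIZE,
              "the aligned lock forces three full cache lines");
static_assert(sizeof(struct sketch_pool_plain) < SKETCH_CACHE_LINE_SIZE,
              "the plain variant fits in a single cache line");
```

The static assertions document the layout at compile time: the structure
with the aligned lock occupies three cache lines, while the plain variant
fits in one.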

Fixes: ae36d63d7e3c ("ovs-thread: Make struct spin lock cache aligned.")
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: William Tu <u9012063@gmail.com>
4 years ago netdev-dpdk: Add coverage counter to count vhost IRQs.
Eelco Chaudron [Thu, 12 Dec 2019 03:00:01 +0000 (22:00 -0500)]
netdev-dpdk: Add coverage counter to count vhost IRQs.

When the dpdk vhost library executes an eventfd_write() call,
i.e. waking up the guest, a new callback will be called.

This patch adds the callback to count the number of
interrupts sent to the VM, i.e. the number of times
interrupts were generated.

This might be of interest for finding out when system calls are
made in the DPDK fast path.

The coverage counter is called "vhost_notification" and
can be read with:

  $ ovs-appctl coverage/read-counter vhost_notification
  13238319

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
4 years ago netdev-dpdk: Fix sw stats perf drop.
Kevin Traynor [Tue, 17 Dec 2019 15:07:37 +0000 (15:07 +0000)]
netdev-dpdk: Fix sw stats perf drop.

Accessing the sw stats in the vhost datapath of a PVP test
can incur a performance drop of ~2%.

Most of the time these stats will just be getting zero added
to them. By checking if there is a non-zero update first, we
can avoid accessing them when they won't be updated and avoid
the performance drop.
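
A minimal sketch of the check, assuming a simplified statistics structure:
'struct sketch_sw_stats' and sketch_update_sw_stats() are hypothetical
names, and the real code also deals with locking, which is omitted here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical software statistics of a port. */
struct sketch_sw_stats {
    uint64_t tx_retries;
    uint64_t tx_drops;
};

/* Touches a counter only when there is something to add, so the common
 * case of zero retries and zero drops does not access the statistics. */
void
sketch_update_sw_stats(struct sketch_sw_stats *stats,
                       unsigned int retries, unsigned int drops)
{
    assert(stats != NULL);
    if (retries) {
        stats->tx_retries += retries;
    }
    if (drops) {
        stats->tx_drops += drops;
    }
}
```

The branch is cheap compared with a write to memory that may be shared
with other threads, which is the access the fix tries to avoid.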

Fixes: 2f862c712e52 ("netdev-dpdk: Detailed packet drop statistics.")
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
4 years ago Documentation: Fix ovs-tcpdump options.
William Tu [Mon, 16 Dec 2019 14:18:25 +0000 (06:18 -0800)]
Documentation: Fix ovs-tcpdump options.

Signed-off-by: William Tu <u9012063@gmail.com>
Tested-by: Greg Rose <gvrose8192@gmail.com>
Reviewed-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
4 years ago system-afxdp.at: Add test for infinite re-addition of failed ports.
Ilya Maximets [Sat, 7 Dec 2019 14:46:18 +0000 (15:46 +0100)]
system-afxdp.at: Add test for infinite re-addition of failed ports.

New file created for AF_XDP specific tests.

Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: William Tu <u9012063@gmail.com>
4 years ago netdev-afxdp: Avoid removing of XDP program if not loaded.
Ilya Maximets [Sat, 7 Dec 2019 14:46:17 +0000 (15:46 +0100)]
netdev-afxdp: Avoid removing of XDP program if not loaded.

'bpf_set_link_xdp_fd' generates a netlink event regardless of the actual
changes it makes, so if-notifier will receive a link update even if
there was no XDP program previously loaded on the interface.

OVS tries to remove the XDP program if device configuration was not
successful, triggering if-notifier, which triggers bridge reconfiguration
and another attempt to add the failed port, and so on in an infinite
loop.

This patch avoids the issue by not removing XDP program if it wasn't
loaded.  Since loading of the XDP program is one of the last steps
of port configuration, this should help to avoid infinite re-addition
for most types of misconfiguration.

Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: William Tu <u9012063@gmail.com>
4 years ago dpif-netdev: Avoid infinite re-addition of misconfigured ports.
Ilya Maximets [Sat, 7 Dec 2019 14:46:16 +0000 (15:46 +0100)]
dpif-netdev: Avoid infinite re-addition of misconfigured ports.

Infinite re-addition of failed ports happens if the device in the
userspace datapath has a Linux network interface and it cannot be
configured, for example, if the first reconfiguration fails because of
misconfiguration or bad initial device state.
In the current code, the victims are afxdp ports and the Mellanox NIC
ports opened by DPDK, due to their bifurcated drivers (it's unlikely for
usual netdev-linux ports to fail).

The root cause: every change in the state of the network interface
of a Linux kernel device generates an if-notifier event, and each
if-notifier event triggers the OVS code to re-apply the configuration of
ports, i.e. add broken ports back.  The most obvious part is that
dpif-netdev changes the device flags before trying to configure it:

   1. add_port()
   2. set_flags() --> if-notifier event
   3. reconfigure() --> port removal from the datapath due to
                        misconfiguration or any other issue in
                        the underlying device.
   4. setting flags back --> another if-notifier event.
   5. Was there a new if-notifier event?
      yes --> re-apply all settings. --> goto step 1.

An easy way to reproduce this is to add an afxdp port with n_rxq=N, where
N is bigger than the device supports.

This patch fixes the most obvious case of this issue by enabling
promisc mode later, at the point where we already know
that the device can be added to the datapath without errors, i.e. after
its first successful reconfiguration.
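
The effect of the reordering can be shown with a small simulation.  The
port structure, the event counter, and both functions are hypothetical;
they only model "a flag change generates an if-notifier event" and do not
reflect the actual dpif-netdev code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a port backed by a Linux network interface. */
struct sketch_port {
    bool configurable;      /* Would reconfiguration succeed? */
    bool promisc;           /* Current promisc flag. */
    unsigned int n_events;  /* if-notifier events generated so far. */
};

/* Every actual flag change generates one if-notifier event. */
static void
sketch_set_promisc(struct sketch_port *port, bool enable)
{
    if (port->promisc != enable) {
        port->promisc = enable;
        port->n_events++;
    }
}

/* Old order: change flags first, then find out that the port is broken. */
bool
sketch_add_port_old(struct sketch_port *port)
{
    assert(port != NULL);
    sketch_set_promisc(port, true);
    if (!port->configurable) {
        sketch_set_promisc(port, false);    /* Setting flags back. */
        return false;
    }
    return true;
}

/* New order: change flags only after a successful reconfiguration. */
bool
sketch_add_port_new(struct sketch_port *port)
{
    assert(port != NULL);
    if (!port->configurable) {
        return false;
    }
    sketch_set_promisc(port, true);
    return true;
}
```

In this model a broken port produces two events with the old order and
none with the new one, so nothing asks the bridge to re-apply its
configuration.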

Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2019-September/363038.html
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: William Tu <u9012063@gmail.com>
4 years ago netdev-dpdk: add support for the RTE_ETH_EVENT_INTR_RESET event.
Eelco Chaudron [Mon, 11 Nov 2019 14:02:37 +0000 (09:02 -0500)]
netdev-dpdk: add support for the RTE_ETH_EVENT_INTR_RESET event.

Currently, OVS does not register for, and therefore does not handle, the
interface reset event from the DPDK framework.  This causes a
problem in cases where a VF is used as an interface and its
configuration changes.

For example, without this patch applied, the MAC change in the following
scenario is not detected or acted upon until OVS is restarted:

  $ echo 1 > /sys/bus/pci/devices/0000:05:00.1/sriov_numvfs
  $ ovs-vsctl add-port ovs_pvp_br0 dpdk0 -- \
            set Interface dpdk0 type=dpdk -- \
            set Interface dpdk0 options:dpdk-devargs=0000:05:0a.0

  $ ip link set p5p2 vf 0 mac 52:54:00:92:d3:33

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
4 years ago bridge: Allow manual notifications about interfaces' updates.
Ilya Maximets [Tue, 5 Nov 2019 17:20:51 +0000 (18:20 +0100)]
bridge: Allow manual notifications about interfaces' updates.

Sometimes interface updates could happen in a way that if-notifier is not
able to catch.  For example, some heavy operations (device reset) in
netdev-dpdk could require re-applying the bridge configuration.

For this purpose, a new manual notifier is introduced.  Its function
'if_notifier_manual_report()' can be called directly by the code
that is aware of the changes.  This new notifier is thread-safe.

Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
4 years ago tests: introduced test for checking "ovs-vsctl emer-reset"
Damijan Skvarc [Mon, 9 Dec 2019 13:26:43 +0000 (14:26 +0100)]
tests: introduced test for checking "ovs-vsctl emer-reset"

Signed-off-by: Damijan Skvarc <damjan.skvarc@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agorhel: Support RHEL 7.8 kernel module rpm build
Yi-Hung Wei [Thu, 5 Dec 2019 01:10:27 +0000 (17:10 -0800)]
rhel: Support RHEL 7.8 kernel module rpm build

This patch supports RHEL 7.8 kernel module rpm package building.

$ make rpm-fedora-kmod \
RPMBUILD_OPT='-D "kversion 3.10.0-1101.el7.x86_64"'

Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
Signed-off-by: William Tu <u9012063@gmail.com>
4 years agocirrus: Use FreeBSD 12.1 stable release.
Ilya Maximets [Fri, 13 Dec 2019 12:52:10 +0000 (13:52 +0100)]
cirrus: Use FreeBSD 12.1 stable release.

The freebsd-12-0-snap image family was suddenly removed from gCloud,
so it cannot be used anymore.  Update to the more recent 12.1 release.

Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
4 years agoAUTHORS: Add Lance Yang.
Ilya Maximets [Fri, 13 Dec 2019 17:49:00 +0000 (18:49 +0100)]
AUTHORS: Add Lance Yang.

Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
4 years agotravis: Move x86-only add-on packages to linux-prepare script.
Lance Yang [Fri, 6 Dec 2019 03:26:12 +0000 (11:26 +0800)]
travis: Move x86-only add-on packages to linux-prepare script.

To enable multiple CPU architectures support, it is necessary to move
the x86-only add-on packages from .travis.yml file. Otherwise, the
x86-only add-on packages will break the builds on some other CPU
architectures.

Reviewed-by: Yanqin Wei <Yanqin.Wei@arm.com>
Reviewed-by: Malvika Gupta <Malvika.Gupta@arm.com>
Reviewed-by: Gavin Hu <Gavin.Hu@arm.com>
Reviewed-by: Ruifeng Wang <Ruifeng.Wang@arm.com>
Acked-by: David Wilder <dwilder@us.ibm.com>
Signed-off-by: Lance Yang <Lance.Yang@arm.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
4 years agodpif-netdev-perf: Fix using of uninitialized last_tsc.
Lance Yang [Fri, 6 Dec 2019 03:26:11 +0000 (11:26 +0800)]
dpif-netdev-perf: Fix using of uninitialized last_tsc.

When compiling Open vSwitch on aarch64, the compiler warns about an
uninitialized variable in lib/dpif-netdev-perf.c. If the clock_gettime
function in rdtsc_syscall fails, the member last_tsc of the
uninitialized struct will be returned. To avoid the warning, the
variable must be initialized before use.
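
The failure path described above can be sketched as follows. This is an
illustration only, not the code from lib/dpif-netdev-perf.c: the struct
and function names are hypothetical, and CLOCK_MONOTONIC is an assumed
clock choice. The point is that the fallback value is only meaningful
if the structure was initialized before the first read.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L
#endif

#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for the perf stats structure; only the field
 * relevant to this discussion is modelled. */
struct perf_stats_sketch {
    uint64_t last_tsc;
};

/* Give 'last_tsc' a defined value so that the error path below never
 * returns indeterminate data. */
static void
perf_stats_sketch_init(struct perf_stats_sketch *s)
{
    assert(s != NULL);
    s->last_tsc = 0;
}

/* Returns a nanosecond timestamp.  If clock_gettime() fails, falls back
 * to the previously stored value. */
static uint64_t
read_cycles_syscall(struct perf_stats_sketch *s)
{
    struct timespec ts;

    assert(s != NULL);
    if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0) {
        return s->last_tsc;
    }
    s->last_tsc = (uint64_t) ts.tv_sec * UINT64_C(1000000000)
                  + (uint64_t) ts.tv_nsec;
    return s->last_tsc;
}
```

A caller would run perf_stats_sketch_init() once on a freshly declared
structure before the first read_cycles_syscall() call.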

Reviewed-by: Yanqin Wei <Yanqin.Wei@arm.com>
Reviewed-by: Malvika Gupta <Malvika.Gupta@arm.com>
Signed-off-by: Lance Yang <Lance.Yang@arm.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
4 years agodpif-netdev: Hold global port mutex while calling offload API.
Ilya Maximets [Tue, 10 Dec 2019 12:36:16 +0000 (13:36 +0100)]
dpif-netdev: Hold global port mutex while calling offload API.

We changed datapath port lookup to use the netdev-offload API, but
forgot that the port mutex was there not only to protect the datapath
port hash map.  It was also there as a workaround for the complete
lack of thread safety in netdev-offload-dpdk functions.

Restore it to fix the behaviour and add a comment to prevent
removing it in the future unless netdev-offload-dpdk is fixed.

For the thread safety notice see the top of netdev-offload-dpdk.c.

Fixes: 30115809da2e ("dpif-netdev: Use netdev-offload API for port lookup while offloading")
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Eli Britstein <elibr@mellanox.com>
4 years agotests: Log commands being executed for async message control test.
Ben Pfaff [Wed, 4 Dec 2019 23:06:11 +0000 (15:06 -0800)]
tests: Log commands being executed for async message control test.

The "ofproto - asynchronous message control (OpenFlow 1.4)" test fails
from time to time when I'm running tests in parallel locally.  So far,
I've not been able to determine the root cause, but logging the
commands as they're executed should help.

Signed-off-by: Ben Pfaff <blp@ovn.org>
Signed-off-by: William Tu <u9012063@gmail.com>
4 years agotests: Improve logging for async message control test.
Ben Pfaff [Wed, 4 Dec 2019 23:06:10 +0000 (15:06 -0800)]
tests: Improve logging for async message control test.

The "ofproto - asynchronous message control (OpenFlow 1.4)" test fails
from time to time when I'm running tests in parallel locally.  So far,
I've not been able to determine the root cause, but logging the
difference between expected and actual output should help.

Signed-off-by: Ben Pfaff <blp@ovn.org>
Signed-off-by: William Tu <u9012063@gmail.com>
4 years agotests: Better document OVS_WAIT_UNTIL, OVS_WAIT_WHILE macros.
Ben Pfaff [Wed, 4 Dec 2019 23:06:09 +0000 (15:06 -0800)]
tests: Better document OVS_WAIT_UNTIL, OVS_WAIT_WHILE macros.

Signed-off-by: Ben Pfaff <blp@ovn.org>
Signed-off-by: William Tu <u9012063@gmail.com>
4 years agoofp-monitor: Make OFP_FLOW_REMOVED_REASON_BUFSIZE public.
Ben Pfaff [Wed, 4 Dec 2019 23:06:08 +0000 (15:06 -0800)]
ofp-monitor: Make OFP_FLOW_REMOVED_REASON_BUFSIZE public.

This constant is needed to use ofp_flow_removed_reason_to_string(),
which is itself public.

Signed-off-by: Ben Pfaff <blp@ovn.org>
Signed-off-by: William Tu <u9012063@gmail.com>
4 years agoofp-print: Abbreviate lists of fields in table features output.
Ben Pfaff [Wed, 4 Dec 2019 23:06:07 +0000 (15:06 -0800)]
ofp-print: Abbreviate lists of fields in table features output.

This makes the output both shorter and easier to read.

Signed-off-by: Ben Pfaff <blp@ovn.org>
Signed-off-by: William Tu <u9012063@gmail.com>
4 years agocheckpatch: Check spelling in commit messages.
Ilya Maximets [Sat, 7 Dec 2019 17:14:00 +0000 (18:14 +0100)]
checkpatch: Check spelling in commit messages.

This seems useful as I usually make a lot of typing mistakes.
Also, a few commonly used words are added to the extended dictionary.

Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Aaron Conole <aconole@redhat.com>
Acked-by: William Tu <u9012063@gmail.com>
4 years agocheckpatch: Skip words containing numbers.
Ilya Maximets [Sat, 7 Dec 2019 17:10:24 +0000 (18:10 +0100)]
checkpatch: Skip words containing numbers.

Words like 'br0' are common and usually reference some code or
database objects that should not be targets for spell checking.
So, it's better to skip all words that have digits inside instead
of only those that start with numbers.
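
The difference between the two rules can be sketched as below.
checkpatch itself is a Python script; this is only a C rendition of the
rule, and both function names are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* New rule: true if any character of 'word' is a decimal digit. */
static bool
word_contains_digit(const char *word)
{
    assert(word != NULL);
    for (const char *p = word; *p != '\0'; p++) {
        if (isdigit((unsigned char) *p)) {
            return true;
        }
    }
    return false;
}

/* Old rule: true only if the first character of 'word' is a digit. */
static bool
word_starts_with_digit(const char *word)
{
    assert(word != NULL);
    return isdigit((unsigned char) word[0]) != 0;
}
```

With these helpers, 'br0' is skipped by the new rule but would still
have been spell checked under the old one.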

Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Aaron Conole <aconole@redhat.com>
Acked-by: William Tu <u9012063@gmail.com>
4 years agocheckpatch: Allow common abbreviations for spell checking.
Ilya Maximets [Sat, 7 Dec 2019 17:02:01 +0000 (18:02 +0100)]
checkpatch: Allow common abbreviations for spell checking.

Abbreviations of Latin expressions like 'i.e.' or 'e.g.' are common
and known by the dictionary.  However, our spell checker is not able
to recognize them because it strips dots out of them.  To avoid this
issue we can pass the non-stripped version of the word to the dictionary
checker too.

Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Aaron Conole <aconole@redhat.com>
Acked-by: William Tu <u9012063@gmail.com>
4 years agodatapath-windows: Do not delete internal port on OID_SWITCH_NIC_DISCONNECT
Jinjun Gao [Sun, 8 Dec 2019 09:28:17 +0000 (09:28 +0000)]
datapath-windows: Do not delete internal port on OID_SWITCH_NIC_DISCONNECT

According to the Microsoft documentation:
https://docs.microsoft.com/en-us/windows-hardware/drivers/network/hyper-v-extensible-switch-port-and-network-adapter-states
the following OID request sequence is valid:
         OID_SWITCH_NIC_CONNECT -> OID_SWITCH_NIC_DISCONNECT
                  ^                           |
                  |                           V
         OID_SWITCH_NIC_CREATE  <- OID_SWITCH_NIC_DELETE

In the above sequence, the Windows extensible switch interface assumes that
OID_SWITCH_PORT_CREATE has been issued and the port has been created
successfully. If the internal port is deleted in HvDisconnectNic(),
HvCreateNic() will fail when OID_SWITCH_NIC_CREATE is received later because
there is no corresponding port.

Signed-off-by: Jinjun Gao <jinjung@vmware.com>
Signed-off-by: Alin Gabriel Serdean <aserdean@ovn.org>
4 years agodpif-netdev: Retrieve dpif_class from struct dp_netdev.
Ophir Munk [Sun, 8 Dec 2019 14:29:14 +0000 (14:29 +0000)]
dpif-netdev: Retrieve dpif_class from struct dp_netdev.

When a pmd pointer (struct dp_netdev_pmd_thread *) needs to retrieve
the dpif_class it points at, it can access it as:  pmd->dp->class.  A
second option is to access it as: pmd->dp->dpif->dpif_class. The first
option is safe since there is one dp netdev with a constant pointer to
the dpif class. The second option is not safe since the pointer
pmd->dp->dpif may be changed under the hood, for example, when there
is a call to dpif_open(). One such scenario is when a netdev bridge is
running while flow statistics are dumped with dpctl in parallel:
ovs-appctl dpctl/dump-flows. This commit uses the first, safe option
instead of the second.

Fixes: 30115809da2e ("dpif-netdev: Use netdev-offload API for port lookup while offloading")
Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
4 years agoAdd offload packets statistics
zhaozhanxu [Thu, 5 Dec 2019 06:26:25 +0000 (14:26 +0800)]
Add offload packets statistics

Add argument '--offload-stats' for command ovs-appctl bridge/dump-flows
to display the offloaded packets statistics.

The command output is shown below.

Original command:

ovs-appctl bridge/dump-flows br0

duration=574s, n_packets=1152, n_bytes=110768, priority=0,actions=NORMAL
table_id=254, duration=574s, n_packets=0, n_bytes=0, priority=2,recirc_id=0,actions=drop
table_id=254, duration=574s, n_packets=0, n_bytes=0, priority=0,reg0=0x1,actions=controller(reason=)
table_id=254, duration=574s, n_packets=0, n_bytes=0, priority=0,reg0=0x2,actions=drop
table_id=254, duration=574s, n_packets=0, n_bytes=0, priority=0,reg0=0x3,actions=drop

New command with the '--offload-stats' argument:

Note: 'n_offload_packets' is a subset of n_packets and 'n_offload_bytes' is
a subset of n_bytes.

ovs-appctl bridge/dump-flows --offload-stats br0

duration=582s, n_packets=1152, n_bytes=110768, n_offload_packets=1107, n_offload_bytes=107992, priority=0,actions=NORMAL
table_id=254, duration=582s, n_packets=0, n_bytes=0, n_offload_packets=0, n_offload_bytes=0, priority=2,recirc_id=0,actions=drop
table_id=254, duration=582s, n_packets=0, n_bytes=0, n_offload_packets=0, n_offload_bytes=0, priority=0,reg0=0x1,actions=controller(reason=)
table_id=254, duration=582s, n_packets=0, n_bytes=0, n_offload_packets=0, n_offload_bytes=0, priority=0,reg0=0x2,actions=drop
table_id=254, duration=582s, n_packets=0, n_bytes=0, n_offload_packets=0, n_offload_bytes=0, priority=0,reg0=0x3,actions=drop

Signed-off-by: zhaozhanxu <zhaozhanxu@163.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
4 years agodpif-netdev-perf: Accurate cycle counter update
Malvika Gupta [Thu, 5 Dec 2019 17:04:20 +0000 (11:04 -0600)]
dpif-netdev-perf: Accurate cycle counter update

The accurate timing implementation in this patch gets the wall clock counter via
cntvct_el0 register access. This call is portable to all aarch64 architectures
and has been verified on a 64-bit Arm server.
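
Reading the counter can be sketched as follows. This is a minimal
illustration, not the code added by the patch: the function name is
hypothetical, and the clock_gettime() fallback for other architectures
is an assumption added only so the sketch builds everywhere.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L
#endif

#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Returns a monotonically increasing counter value. */
static inline uint64_t
read_counter_sketch(void)
{
#if defined(__aarch64__)
    uint64_t cnt;

    /* Read the counter-timer virtual count register. */
    __asm__ __volatile__ ("mrs %0, cntvct_el0" : "=r" (cnt));
    return cnt;
#else
    /* Assumed fallback for non-aarch64 builds of this sketch. */
    struct timespec ts;
    int error = clock_gettime(CLOCK_MONOTONIC, &ts);

    assert(error == 0);
    (void) error;
    return (uint64_t) ts.tv_sec * UINT64_C(1000000000)
           + (uint64_t) ts.tv_nsec;
#endif
}
```

Two consecutive calls can be subtracted to estimate elapsed counter
ticks; the tick rate depends on the platform.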

Suggested-by: Yanqin Wei <yanqin.wei@arm.com>
Reviewed-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Malvika Gupta <malvika.gupta@arm.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agodpdk: Update to use DPDK 19.11.
Ian Stokes [Tue, 3 Dec 2019 14:52:57 +0000 (14:52 +0000)]
dpdk: Update to use DPDK 19.11.

This commit adds support for DPDK v19.11. It includes the following
changes:

1. travis: Enable compilation and linkage with dpdk 19.11.

2. sparse: Remove dpdk network headers copies.

   https://patchwork.ozlabs.org/patch/1185256/

3. dpdk: Migrate to new PDUMP API.

   https://patchwork.ozlabs.org/patch/1192971/

4. netdev-dpdk: Prefix network structures with rte_.

   https://patchwork.ozlabs.org/patch/1109733/

5. netdev-dpdk: Update by new color definitions.

   https://patchwork.ozlabs.org/patch/1086089/

6. docs: Update docs to reference 19.11.

7. docs: Add note regarding hotplug and igb_uio requirements.

To give credit, all authors of the original commits to 'dpdk-latest' with the
above changes have been added as co-authors of this commit.

Signed-off-by: David Marchand <david.marchand@redhat.com>
Co-authored-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Co-authored-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Co-authored-by: Ophir Munk <ophirmu@mellanox.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Acked-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
4 years agotrivial: Fix erspan coding style.
William Tu [Tue, 3 Dec 2019 21:37:56 +0000 (13:37 -0800)]
trivial: Fix erspan coding style.

Fix indentation and whitespace.

Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: Ben Pfaff <blp@ovn.org>
4 years agoAUTHORS: Add Yi Yang.
William Tu [Tue, 3 Dec 2019 21:35:48 +0000 (13:35 -0800)]
AUTHORS: Add Yi Yang.

Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: Ben Pfaff <blp@ovn.org>
4 years agoovs-vsctl: unit test for checking fail-mode related
Damijan Skvarc [Tue, 3 Dec 2019 08:33:49 +0000 (09:33 +0100)]
ovs-vsctl: unit test for checking fail-mode related

Introduce a unit test that checks fail-mode related commands.

Signed-off-by: Damijan Skvarc <damjan.skvarc@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agoofproto-dpif-xlate: Prevent duplicating of traffic to a mirror port
Dmytro Linkin [Tue, 3 Dec 2019 14:11:21 +0000 (16:11 +0200)]
ofproto-dpif-xlate: Prevent duplicating of traffic to a mirror port

Currently, the ofproto design disallows duplicating an output packet when
forwarding and mirroring to/from the same OVS port. The following scenario
reveals a gap in that design:
1. Send ping between regular OVS ports (VFs, for example), then stop it.
2. While the rule still exists, create a mirror for one of the ports.
Prevent duplicating of traffic to a mirror port.

Fixes: 86e2dcddce85 ("dpif-xlate: Snoop multicast packets and send them properly")
Signed-off-by: Dmytro Linkin <dmitrolin@mellanox.com>
Acked-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agoconntrack: Support zone limits.
Darrell Ball [Tue, 3 Dec 2019 17:14:17 +0000 (09:14 -0800)]
conntrack: Support zone limits.

Signed-off-by: Darrell Ball <dlu998@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agoofproto-dpif: Refactor the get capability code.
William Tu [Thu, 21 Nov 2019 19:09:02 +0000 (11:09 -0800)]
ofproto-dpif: Refactor the get capability code.

Make the code simpler by removing the use of
xasprintf and free, and use smap_add_format.

Cc: Ben Pfaff <blp@ovn.org>
Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: Ben Pfaff <blp@ovn.org>
4 years agonetdev: use acquire-release semantics for change_seq in netdev
Yanqin Wei [Tue, 26 Nov 2019 07:35:23 +0000 (15:35 +0800)]
netdev: use acquire-release semantics for change_seq in netdev

"rxq_enabled" of netdev is written in the vhost thread and read by the pmd
thread once it observes that 'change_seq' is updated. This patch preserves
ordering on aarch64 and other CPUs with weak memory models, so that the
write to 'rxq_enabled' is observed before the write to 'change_seq'.
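
The pattern can be sketched with C11 atomics as below. This is an
illustration under stated assumptions, not the OVS code: the struct and
function names are hypothetical, and OVS uses its own atomics wrappers
rather than <stdatomic.h> directly.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical device state shared between a writer thread and a
 * polling reader thread. */
struct dev_sketch {
    atomic_bool rxq_enabled;
    atomic_uint_fast64_t change_seq;
};

static void
dev_sketch_init(struct dev_sketch *dev)
{
    assert(dev != NULL);
    atomic_init(&dev->rxq_enabled, false);
    atomic_init(&dev->change_seq, 0);
}

/* Writer: publish the data first, then bump the sequence number with
 * release semantics so the data write cannot be reordered after it. */
static void
writer_set_rxq(struct dev_sketch *dev, bool enabled)
{
    atomic_store_explicit(&dev->rxq_enabled, enabled, memory_order_relaxed);
    atomic_fetch_add_explicit(&dev->change_seq, 1, memory_order_release);
}

/* Reader: load the sequence number with acquire semantics.  If it
 * changed, the data written before the matching release is visible.
 * Returns true and fills '*enabled' when a change was observed. */
static bool
reader_poll(struct dev_sketch *dev, uint_fast64_t *last_seq, bool *enabled)
{
    uint_fast64_t seq = atomic_load_explicit(&dev->change_seq,
                                             memory_order_acquire);

    if (seq == *last_seq) {
        return false;
    }
    *last_seq = seq;
    *enabled = atomic_load_explicit(&dev->rxq_enabled, memory_order_relaxed);
    return true;
}
```

On strongly ordered CPUs the relaxed version often appears to work,
which is why this kind of bug tends to show up only on aarch64 and
similar architectures.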

Reviewed-by: Gavin Hu <Gavin.Hu@arm.com>
Signed-off-by: Yanqin Wei <Yanqin.Wei@arm.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agodatapath: make generic netlink group const
Greg Rose [Mon, 25 Nov 2019 22:20:44 +0000 (14:20 -0800)]
datapath: make generic netlink group const

Upstream commit:
    commit 48e48a70c08a8a68f8697f8b30cb83775bda8001
    Author: stephen hemminger <stephen@networkplumber.org>
    Date:   Wed Jul 16 11:25:52 2014 -0700

    openvswitch: make generic netlink group const

    Generic netlink tables can be const.

    Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>

The equivalent tables in meter.c and conntrack.c are constified so
it should be safe to do the same for these and will improve
security as well.

Original patch slightly modified for out of tree module.

Passes check-kmod.
Passes Travis.
https://travis-ci.org/gvrose8192/ovs-experimental/builds/616880002

Cc: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agofaq: Correct fragment reassembly release.
Darrell Ball [Tue, 26 Nov 2019 02:39:34 +0000 (18:39 -0800)]
faq: Correct fragment reassembly release.

Correct fragment reassembly release for the userspace datapath.

Signed-off-by: Darrell Ball <dlu998@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agoofproto-dpif-xlate: Restore table ID on error in xlate_table_action().
Ben Pfaff [Mon, 14 Oct 2019 22:34:21 +0000 (15:34 -0700)]
ofproto-dpif-xlate: Restore table ID on error in xlate_table_action().

Found by inspection.

Acked-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agodebian: Update list of copyright holders.
Ben Pfaff [Wed, 9 Oct 2019 17:33:44 +0000 (10:33 -0700)]
debian: Update list of copyright holders.

The list of copyright holders was incomplete and out of date.  This
updates it based on a "grep" for copyright notices, which I reviewed by
hand.

CC: 942056@bugs.debian.org
Reported-by: Chris Lamb <lamby@debian.org>
Reported-at: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=942056
Acked-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agoDocumentation: Convert multiple manpages to ReST.
Ben Pfaff [Thu, 10 Oct 2019 21:29:42 +0000 (14:29 -0700)]
Documentation: Convert multiple manpages to ReST.

Tested-by: Numan Siddique <numans@ovn.org>
Acked-by: Numan Siddique <numans@ovn.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agosparse: Get rid of obsolete rte_flow header.
David Marchand [Thu, 3 Oct 2019 18:11:24 +0000 (20:11 +0200)]
sparse: Get rid of obsolete rte_flow header.

This header had been copied to cope with issues on the dpdk side.
Now that the problems have been fixed [1], let's drop this file as it is
now out of sync with dpdk.

1: https://git.dpdk.org/dpdk/commit/?id=fbb25a3878cc

Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
4 years agoofproto: fix stack-buffer-overflow
Linhaifeng [Fri, 29 Nov 2019 06:13:35 +0000 (06:13 +0000)]
ofproto: fix stack-buffer-overflow

Should use flow->actions not &flow->actions.
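
Why the extra '&' matters can be sketched as follows, assuming
'actions' is a pointer member with a separate length field. The struct
and function names are hypothetical and much simpler than the real
ones.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flow description: 'actions' points at 'actions_len'
 * bytes stored elsewhere. */
struct flow_sketch {
    const unsigned char *actions;
    size_t actions_len;
};

/* Copies the action bytes into 'dst', which must have room for
 * 'actions_len' bytes. */
static void
copy_actions(const struct flow_sketch *flow, unsigned char *dst)
{
    assert(flow != NULL && dst != NULL);

    /* Correct: read from the buffer that 'actions' points to. */
    memcpy(dst, flow->actions, flow->actions_len);

    /* Wrong: '&flow->actions' is the address of the pointer field
     * itself.  Reading 'actions_len' bytes from there runs past the
     * end of the structure whenever 'actions_len' exceeds the space
     * left in it:
     *
     *     memcpy(dst, &flow->actions, flow->actions_len);
     */
}
```

Both forms compile without a type error when the callee takes a
'const void *', which is how this kind of mistake can go unnoticed
until a sanitizer reports it.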

Here is the ASAN report:
=================================================================
==57189==ERROR: AddressSanitizer: stack-buffer-overflow on address 0xffff428fa0e8 at pc 0xffff7f61a520 bp 0xffff428f9420 sp 0xffff428f9498
READ of size 196 at 0xffff428fa0e8 thread T150 (revalidator22)
    #0 0xffff7f61a51f in __interceptor_memcpy (/lib64/libasan.so.4+0xa251f)
    #1 0xaaaad26a3b2b in ofpbuf_put lib/ofpbuf.c:426
    #2 0xaaaad26a30cb in ofpbuf_clone_data_with_headroom lib/ofpbuf.c:248
    #3 0xaaaad26a2e77 in ofpbuf_clone_with_headroom lib/ofpbuf.c:218
    #4 0xaaaad26a2dc3 in ofpbuf_clone lib/ofpbuf.c:208
    #5 0xaaaad23e3993 in ukey_set_actions ofproto/ofproto-dpif-upcall.c:1640
    #6 0xaaaad23e3f03 in ukey_create__ ofproto/ofproto-dpif-upcall.c:1696
    #7 0xaaaad23e553f in ukey_create_from_dpif_flow ofproto/ofproto-dpif-upcall.c:1806
    #8 0xaaaad23e65fb in ukey_acquire ofproto/ofproto-dpif-upcall.c:1984
    #9 0xaaaad23eb583 in revalidate ofproto/ofproto-dpif-upcall.c:2625
    #10 0xaaaad23dee5f in udpif_revalidator ofproto/ofproto-dpif-upcall.c:1076
    #11 0xaaaad26b84ef in ovsthread_wrapper lib/ovs-thread.c:708
    #12 0xffff7e74a8bb in start_thread (/lib64/libpthread.so.0+0x78bb)
    #13 0xffff7e0665cb in thread_start (/lib64/libc.so.6+0xd55cb)

Address 0xffff428fa0e8 is located in stack of thread T150 (revalidator22) at offset 328 in frame
    #0 0xaaaad23e4cab in ukey_create_from_dpif_flow ofproto/ofproto-dpif-upcall.c:1762

  This frame has 4 object(s):
    [32, 96) 'actions'
    [128, 192) 'buf'
    [224, 328) 'full_flow'
    [384, 2432) 'stub' <== Memory access at offset 328 partially underflows this variable
HINT: this may be a false positive if your program uses some custom stack unwind mechanism or swapcontext
      (longjmp and C++ exceptions *are* supported)
Thread T150 (revalidator22) created by T0 here:
    #0 0xffff7f5b0f7f in __interceptor_pthread_create (/lib64/libasan.so.4+0x38f7f)
    #1 0xaaaad26b891f in ovs_thread_create lib/ovs-thread.c:792
    #2 0xaaaad23dc62f in udpif_start_threads ofproto/ofproto-dpif-upcall.c:639
    #3 0xaaaad23daf87 in ofproto_set_flow_table ofproto/ofproto-dpif-upcall.c:446
    #4 0xaaaad230ff7f in dpdk_evs_cfg_set vswitchd/bridge.c:1134
    #5 0xaaaad2310097 in bridge_reconfigure vswitchd/bridge.c:1148
    #6 0xaaaad23279d7 in bridge_run vswitchd/bridge.c:3944
    #7 0xaaaad23365a3 in main vswitchd/ovs-vswitchd.c:240
    #8 0xffff7dfb1adf in __libc_start_main (/lib64/libc.so.6+0x20adf)
    #9 0xaaaad230a3d3  (/usr/sbin/ovs-vswitchd-2.7.0-1.1.RC5.001.asan+0x26f3d3)

SUMMARY: AddressSanitizer: stack-buffer-overflow (/lib64/libasan.so.4+0xa251f) in __interceptor_memcpy
Shadow bytes around the buggy address:
  0x200fe851f3c0: 00 00 00 00 f1 f1 f1 f1 f8 f2 f2 f2 00 00 00 00
  0x200fe851f3d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x200fe851f3e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x200fe851f3f0: 00 00 00 00 f1 f1 f1 f1 00 00 00 00 00 00 00 00
  0x200fe851f400: f2 f2 f2 f2 f8 f8 f8 f8 f8 f8 f8 f8 f2 f2 f2 f2
=>0x200fe851f410: 00 00 00 00 00 00 00 00 00 00 00 00 00[f2]f2 f2
  0x200fe851f420: f2 f2 f2 f2 00 00 00 00 00 00 00 00 00 00 00 00
  0x200fe851f430: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x200fe851f440: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x200fe851f450: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x200fe851f460: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==57189==ABORTING

Acked-by: Numan Siddique <numans@ovn.org>
Signed-off-by: Linhaifeng <haifeng.lin@huawei.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agodpif-netdev: Use netdev-offload API for port lookup while offloading.
Ilya Maximets [Fri, 22 Nov 2019 15:09:14 +0000 (16:09 +0100)]
dpif-netdev: Use netdev-offload API for port lookup while offloading.

Currently, while offloading, the userspace datapath tries to look up the
netdev in a local port list of the datapath interface instance.  However,
there is no guarantee that these netdevs are the same netdevs that
the netdev-offload module operates with and, as a result, there is no
guarantee that these netdev instances have an initialized flow API.

dpif-netdev should request ports from the netdev-offload module as
intended by the flow offloading API, in the same way as dpif-netlink does.
This will also give us performance benefits because we don't need to
hold the global port mutex anymore.

We're not noticing any significant issues with the current code, but
it will become a serious issue in the future, e.g. with offloading
for virtual tunneling ports.

Reported-by: Ophir Munk <ophirmu@mellanox.com>
Fixes: 241bad15d99a ("dpif-netdev: associate flow with a mark id")
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Eli Britstein <elibr@mellanox.com>
4 years agoofproto-provider: Move datapath capabilities callback to correct section.
Ilya Maximets [Tue, 26 Nov 2019 19:52:32 +0000 (20:52 +0100)]
ofproto-provider: Move datapath capabilities callback to correct section.

The 'get_datapath_cap' callback was mistakenly placed in the
'Connection tracking' section of the 'struct dpif_class'
while it belongs in the 'Datapath information' section.

CC: William Tu <u9012063@gmail.com>
Fixes: 27501802d09f ("ofproto-dpif: Expose datapath capability to ovsdb.")
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: William Tu <u9012063@gmail.com>
4 years agodp-packet: Fix clearing/copying of memory layout flags.
Ilya Maximets [Thu, 21 Nov 2019 13:14:52 +0000 (14:14 +0100)]
dp-packet: Fix clearing/copying of memory layout flags.

'ol_flags' of DPDK mbuf could contain bits responsible for external
or indirect buffers which are not actually offload flags in a common
sense.  Clearing/copying of these flags could lead to memory leaks of
external memory chunks and crashes due to access to wrong memory.

OVS should not clear these flags while resetting offloads and also
should not copy them to the newly allocated packets.
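
The masking rule can be sketched as below. The flag names and bit
positions are hypothetical stand-ins for the mbuf bits that describe
external and indirect buffers, and the helpers are not the OVS
dp-packet functions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bits describing how the buffer memory is laid out.
 * They live in the same word as the offload flags but are not
 * offload flags themselves. */
#define SKETCH_FLAG_EXTERNAL (UINT64_C(1) << 61)
#define SKETCH_FLAG_INDIRECT (UINT64_C(1) << 62)
#define SKETCH_LAYOUT_MASK   (SKETCH_FLAG_EXTERNAL | SKETCH_FLAG_INDIRECT)

/* Resetting offloads clears every bit except the layout bits. */
static uint64_t
reset_offloads(uint64_t ol_flags)
{
    return ol_flags & SKETCH_LAYOUT_MASK;
}

/* Copying offloads takes the offload bits from the source and keeps
 * the destination's own layout bits. */
static uint64_t
copy_offloads(uint64_t dst_flags, uint64_t src_flags)
{
    uint64_t result = (dst_flags & SKETCH_LAYOUT_MASK)
                      | (src_flags & ~SKETCH_LAYOUT_MASK);

    assert((result & SKETCH_LAYOUT_MASK) == (dst_flags & SKETCH_LAYOUT_MASK));
    return result;
}
```

Keeping the layout bits with the buffer they describe is what avoids
both the leak (bits cleared on an external buffer) and the bad access
(bits copied onto a buffer that is neither external nor indirect).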

This change is required to support DPDK 19.11, as some drivers may
return mbufs with these flags set.  However, it might be good to do
the same for DPDK 18.11, because these flags are present and should
be taken into account.

Fixes: 03f3f9c0faf8 ("dpdk: Update to use DPDK 18.11.")
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Acked-by: Ben Pfaff <blp@ovn.org>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
4 years agonetdev-dpdk: Deprecate ring ports.
Ilya Maximets [Tue, 26 Nov 2019 10:43:58 +0000 (11:43 +0100)]
netdev-dpdk: Deprecate ring ports.

'dpdkr' a.k.a. DPDK ring ports have really poor support in OVS and are not
tested on a regular basis.  These ports are intended to work via
shared memory with another DPDK secondary process, but there are lots
of limitations for using this functionality in practice.  Most of them
are connected with running a secondary DPDK application and memory layout
issues.  More details are available in the DPDK guide:
https://doc.dpdk.org/guides-18.11/prog_guide/multi_proc_support.html#multi-process-limitations

Besides the functional limitations, it's also hard to use this
functionality correctly.  The user must be sure that OVS and the secondary
DPDK application are running on different CPU cores, which is hard because
non-PMD threads could float over available CPU cores.  This or any
other misconfiguration will likely lead to a crash of OVS.

Another problem is that the user must actually build the secondary
application with the same version of DPDK that was used for OVS build.

The above issues are the same as those we have while using DPDK pdump.

Besides that, the current implementation in OVS is not able to free
allocated rings, which could lead to memory exhaustion.

Initially these ports were added for use with IVSHMEM for fast
zero-copy HOST<-->VM communication.  However, IVSHMEM is not used
anymore.  IVSHMEM support was removed from DPDK in the 16.11 release
(instructions for IVSHMEM were removed from the OVS docs almost 3 years
ago by commit 90ca71dd317f ("doc: Remove ivshmem instructions.")) and
the patch for QEMU for using regular files as a device backend is no
longer available.  That makes DPDK ring ports barely useful in a real
virtualization environment.

This patch adds deprecation warnings for run-time port creation
and to the documentation.  It announces the intent to completely remove
this functionality from OVS in one of the next releases.

Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Ian Stokes <ian.stokes@intel.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Acked-by: David Marchand <david.marchand@redhat.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
4 years agodpdk: Use DPDK 18.11.5 release.
Ian Stokes [Tue, 26 Nov 2019 12:03:04 +0000 (12:03 +0000)]
dpdk: Use DPDK 18.11.5 release.

Modify travis linux build script to use the latest DPDK stable release
18.11.5. Update docs for latest DPDK stable releases.

Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Acked-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
4 years agoofproto: Fix crash on PACKET_OUT due to recursive locking after upcall.
Ilya Maximets [Fri, 1 Nov 2019 21:24:39 +0000 (22:24 +0100)]
ofproto: Fix crash on PACKET_OUT due to recursive locking after upcall.

Handling of OpenFlow PACKET_OUT implies pushing the packet through
the datapath, and packet processing inside the datapath could trigger
an upcall.  In case of the system datapath, 'dpif_execute()' sends the
packet to the kernel module and returns.  An upcall, if any, will be
triggered inside the kernel and handled by a separate thread in userspace.
But in case of the userspace datapath, full processing of the packet happens
inside 'dpif_execute()' in the same thread that handled PACKET_OUT.
This causes an issue if the upcall leads to modification of flow rules.
For example, it could happen while processing 'learn' actions.
Since the whole handling of PACKET_OUT is protected by 'ofproto_mutex',
OVS will assert on an attempt to take it recursively while processing
'learn' actions:

   0 __GI_raise (sig=sig@entry=6)
   1 __GI_abort ()
   2 ovs_abort_valist ()
   3 ovs_abort ()
   4 ovs_mutex_lock_at (where=where@entry=0xad4199 "ofproto/ofproto.c:5391")
                <Trying to acquire ofproto_mutex again>
   5 ofproto_flow_mod_learn ()       at ofproto/ofproto.c:5391
                <Trying to modify flows according to 'learn' action>
   6 xlate_learn_action ()           at ofproto/ofproto-dpif-xlate.c:5378
                <'learn' action found>
   7 do_xlate_actions ()             at ofproto/ofproto-dpif-xlate.c:6893
   8 xlate_recursively ()            at ofproto/ofproto-dpif-xlate.c:4233
   9 xlate_table_action ()           at ofproto/ofproto-dpif-xlate.c:4361
  10 in xlate_ofpact_resubmit ()     at ofproto/ofproto-dpif-xlate.c:4672
  11 do_xlate_actions ()             at ofproto/ofproto-dpif-xlate.c:6773
  12 xlate_actions ()                at ofproto/ofproto-dpif-xlate.c:7570
                 <Translating actions>
  13 upcall_xlate ()                 at ofproto/ofproto-dpif-upcall.c:1197
  14 process_upcall ()               at ofproto/ofproto-dpif-upcall.c:1413
  15 upcall_cb ()                    at ofproto/ofproto-dpif-upcall.c:1315
  16 dp_netdev_upcall (DPIF_UC_MISS) at lib/dpif-netdev.c:6236
                 <Flow cache miss. Making upcall>
  17 handle_packet_upcall ()         at lib/dpif-netdev.c:6591
  18 fast_path_processing ()         at lib/dpif-netdev.c:6709
  19 dp_netdev_input__ ()            at lib/dpif-netdev.c:6797
  20 dp_netdev_recirculate ()        at lib/dpif-netdev.c:6842
  21 dp_execute_cb ()                at lib/dpif-netdev.c:7158
  22 odp_execute_actions ()          at lib/odp-execute.c:794
  23 dp_netdev_execute_actions ()    at lib/dpif-netdev.c:7332
  24 dpif_netdev_execute ()          at lib/dpif-netdev.c:3725
  25 dpif_netdev_operate ()          at lib/dpif-netdev.c:3756
                 <Packet pushed to userspace datapath for processing>
  26 dpif_operate ()                 at lib/dpif.c:1367
  27 dpif_execute ()                 at lib/dpif.c:1321
  28 packet_execute ()               at ofproto/ofproto-dpif.c:4760
  29 ofproto_packet_out_finish ()    at ofproto/ofproto.c:3594
                 <Taking ofproto_mutex>
  30 handle_packet_out ()            at ofproto/ofproto.c:3635
  31 handle_single_part_openflow (OFPTYPE_PACKET_OUT) at ofproto/ofproto.c:8400
  32 handle_openflow ()                               at ofproto/ofproto.c:8587
  33 ofconn_run ()
  34 connmgr_run ()
  35 ofproto_run ()
  36 bridge_run__ ()
  37 bridge_run ()
  38 main ()

Fix that by splitting 'ofproto_packet_out_finish()' into two parts: the
first translates side effects and requires holding 'ofproto_mutex'; the
second only pushes the packet to the datapath.  The second part is moved
out from under 'ofproto_mutex' because the mutex is not required there and
must not be held, in order to avoid recursive locking.
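
The idea behind the split can be illustrated with a small sketch. This is not the OVS code: the class and method names are hypothetical, and a non-reentrant Python lock stands in for 'ofproto_mutex'. It only shows why the datapath push has to happen after the mutex is released when packet processing may re-enter code that takes the same mutex.

```python
import threading


class Bridge:
    """Toy model; all names are illustrative, not OVS APIs."""

    def __init__(self):
        self.mutex = threading.Lock()   # non-recursive, like 'ofproto_mutex'
        self.log = []

    def learn(self, flow):
        # A side effect that needs the mutex (e.g. a flow table update).
        with self.mutex:
            self.log.append(("learn", flow))

    def datapath_execute(self, packet):
        # Executing the packet may cause an upcall that re-enters learn().
        self.log.append(("execute", packet))
        self.learn("flow-for-" + packet)

    def packet_out(self, packet):
        # Part 1: translate side effects while holding the mutex.
        with self.mutex:
            self.log.append(("side-effects", packet))
        # Part 2: push the packet to the datapath with the mutex released,
        # so the nested learn() can take it without deadlocking.
        self.datapath_execute(packet)


bridge = Bridge()
bridge.packet_out("pkt0")
```

If part 2 ran inside the `with self.mutex:` block, the nested `learn()` call would block forever on the non-recursive lock.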

Reported-by: Anil Kumar Koli <anilkumar.k@altencalsoftlabs.com>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2019-April/048494.html
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
4 years ago vswitch.xml: Replace non-ASCII characters.
Ilya Maximets [Mon, 25 Nov 2019 10:07:42 +0000 (11:07 +0100)]
vswitch.xml: Replace non-ASCII characters.

This fixes OSX build on Travis:

ovs-vswitchd.conf.db.5:4061: warning: invalid input character code 128
ovs-vswitchd.conf.db.5:4061: warning: invalid input character code 156

Fixes: aa453e319961 ("ofproto-dpif: Expose datapath ND Extensions capability to ovsdb")
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Flavio Leitner <fbl@sysclose.org>
4 years ago ofproto-dpif: Expose datapath ND Extensions capability to ovsdb
Flavio Leitner [Fri, 22 Nov 2019 18:09:02 +0000 (15:09 -0300)]
ofproto-dpif: Expose datapath ND Extensions capability to ovsdb

Document and expose datapath ND Extensions capability to ovsdb.

Fixes: d0d571493 ("ofproto-dpif: Allow IPv6 ND Extensions only if supported")
Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years ago ofproto-dpif-upcall: Echo HASH attribute back to datapath.
Tonghao Zhang [Fri, 15 Nov 2019 02:58:59 +0000 (10:58 +0800)]
ofproto-dpif-upcall: Echo HASH attribute back to datapath.

The kernel datapath may send an upcall with hash info;
ovs-vswitchd should get it from the upcall and then send
it back.

The reason is that:
| When using the kernel datapath, the upcall doesn't
| include the related skb hash info. That introduces
| some problems, because the skb hash is important
| in the kernel stack. For example, the VXLAN module uses
| it to select the UDP src port. The tx queue selection
| may also use the hash in the stack.
|
| The hash is computed in different ways. The hash is random
| for a TCP socket, and the hash may be computed in hardware
| or in the software stack. Recalculating the hash is not easy.
|
| There will be one upcall, without skb hash information,
| to ovs-vswitchd for the first packet of a TCP
| session. The remaining packets will be processed in the Open
| vSwitch modules with the hash kept. If this TCP session is
| forwarded to the VXLAN module, then the UDP src port of the
| first TCP packet differs from that of the remaining packets.
|
| TCP packets may come to Open vSwitch from the host or from
| Docker containers. To fix it, we store the hash info in the
| upcall and restore the hash when packets are sent back.
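
A simplified model of the effect described above, as a sketch only. The port-mapping formula, the port range and the "recomputed" hash are assumptions chosen for illustration and do not reproduce the kernel's actual calculation; the function names are hypothetical.

```python
def vxlan_src_port(skb_hash, low=49152, high=65535):
    """Map a flow hash onto a UDP source port range (simplified model)."""
    return low + skb_hash % (high - low + 1)


def upcall_roundtrip(skb_hash, echo_hash):
    """Return the hash a packet carries after a trip through userspace."""
    # Without the echo, the original hash is lost; XOR with a constant
    # stands in for "some other, recomputed value".
    return skb_hash if echo_hash else (skb_hash ^ 0x5A5A5A5A)


flow_hash = 0x1234ABCD

# Packets handled entirely in the datapath keep the original hash.
fast_path_port = vxlan_src_port(flow_hash)

# First packet of the session goes through an upcall.
first_pkt_port_old = vxlan_src_port(upcall_roundtrip(flow_hash, echo_hash=False))
first_pkt_port_new = vxlan_src_port(upcall_roundtrip(flow_hash, echo_hash=True))
```

With the hash echoed back, the first packet and the rest of the session map to the same source port; without it, the first packet can land on a different one.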

Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2019-October/364062.html
Link: https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/commit/?id=bd1903b7c4596ba6f7677d0dfefd05ba5876707d
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years ago Datapath: Change in openvswitch kernel module to support MPLS label depth of 3 in...
Martin Varghese [Fri, 22 Nov 2019 06:07:46 +0000 (11:37 +0530)]
Datapath: Change in openvswitch kernel module to support MPLS label depth of 3 in ingress direction.

Upstream commit:
    commit fbdcdd78da7c95f1b970d371e1b23cbd3aa990f3
    Author: Martin Varghese <martin.varghese@nokia.com>
    Date:   Mon Nov 4 07:27:44 2019 +0530

    Change in Openvswitch to support MPLS label depth of 3 in ingress
    direction

    The openvswitch module supported an MPLS label depth of 1 in the
    ingress direction, though userspace OVS supports a max depth
    of 3 labels. This change enables the openvswitch module to support
    a max depth of 3 labels in the ingress direction.

Signed-off-by: Martin Varghese <martin.varghese@nokia.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tested-by: Greg Rose <gvrose8192@gmail.com>
Reviewed-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Martin Varghese <martin.varghese@nokia.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years ago flow: Fix IPv6 header parser with partial offloading.
Zhike Wang [Fri, 8 Nov 2019 09:02:44 +0000 (17:02 +0800)]
flow: Fix IPv6 header parser with partial offloading.

Set nw_proto before it is used in parse_ipv6_ext_hdrs__().

Signed-off-by: Zhike Wang <wangzk320@163.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years ago lacp: Add support to recognize LACP Marker RX PDUs.
Nitin katiyar [Tue, 12 Nov 2019 08:08:59 +0000 (09:08 +0100)]
lacp: Add support to recognize LACP Marker RX PDUs.

OVS does not support the LACP Marker protocol. Typically, ToR switches
send a LACP Marker PDU when restarting LACP negotiation following a link
flap or LACP timeout.

When a LACP Marker PDU is received, OVS logs the following warning and
drops the packet:
    “lacp(pmdXXX)|WARN|bond-prv: received an unparsable LACP PDU.”

As the above message is logged around the same time the link flap or
LACP down events are logged, it gives a misleading impression that the
reception of an unparsable LACP PDU is the reason for the LACP down
event.

The proposed patch does not add support for the LACP Marker protocol.
It simply recognizes LACP Marker packets, drops them and logs a clear
message indicating that a Marker packet was received. A counter to
track the number of such packets received is also added.
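
A rough sketch of the recognition step, not the actual lib/lacp.c logic. The Slow Protocols EtherType (0x8809) and the subtypes 1 (LACP) and 2 (Marker) come from IEEE 802.3; the function name, the counter name and the assumption of untagged frames are hypothetical.

```python
from collections import Counter

SLOW_PROTOCOLS_ETHERTYPE = 0x8809
SUBTYPE_LACP = 0x01
SUBTYPE_MARKER = 0x02

stats = Counter()   # hypothetical counter store


def classify_slow_pdu(frame: bytes) -> str:
    """Classify an untagged Ethernet frame as 'lacp', 'marker' or 'unparsable'."""
    if len(frame) < 15:
        return "unparsable"
    ethertype = int.from_bytes(frame[12:14], "big")
    if ethertype != SLOW_PROTOCOLS_ETHERTYPE:
        return "unparsable"
    subtype = frame[14]
    if subtype == SUBTYPE_LACP:
        return "lacp"
    if subtype == SUBTYPE_MARKER:
        # Recognized but unsupported: count it so it can be dropped
        # with a clear log message instead of a generic warning.
        stats["marker_rx"] += 1
        return "marker"
    return "unparsable"


# Destination MAC, zeroed source MAC, Slow Protocols EtherType.
header = bytes.fromhex("0180c2000002") + bytes(6) + (0x8809).to_bytes(2, "big")
marker_pdu = header + bytes([SUBTYPE_MARKER]) + bytes(49)
lacp_pdu = header + bytes([SUBTYPE_LACP]) + bytes(49)
```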

Signed-off-by: Nitin katiyar <nitin.katiyar@ericsson.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years ago ofproto-dpif: Allow IPv6 ND Extensions only if supported
Flavio Leitner [Wed, 20 Nov 2019 14:21:13 +0000 (11:21 -0300)]
ofproto-dpif: Allow IPv6 ND Extensions only if supported

The IPv6 ND Extensions are only implemented in the userspace datapath,
but nothing prevents them from being used with other datapaths.

This patch probes the datapath and only allows them if support
is available.

Fixes: 9b2b84973 ("Support for match & set ICMPv6 reserved and options type fields")
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years ago AUTHORS: Add Wang Li.
Ben Pfaff [Fri, 22 Nov 2019 00:48:45 +0000 (16:48 -0800)]
AUTHORS: Add Wang Li.

Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years ago ipf: bail out when ipf state is COMPLETED
Li RongQing [Thu, 14 Nov 2019 09:18:18 +0000 (17:18 +0800)]
ipf: bail out when ipf state is COMPLETED

It is easy to crash OVS when a packet with the same ID
hits a list that has already been completely reassembled
but has not been sent out yet, and this packet is
not a duplicate of anything in the hit ipf list because
of its larger offset.

    1  0x00007f9fef0ae2d9 in __GI_abort () at abort.c:89
    2  0x0000000000464042 in ipf_list_state_transition at lib/ipf.c:545

Fixes: 4ea96698f667 ("Userspace datapath: Add fragmentation handling.")
Co-authored-by: Wang Li <wangli39@baidu.com>
Signed-off-by: Wang Li <wangli39@baidu.com>
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years ago ovsdb raft: Fix election timer parsing in snapshot RPC.
Han Zhou [Wed, 13 Nov 2019 17:33:59 +0000 (09:33 -0800)]
ovsdb raft: Fix election timer parsing in snapshot RPC.

Commit a76ba825 took care of saving and restoring the election timer in
the file header snapshot, but it didn't handle parsing of the election
timer in the install_snapshot_request/reply RPC, which results in problems.
For example, when an election timer change log entry is compacted into a
snapshot and then a new node joins the cluster, the new node will use the
default timer instead of the new value.  This patch fixes it by parsing
the election timer in the snapshot RPC.
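
The failure mode can be sketched as follows. This is an illustration only: the dictionary layout, the "election_timer" key and the 1000 ms default are assumptions made for the example, not a description of the actual RPC encoding.

```python
DEFAULT_ELECTION_TIMER_MS = 1000   # assumed default, for illustration


def timer_after_snapshot(rpc: dict, parse_timer: bool) -> int:
    """Return the election timer a joining node ends up using."""
    if parse_timer and "election_timer" in rpc:
        return int(rpc["election_timer"])
    # The timer change was compacted into the snapshot, so there is no
    # log entry left to replay: the node falls back to the default.
    return DEFAULT_ELECTION_TIMER_MS


# Hypothetical snapshot RPC carrying a timer that was changed to 5000 ms.
snapshot_rpc = {"last_index": 42, "election_timer": 5000}

before_fix = timer_after_snapshot(snapshot_rpc, parse_timer=False)
after_fix = timer_after_snapshot(snapshot_rpc, parse_timer=True)
```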

The patch also updates the test case to cover the DB compact and
join scenario. The test reveals 2 more problems related to clustered DB
compaction, as noted in the test case's XXX comments, which need to be
addressed separately.

Signed-off-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years ago jsonrpc: increase input buffer size from 512 to 4096
Lorenzo Bianconi [Wed, 6 Nov 2019 09:19:44 +0000 (11:19 +0200)]
jsonrpc: increase input buffer size from 512 to 4096

Increase jsonrpc input buffer size from 512 to 4096 bytes in order to
reduce the syscall overhead when downloading a huge database.
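
A back-of-the-envelope estimate of the saving, assuming every receive call fills the whole buffer (a best case; real reads can return less). The 100 MiB database size is a hypothetical figure.

```python
import math


def reads_needed(total_bytes: int, buffer_size: int) -> int:
    """Lower bound on receive calls when each call fills the buffer."""
    return math.ceil(total_bytes / buffer_size)


db_size = 100 * 1024 * 1024          # hypothetical 100 MiB database
old_reads = reads_needed(db_size, 512)
new_reads = reads_needed(db_size, 4096)
```

Under that assumption the number of receive calls drops by a factor of 8.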

Acked-by: Mark Michelson <mmichels@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years ago netdev-afxdp: Enable libbpf logging to OVS.
William Tu [Wed, 20 Nov 2019 20:25:56 +0000 (12:25 -0800)]
netdev-afxdp: Enable libbpf logging to OVS.

libbpf has pr_warn, pr_info, and pr_debug. The patch registers
these print functions, integrating the libbpf logs into the OVS log.

Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
4 years ago ofproto-dpif: Expose datapath capability to ovsdb.
William Tu [Fri, 4 Oct 2019 20:48:58 +0000 (13:48 -0700)]
ofproto-dpif: Expose datapath capability to ovsdb.

The patch adds support for fetching the datapath's capabilities
from the result of 'check_support()' and writing the supported capabilities
to a new database column, called 'capabilities', in the Datapath table.

To see how it works, run:
 # ovs-vsctl -- add-br br0 -- set Bridge br0 datapath_type=netdev
 # ovs-vsctl -- --id=@m create Datapath datapath_version=0 \
     'ct_zones={}' 'capabilities={}' \
     -- set Open_vSwitch . datapaths:"netdev"=@m

 # ovs-vsctl list-dp-cap netdev
 ufid=true sample_nesting=true clone=true tnl_push_pop=true \
 ct_orig_tuple=true ct_eventmask=true ct_state=true \
 ct_clear=true max_vlan_headers=1 recirc=true ct_label=true \
 max_hash_alg=1 ct_state_nat=true ct_timeout=true \
 ct_mark=true ct_orig_tuple6=true check_pkt_len=true \
 masked_set_action=true max_mpls_depth=3 trunc=true ct_zone=true
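
For scripting against this output, the key=value pairs could be turned into a dictionary along these lines. This is a sketch that assumes the whitespace-separated format shown above; the helper name is hypothetical and the sample string is a shortened copy of that output.

```python
def parse_dp_caps(output: str) -> dict:
    """Parse 'key=value' tokens into a dict of bools, ints or strings."""
    caps = {}
    for token in output.split():
        if "=" not in token:
            continue                      # e.g. a stray line-continuation '\'
        key, _, value = token.partition("=")
        if value in ("true", "false"):
            caps[key] = value == "true"
        elif value.isdigit():
            caps[key] = int(value)
        else:
            caps[key] = value
    return caps


sample = "ufid=true recirc=true max_vlan_headers=1 max_mpls_depth=3 trunc=true"
caps = parse_dp_caps(sample)
```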

Signed-off-by: William Tu <u9012063@gmail.com>
Tested-by: Greg Rose <gvrose8192@gmail.com>
Reviewed-by: Greg Rose <gvrose8192@gmail.com>
---
v5:
    Add improved documentation from Ben and
    fix checkpatch error (tab and line 79 char)
v4:
    rebase to master
v3:
    fix 32-bit build, reported by Greg
    travis: https://travis-ci.org/williamtu/ovs-travis/builds/599276267
v2:
    rebase to master

4 years ago ovsdb-server: fix memory leak while deleting zone
Damijan Skvarc [Tue, 12 Nov 2019 11:32:35 +0000 (12:32 +0100)]
ovsdb-server: fix memory leak while deleting zone

A memory leak was detected by valgrind during execution
of the "database commands -- positive checks" test.

The leaked memory was allocated in the ovsdb_execute_mutate() function
while parsing mutations from the corresponding JSON entity:

==19563==    at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==19563==    by 0x4652D0: xmalloc (util.c:138)
==19563==    by 0x46539E: xmemdup0 (util.c:168)
==19563==    by 0x4653F7: xstrdup (util.c:177)
==19563==    by 0x450379: ovsdb_base_type_clone (ovsdb-types.c:208)
==19563==    by 0x450F8D: ovsdb_type_clone (ovsdb-types.c:550)
==19563==    by 0x428C3F: ovsdb_mutation_from_json (mutation.c:108)
==19563==    by 0x428F6B: ovsdb_mutation_set_from_json (mutation.c:187)
==19563==    by 0x42578D: ovsdb_execute_mutate (execution.c:573)
==19563==    by 0x4246B0: ovsdb_execute_compose (execution.c:171)
==19563==    by 0x41CDE5: ovsdb_trigger_try (trigger.c:204)
==19563==    by 0x41C8DF: ovsdb_trigger_init (trigger.c:61)
==19563==    by 0x40E93C: ovsdb_jsonrpc_trigger_create (jsonrpc-server.c:1135)
==19563==    by 0x40E20C: ovsdb_jsonrpc_session_got_request (jsonrpc-server.c:1002)
==19563==    by 0x40D1C2: ovsdb_jsonrpc_session_run (jsonrpc-server.c:561)
==19563==    by 0x40D31E: ovsdb_jsonrpc_session_run_all (jsonrpc-server.c:591)
==19563==    by 0x40CD6E: ovsdb_jsonrpc_server_run (jsonrpc-server.c:406)
==19563==    by 0x40627E: main_loop (ovsdb-server.c:209)
==19563==    by 0x406E66: main (ovsdb-server.c:460)

This memory is usually freed at the end of ovsdb_execute_mutate();
however, in the aforementioned test case this does not happen. Namely,
in the case of a delete mutator and an error from ovsdb_datum_from_json(),
the corresponding mutation is marked as invalid, which prevents the
memory from being freed.

Memory leak can be reproduced quickly with the following command sequence:
ovs-vsctl --no-wait -vreconnect:emer add-zone-tp netdev zone=1 icmp_first=1 icmp_reply=2
ovs-vsctl --no-wait -vreconnect:emer del-zone-tp netdev zone=1

Signed-off-by: Damijan Skvarc <damjan.skvarc@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years ago compat: Add missing inline keyword
Greg Rose [Tue, 5 Nov 2019 22:14:24 +0000 (14:14 -0800)]
compat: Add missing inline keyword

The missing inline keyword before the definition of the
rpl_nf_ct_tmpl_free() function causes spurious warnings about the
function not being used on some older kernels.  Add the keyword
to suppress the warning.

Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years ago ovs-actions: Clarify documentation for stack usage with group buckets.
Ben Pfaff [Wed, 20 Nov 2019 19:53:46 +0000 (11:53 -0800)]
ovs-actions: Clarify documentation for stack usage with group buckets.

This should be less confusing now.

Reported-by: Han Zhou <hzhou@ovn.org>
Acked-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years ago netdev-afxdp: Best-effort configuration of XDP mode.
Ilya Maximets [Wed, 6 Nov 2019 21:38:33 +0000 (21:38 +0000)]
netdev-afxdp: Best-effort configuration of XDP mode.

Until now there were only two options for XDP mode in OVS: SKB or DRV,
i.e. 'generic XDP' or 'native XDP with zero-copy enabled'.

Devices like 'veth' interfaces in Linux support native XDP, but
don't support zero-copy mode.  This case cannot be covered by the
existing API and we have to use slower generic XDP for such devices.
There are a few more issues, e.g. TCP is not supported in generic XDP
mode for veth interfaces due to kernel limitations, however it is
supported in native mode.

This change introduces the ability to use native XDP without zero-copy,
along with a best-effort configuration option that is enabled by default.
In the best-effort case OVS will sequentially try different modes, starting
from the fastest one, and will choose the first one acceptable for the
current interface.  This gives the best performance available for that
interface.

If the user wants to choose a specific mode, it's still possible by
setting 'options:xdp-mode'.

This change additionally changes the API by renaming the configuration
knob from 'xdpmode' to 'xdp-mode' and also renaming the modes
themselves to be more user-friendly.

The full list of currently supported modes:
  * native-with-zerocopy - former DRV
  * native               - new one, DRV without zero-copy
  * generic              - former SKB
  * best-effort          - new one, chooses the best available of the
                           3 modes above
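
The best-effort selection can be sketched roughly as below. This is an illustration rather than the netdev-afxdp implementation: the function name is hypothetical, and a set of supported modes stands in for actually trying to configure each mode on the interface.

```python
# Mode names as listed above, ordered from fastest to slowest.
MODES_FASTEST_FIRST = ["native-with-zerocopy", "native", "generic"]


def pick_xdp_mode(requested, supported):
    """Return the mode to use, or None if nothing acceptable was found."""
    if requested != "best-effort":
        # An explicitly requested mode is used as-is or not at all.
        return requested if requested in supported else None
    for mode in MODES_FASTEST_FIRST:
        if mode in supported:
            return mode
    return None


# Per the description above: veth supports native XDP but not zero-copy.
veth_supported = {"native", "generic"}
chosen = pick_xdp_mode("best-effort", veth_supported)
```

For a veth-like interface this picks 'native', which matches the behaviour described for the re-enabled TCP tests.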

Since 'best-effort' is the default mode, users will not need to
explicitly set 'xdp-mode' in most cases.

TCP-related tests are re-enabled in the system afxdp testsuite, because
'best-effort' will choose 'native' mode for veth interfaces
and this mode has no issues with TCP.

Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: William Tu <u9012063@gmail.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>