git.proxmox.com Git - ovs.git/log
Vishal Deep Ajmera [Wed, 10 Jul 2019 13:32:30 +0000 (19:02 +0530)]
flow: Wildcard UDP ports when using SYMMETRIC_L4 hash for select groups.

UDP source and destination ports are not used to derive the hash index
that selects the bucket for SYMMETRIC_L4 hash based select groups.
However, they are un-wildcarded in the megaflow entry match criteria.
As a result, a distinct megaflow entry is created unnecessarily for each
pair of UDP source and destination ports, which causes significant
performance deterioration when the megaflow cache limit is reached.

This patch wildcards UDP ports when using a select group with the
SYMMETRIC_L4 hash function.
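
The effect on the megaflow cache can be illustrated with a toy model. This is
not OVS code: the megaflow_key helper and the packet field names are
hypothetical, and a Python set stands in for the megaflow cache.

```python
from itertools import product

def megaflow_key(pkt, match_ports):
    """Return the fields a (toy) megaflow entry would match on."""
    key = (pkt["ip_src"], pkt["ip_dst"], pkt["proto"])
    if match_ports:
        # Un-wildcarded ports become part of the match criteria.
        key += (pkt["sport"], pkt["dport"])
    return key

# 100 x 100 UDP port pairs between the same two hosts (made-up traffic).
packets = [
    {"ip_src": "10.0.0.1", "ip_dst": "10.0.0.2", "proto": "udp",
     "sport": s, "dport": d}
    for s, d in product(range(1000, 1100), range(2000, 2100))
]

exact = {megaflow_key(p, match_ports=True) for p in packets}
wildcarded = {megaflow_key(p, match_ports=False) for p in packets}
print(len(exact), len(wildcarded))
```

In this model every port pair needs its own entry when ports are matched,
while a single entry covers all of them once the ports are wildcarded.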

Signed-off-by: Vishal Deep Ajmera <vishal.deep.ajmera@ericsson.com>
CC: Jan Scheurich <jan.scheurich@ericsson.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Dumitru Ceara [Mon, 15 Jul 2019 20:25:03 +0000 (22:25 +0200)]
OVN: Add ovn-northd IGMP support

New IP Multicast Snooping Options are added to the Northbound DB
Logical_Switch:other_config column. These allow enabling IGMP snooping and
querier on the logical switch and get translated by ovn-northd to rows in
the IP_Multicast Southbound DB table.

ovn-northd monitors for changes done by ovn-controllers in the Southbound DB
IGMP_Group table. Based on the entries in IGMP_Group ovn-northd creates
Multicast_Group entries in the Southbound DB, one per IGMP_Group address X,
containing the list of logical switch ports (aggregated from all controllers)
that have IGMP_Group entries for that datapath and address X. ovn-northd
also creates a logical flow that matches on IP multicast traffic destined
to address X and outputs it on the tunnel key of the corresponding
Multicast_Group entry.
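
The aggregation step described above can be sketched as follows. The row
layout and the aggregate helper are assumptions made for illustration; they
do not reflect the actual ovn-northd data structures.

```python
from collections import defaultdict

# Hypothetical IGMP_Group rows: (datapath, group address, chassis, ports).
igmp_groups = [
    ("ls1", "239.0.0.1", "hv1", ["lsp1", "lsp2"]),
    ("ls1", "239.0.0.1", "hv2", ["lsp3"]),
    ("ls1", "239.0.0.2", "hv1", ["lsp1"]),
]

def aggregate(rows):
    """Merge per-chassis rows into one port list per (datapath, address)."""
    merged = defaultdict(set)
    for datapath, address, _chassis, ports in rows:
        merged[(datapath, address)].update(ports)
    return {key: sorted(ports) for key, ports in merged.items()}

multicast_groups = aggregate(igmp_groups)
```

Each key of multicast_groups corresponds to one Multicast_Group entry in this
model, holding the ports collected from all chassis.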

Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Acked-by: Mark Michelson <mmichels@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Dumitru Ceara [Mon, 15 Jul 2019 20:24:49 +0000 (22:24 +0200)]
OVN: Add IGMP SB definitions and ovn-controller support

A new IP_Multicast table is added to Southbound DB. This table stores the
multicast related configuration for each datapath. Each row will be
populated by ovn-northd and will control:
- if IGMP Snooping is enabled or not, the snooping table size and multicast
  group idle timeout.
- if IGMP Querier is enabled or not (only if snooping is enabled too), query
  interval, query source addresses (Ethernet and IP) and the max-response
  field to be stored in outgoing queries.
- an additional "seq_no" column is added such that ovn-sbctl or if needed a
  CMS can flush currently learned groups. This can be achieved by incrementing
  the "seq_no" value.

A new IGMP_Group table is added to Southbound DB. This table stores all the
multicast groups learned by ovn-controllers. The table is indexed by
datapath, group address and chassis. For a learned multicast group on a
specific datapath each ovn-controller will store its own row in this table.
Each row contains the list of chassis-local ports on which the group was
learned. Rows in the IGMP_Group table are updated or deleted only by the
ovn-controllers that created them.

A new action ("igmp") is added to punt IGMP packets on a specific logical
switch datapath to ovn-controller if IGMP snooping is enabled.

Per datapath IGMP multicast snooping support is added to pinctrl:
- incoming IGMP reports are processed and multicast groups are maintained
  (using the OVS mcast-snooping library).
- each OVN controller syncs its in-memory IGMP groups to the Southbound DB
  in the IGMP_Group table.
- pinctrl also sends periodic IGMPv3 general queries for all datapaths where
  querier is enabled.

Signed-off-by: Mark Michelson <mmichels@redhat.com>
Co-authored-by: Mark Michelson <mmichels@redhat.com>
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Acked-by: Mark Michelson <mmichels@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Dumitru Ceara [Mon, 15 Jul 2019 20:24:37 +0000 (22:24 +0200)]
packets: Add IGMPv3 query packet definitions

Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Acked-by: Mark Michelson <mmichels@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Numan Siddique [Tue, 16 Jul 2019 18:01:10 +0000 (23:31 +0530)]
ovn-northd: Fix the ovn-northd continuous looping

ovn-northd wakes up continuously from poll_block(). The issue can be reproduced
in the sandbox with the commands below:

ovn-nbctl lr-add lr0
ovn-nbctl ls-add public
ovn-nbctl lrp-add lr0 lr0-public 00:00:20:20:12:13 172.168.0.100/24
ovn-nbctl lsp-add public public-lr0
ovn-nbctl lsp-set-type public-lr0 router
ovn-nbctl lsp-set-addresses public-lr0 router
ovn-nbctl lsp-set-options public-lr0 router-port=lr0-public
ovn-nbctl lrp-set-gateway-chassis lr0-public chassis-1 20

This issue is seen after commit [1], which uses
sbrec_port_binding_update_nat_addresses_addvalue() to add a value to the
Port_Binding.nat_addresses column.

When this function is used, the IDL client code appears to send transactions to
ovsdb-server repeatedly to update Port_Binding.nat_addresses, even though the
Southbound DB has already updated the column. The actual bug seems to be in the
IDL client code, and that needs to be fixed. As a quick fix, this patch stops
ovn-northd's continuous loop by using sbrec_port_binding_set_nat_addresses()
instead of this function.

The messages below are seen continuously when ovn-northd debug logs are enabled.

****

2019-07-12T17:26:13.837Z|74512|jsonrpc|DBG|unix:sb1.ovsdb: received reply,
result=[{},{"count":1},{"count":1}], id=18628
2019-07-12T17:26:13.837Z|74513|poll_loop|DBG|wakeup due to 0-ms timeout at ../lib/ovsdb-idl.c:5397 (75% CPU usage)
2019-07-12T17:26:13.837Z|74514|jsonrpc|DBG|unix:sb1.ovsdb: send request,
method="transact", params=["OVN_Southbound",{"lock":"ovn_northd","op":"assert"},
{"where":[["_uuid","==",["uuid","56a9eb75-8d3b-4144-b4e7-1bb749645011"]]],"row":
{"nat_addresses":["set",[]]},"op":"update","table":"Port_Binding"},{"mutations":[["nat_addresses",
"insert",["set",["00:00:20:20:12:13 172.168.0.100 is_chassis_resident(\"cr-lr0-public\")"]]]],
"where":[["_uuid","==",["uuid","56a9eb75-8d3b-4144-b4e7-1bb749645011"]]],"op":"mutate","table":"Port_Binding"}], id=18629

2019-07-12T17:26:13.837Z|74516|jsonrpc|DBG|unix:sb1.ovsdb: received reply, result=[{},{"count":1},{"count":1}], id=18629
2019-07-12T17:26:13.837Z|74517|poll_loop|DBG|wakeup due to 0-ms timeout at ../lib/ovsdb-idl.c:5397 (75% CPU usage)
2019-07-12T17:26:13.837Z|74518|jsonrpc|DBG|unix:sb1.ovsdb: send request,
method="transact", params=["OVN_Southbound",{"lock":"ovn_northd","op":"assert"},
{"where":[["_uuid","==",["uuid","56a9eb75-8d3b-4144-b4e7-1bb749645011"]]],
"row":{"nat_addresses":["set",[]]},"op":"update","table":"Port_Binding"},
{"mutations":[["nat_addresses","insert",["set",["00:00:20:20:12:13 172.168.0.100
is_chassis_resident(\"cr-lr0-public\")"]]]],"where":[["_uuid","==",["uuid",
"56a9eb75-8d3b-4144-b4e7-1bb749645011"]]],"op":"mutate","table":"Port_Binding"}], id=18630
2019-07-12T17:26:13.837Z|74520|jsonrpc|DBG|unix:sb1.ovsdb: received reply, result=[{},{"count":1},{"count":1}], id=18630
******

The OpenStack CI tests for networking-ovn have frequently been failing a few
tests since this commit. The failures seem to be related to timing issues, as
ovn-northd is hogging the CPU continuously. We are also seeing travis CI test
failures after this commit.

[1] - ed198fb3b92e

Fixes: ed198fb3b92e ("ovn: Send GARP for the router ports with reside-on-redirect-chassis options set")
Signed-off-by: Numan Siddique <nusiddiq@redhat.com>
Tested-by: Greg Rose <gvrose8192@gmail.com>
Reviewed-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Vasu Dasari [Mon, 15 Jul 2019 21:15:01 +0000 (17:15 -0400)]
ovs-macros: An option to suspend test execution on error

Origins for this patch are captured at
https://mail.openvswitch.org/pipermail/ovs-discuss/2019-June/048923.html.

In summary: when a test fails, it would be good to pause test execution
and let the developer poke around the system to see its current state.

This patch makes small tweaks to ovs-macros.at so that the ovs_on_exit()
function is called when the test suite fails. This function checks whether the
OVS_PAUSE_TEST environment variable is set. If it is, the test suite is paused
and waits for user input (Ctrl-D). Meanwhile, the user can poke around the
system to see why the test case failed. Once the investigation is done, the
user can press Ctrl-D to clean up the test suite.
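
The pause-on-failure check can be modelled in a few lines. This is only a
sketch in Python, not the shell code in ovs-macros.at: the env and wait
parameters and the prompt text are assumptions added so the logic can be
exercised without a terminal.

```python
import os

def ovs_on_exit(env=None, wait=input):
    """Pause if OVS_PAUSE_TEST is set; return True when a pause happened."""
    env = os.environ if env is None else env
    if env.get("OVS_PAUSE_TEST"):
        # Block until the user signals that the investigation is done.
        wait("Paused; inspect the system, then continue: ")
        return True
    return False
```

Passing a stub for wait makes it possible to check both branches without
blocking on real user input.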

For example, to re-run test case 139:

export OVS_PAUSE_TEST=1
cd tests/system-userspace-testsuite.dir/139
sudo -E ./run

When an error occurs, the above command displays something like this:
=====================================================
Set environment variable to use various ovs utilities
export OVS_RUNDIR=/opt/vdasari/Developer/ovs/_build-gcc/tests/system-userspace-testsuite.dir/139
Press ENTER to continue:

=====================================================
And from another window, one can execute ovs-xxx commands like:
export OVS_RUNDIR=/opt/vdasari/Developer/ovs/_build-gcc/tests/system-userspace-testsuite.dir/139
$ ovs-ofctl dump-ports br0
.
.

To be able to pause while performing `make check`, one can do:
$ OVS_PAUSE_TEST=1 make check TESTSUITEFLAGS='-v'

Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Vasu Dasari <vdasari@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Lorenzo Bianconi [Thu, 11 Jul 2019 15:48:45 +0000 (17:48 +0200)]
OVN: use trigger_event action to report 'empty_lb_rule' events

Add northd logical flows in order to report that the controller
received an IP packet for an LB rule with no backends.
This configuration is used by OpenShift to spin up an idle POD.

Signed-off-by: Mark Michelson <mmichels@redhat.com>
Co-authored-by: Mark Michelson <mmichels@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Lorenzo Bianconi [Thu, 11 Jul 2019 15:48:44 +0000 (17:48 +0200)]
OVN: introduce trigger_event() action

Add trigger_event() ovn action in order to allow ovs-vswitchd to report
CMS related events.
This commit introduces a new event, empty_lb_backends. This event is
raised if a received packet is destined for a load balancer VIP that has
no configured backend destinations. For this event, the event info
includes the load balancer VIP, the load balancer UUID, and the
transport protocol.
The use case for this particular event is for the CMS to supply backend
resources to handle this traffic. For example, in OpenShift, this event
can be used to spin up new containers to handle the incoming traffic.

Signed-off-by: Mark Michelson <mmichels@redhat.com>
Co-authored-by: Mark Michelson <mmichels@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Lorenzo Bianconi [Thu, 11 Jul 2019 15:48:43 +0000 (17:48 +0200)]
OVN: introduce Controller_Event table

Add a Controller_Event table to the OVN SBDB in order to
report CMS related events.
Introduce the event_table hashmap array and controller_event related
structures in ovn-controller in order to track pending events
forwarded by ovs-vswitchd. Moreover, integrate the event_table hashmap
array with the event_table ovn-sbdb table.

Signed-off-by: Mark Michelson <mmichels@redhat.com>
Co-authored-by: Mark Michelson <mmichels@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Terry Wilson [Thu, 11 Jul 2019 13:00:20 +0000 (08:00 -0500)]
Shutdown SSL connection before closing socket

Without shutting down the SSL connection, log messages like:

stream_ssl|WARN|SSL_read: unexpected SSL connection close
jsonrpc|WARN|ssl:127.0.0.1:47052: receive error: Protocol error
reconnect|WARN|ssl:127.0.0.1:47052: connection dropped (Protocol error)

would occur whenever the socket is closed. This just adds an
SSLStream.close() that calls shutdown() and ignores SSL errors, the
same way that lib/stream-ssl.c does in ssl_close().
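
The shutdown-then-close pattern can be sketched as below. Note the
assumptions: this uses the standard-library ssl module and a hypothetical
close_tls helper, whereas the patch itself adds SSLStream.close() to the OVS
Python library.

```python
import ssl

def close_tls(sock):
    """Send the TLS close notification, then close the socket.

    `sock` is expected to behave like ssl.SSLSocket.
    """
    try:
        # unwrap() performs the TLS shutdown handshake.
        sock.unwrap()
    except (ssl.SSLError, OSError):
        # The peer may already be gone; ignore the error and close anyway.
        pass
    finally:
        sock.close()
```

Ignoring errors from the shutdown step keeps close() from failing when the
peer has already dropped the connection.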

Signed-off-by: Terry Wilson <twilson@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Ilya Maximets [Fri, 12 Jul 2019 16:15:27 +0000 (19:15 +0300)]
tests: Add ovsdb-cluster-testsuite to gitignore.

CC: Han Zhou <hzhou8@ebay.com>
Fixes: 2bcb3b7052c8 ("ovsdb raft: Move ovsdb cluster tests to separate testsuite.")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Ben Pfaff <blp@ovn.org>
Acked-by: Han Zhou <hzhou8@ebay.com>
Ilya Maximets [Fri, 12 Jul 2019 12:57:02 +0000 (15:57 +0300)]
checkpatch: Check FOR_EACH loops with numbers.

OVS has defines for loops like 'BITMAP_FOR_EACH_1' or
'ULLONG_FOR_EACH_1', but the regexp in checkpatch doesn't match with
numbers and skips these loops while checking.

This patch adds numbers to the regexp and adds some FOR_EACH loops to
the unit tests.
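
The difference can be shown with two illustrative patterns. Both expressions
are hypothetical and written for this example; the actual regexp in
checkpatch may differ.

```python
import re

# Hypothetical patterns: one without digits in the character class,
# one with digits allowed.
without_digits = re.compile(r"\b[A-Z_]*FOR_EACH[A-Z_]*\s*\(")
with_digits = re.compile(r"\b[A-Z0-9_]*FOR_EACH[A-Z0-9_]*\s*\(")

numbered = "BITMAP_FOR_EACH_1 (idx, size, bitmap) {"
plain = "LIST_FOR_EACH (node, list_node, &list) {"

print(bool(without_digits.search(numbered)), bool(with_digits.search(numbered)))
print(bool(without_digits.search(plain)), bool(with_digits.search(plain)))
```

The trailing "1" in the macro name stops the first pattern from reaching the
opening parenthesis, so the loop is skipped; the second pattern matches both
lines.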

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Ben Pfaff <blp@ovn.org>
Acked-by: Aaron Conole <aconole@redhat.com>
Numan Siddique [Thu, 11 Jul 2019 16:00:33 +0000 (21:30 +0530)]
ovn: Fix the test failures in travis CI.

After commit [1], the test cases below fail repeatedly in travis CI.

2663: ovn -- 4 HV, 1 LS, 1 LR, packet test with HA distributed router gateway port FAILED (ovn.at:8597)
2664: ovn -- 4 HV, 3 LS, 2 LR, packet test with HA distributed router gateway port FAILED (ovn.at:8844)
2667: ovn -- vlan traffic for external network with distributed router gateway port FAILED (ovn.at:9580)
2691: ovn -- router - check packet length - icmp defrag FAILED (ovn.at:13624)

With commit [1], ovn-controller sends GARPs for the IPs of the distributed
router ports. The failing tests did not handle the case where multiple GARPs
are sent. The failures are mostly timing related. This patch fixes these issues.

[1] - d65586b6fa97 ("ovn: Send GARP for router port IPs of a router port connected to bridged logical switch")

Fixes: d65586b6fa97 ("ovn: Send GARP for router port IPs of a router port connected to bridged logical switch")
CC: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Numan Siddique <nusiddiq@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Eli Britstein [Thu, 4 Jul 2019 07:36:42 +0000 (07:36 +0000)]
netdev-vport: Make ip6gre netdev type to use TC rules

The offload API functions are already assigned to every tunnel class.
The ip6gre tunnel class only needs the get_ifindex function assigned as
well, as was done in commit 5e63eaa969a3 ("netdev-vport: Make
gre netdev type to use TC rules").

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Han Zhou [Wed, 10 Jul 2019 04:23:11 +0000 (21:23 -0700)]
ovn-performance.at: Fix syntax error in ACL.

This doesn't impact the effectiveness of the test; it just fixes an
obvious error in ACL syntax that was noticed when looking at test
logs.

Signed-off-by: Han Zhou <hzhou8@ebay.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Han Zhou [Wed, 10 Jul 2019 04:23:10 +0000 (21:23 -0700)]
ovn-performance.at: Missing steps for connecting LS to LR.

The test creates 2 logical switches and connects them with a logical router.
However, it didn't set the option "router-port", so the 2 LS datapaths
were not connected. This results in missing test coverage for port-binding
incremental processing. Suppose I-P has a bug and a port-binding change always
triggers a recompute: since each HV monitors only its own datapath (i.e.
HV1 -> ls1, HV2 -> ls2), it never gets notified of a port-binding change on
the other datapath, so the recompute is never triggered when a port-binding
is updated there. With this fix, each HV's local datapaths
include both ls1 and ls2, so port-binding change notifications are
received properly and an unexpected recompute would be caught.

Signed-off-by: Han Zhou <hzhou8@ebay.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Daniel Alvarez [Tue, 9 Jul 2019 10:16:30 +0000 (12:16 +0200)]
ovsdb-server: drop all connections on read/write status change

Prior to this patch, only db change aware connections were dropped
on a read/write status change. However, the current schema in OVN does
not allow clients to monitor whether a particular DB changes this
status. In order to accomplish this, we'd need to change the schema
and adapt ovsdb-server and existing clients.

Before tackling that, this patch changes ovsdb-server to drop
*all* the existing connections upon a read/write status change. This
will force clients to reconnect and honor the change.

Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2019-July/048981.html
Signed-off-by: Daniel Alvarez <dalvarez@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Greg Rose [Tue, 9 Jul 2019 15:25:03 +0000 (08:25 -0700)]
datapath: fix csum updates for MPLS actions

Upstream commit:
    commit 0e3183cd2a64843a95b62f8bd4a83605a4cf0615
    Author: John Hurley <john.hurley@netronome.com>
    Date:   Thu Jun 27 14:37:30 2019 +0100

    net: openvswitch: fix csum updates for MPLS actions

    Skbs may have their checksum value populated by HW. If this is a checksum
    calculated over the entire packet then the CHECKSUM_COMPLETE field is
    marked. Changes to the data pointer on the skb throughout the network
    stack still try to maintain this complete csum value if it is required
    through functions such as skb_postpush_rcsum.

    The MPLS actions in Open vSwitch modify a CHECKSUM_COMPLETE value when
    changes are made to packet data without a push or a pull. This occurs when
    the ethertype of the MAC header is changed or when MPLS lse fields are
    modified.

    The modification is carried out using the csum_partial function to get the
    csum of a buffer and add it into the larger checksum. The buffer is an
    inversion of the data to be removed followed by the new data. Because the
    csum is calculated over 16 bits and these values align with 16 bits, the
    effect is the removal of the old value from the CHECKSUM_COMPLETE and
    addition of the new value.

    However, the csum fed into the function and the outcome of the
    calculation are also inverted. This would only make sense if it was the
    new value rather than the old that was inverted in the input buffer.

    Fix the issue by removing the bit inverts in the csum_partial calculation.

    The bug was verified and the fix tested by comparing the folded value of
    the updated CHECKSUM_COMPLETE value with the folded value of a full
    software checksum calculation (reset skb->csum to 0 and run
    skb_checksum_complete(skb)). Prior to the fix the outcomes differed but
    after they produce the same result.
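
The arithmetic behind this fix can be checked with a small worked example.
The sketch below models a 16-bit one's-complement sum in Python; the word
values and helper names are made up for illustration and this is not the
kernel's csum_partial(). It compares an in-place update that adds the inverted
old word plus the new word against a full recomputation, and shows that also
inverting the checksum on the way in and out gives a different value.

```python
def fold(x):
    """Fold carries back in (16-bit one's-complement addition)."""
    while x >> 16:
        x = (x & 0xFFFF) + (x >> 16)
    return x

def full_csum(words):
    """One's-complement sum over all 16-bit words."""
    return fold(sum(words))

def update(csum, old, new):
    # Adding ~old removes the old word; adding new inserts the new one.
    return fold(csum + (~old & 0xFFFF) + new)

def update_inverted(csum, old, new):
    # Same buffer, but the checksum is also inverted on input and output.
    return ~fold((~csum & 0xFFFF) + (~old & 0xFFFF) + new) & 0xFFFF

words = [0x4500, 0x0054, 0x8847, 0x0001]
before = full_csum(words)
words[2] = 0x0800            # rewrite one 16-bit word in place
after = full_csum(words)

print(hex(update(before, 0x8847, 0x0800)), hex(after))
print(hex(update_inverted(before, 0x8847, 0x0800)))
```

Only the variant without the extra inversions agrees with the full
recomputation, which mirrors the verification method described above.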

Fixes: 25cd9ba0abc0 ("openvswitch: Add basic MPLS support to kernel")
Fixes: bc7cc5999fd3 ("openvswitch: update checksum in {push,pop}_mpls")
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes: ccf4378615e9 ("datapath: Add basic MPLS support to kernel")
Fixes: b51367aad315 ("datapath: update checksum in {push,pop}_mpls")
Cc: John Hurley <john.hurley@netronome.com>
Acked-by: William Tu <u9012063@gmail.com>
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Greg Rose [Tue, 9 Jul 2019 15:25:02 +0000 (08:25 -0700)]
compat: ip6_gre: fix possible use-after-free in ip6erspan_rcv

Upstream commit:
    commit 2a3cabae4536edbcb21d344e7aa8be7a584d2afb
    Author: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
    Date:   Sat Apr 6 17:16:53 2019 +0200

    net: ip6_gre: fix possible use-after-free in ip6erspan_rcv

    erspan_v6 tunnels run __iptunnel_pull_header on received skbs to remove
    the erspan header. This can cause a use-after-free when accessing the
    pkt_md pointer in ip6erspan_rcv, since the packet will be 'uncloned' by
    pskb_expand_head if it is a cloned gso skb (e.g. if the packet has
    been sent through a veth device). Fix it by resetting the pkt_md pointer
    after __iptunnel_pull_header.

Fixes: 1d7e2ed22f8d ("net: erspan: refactor existing erspan code")
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes: c387d8177f20 ("compat: Add ipv6 GRE and IPV6 Tunneling")
Cc: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Acked-by: William Tu <u9012063@gmail.com>
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Ilya Maximets [Wed, 10 Jul 2019 11:50:52 +0000 (14:50 +0300)]
dpif-netdev: Clarify PMD reloading scheme.

It became more complicated, hence needs to be documented.

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
David Marchand [Tue, 9 Jul 2019 16:19:58 +0000 (18:19 +0200)]
dpif-netdev: Catch reloads faster.

Checking the reload flag only every 1024 loops can take a long time
under load, since we might be handling 32 packets per rxq, per iteration,
which means up to poll_cnt * 32 * 1024 packets.
Check the flag on every loop; no major performance impact was seen.
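
As a quick worked example of the bound above (the poll_cnt value of 4 is an
assumption chosen for illustration):

```python
BATCH = 32          # packets handled per rxq per iteration (from the text)
CHECK_EVERY = 1024  # loops between reload-flag checks before this change

def max_packets_before_check(poll_cnt):
    """Upper bound on packets processed between two reload-flag checks."""
    return poll_cnt * BATCH * CHECK_EVERY

print(max_packets_before_check(4))
```

With 4 polled queues, a pmd could handle up to 131072 packets before noticing
a reload request.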

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
David Marchand [Tue, 9 Jul 2019 16:19:57 +0000 (18:19 +0200)]
dpif-netdev: Only reload static tx qid when needed.

pmd->static_tx_qid is allocated under a mutex by the different pmd
threads.
Unconditionally reallocating it makes those pmd threads sleep
when contention occurs.
During "normal" reloads, such as rebalancing queues between pmd threads,
pmd threads can waste time on this.
Reallocating the tx qid is only needed when removing other pmd threads,
as it is the only situation in which the qid pool can become non-contiguous.

Add a flag to instruct the pmd to reload its tx qid in this case, which is
Step 1 in the current code.
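
Why only pmd removal matters can be seen in a toy model of the id pool. The
dictionary and both helpers below are hypothetical and much simpler than the
real allocator in dpif-netdev.

```python
def is_contiguous(qids):
    """True if the ids in use are exactly 0..n-1."""
    return sorted(qids) == list(range(len(qids)))

def reallocate(pmds):
    """Hand out the lowest ids again, keeping the existing pmd order."""
    return {pmd: qid for qid, pmd in enumerate(pmds)}

tx_qids = {"pmd0": 0, "pmd1": 1, "pmd2": 2}
del tx_qids["pmd1"]                       # removing a pmd leaves a hole
needs_reload = not is_contiguous(tx_qids.values())
tx_qids = reallocate(tx_qids)             # only needed in this case
```

A reload that merely moves queues between existing pmds leaves the set of ids
untouched, so no reallocation is required in that case.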

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
David Marchand [Tue, 9 Jul 2019 16:19:56 +0000 (18:19 +0200)]
dpif-netdev: Do not sleep when swapping queues.

When swapping queues from a pmd thread to another (q0 polled by pmd0/q1
polled by pmd1 -> q1 polled by pmd0/q0 polled by pmd1), the current
"Step 5" puts both pmds to sleep waiting for the control thread to wake
them up later.

Prefer to make them spin in such a case to avoid sleeping for a
non-deterministic amount of time.

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
David Marchand [Tue, 9 Jul 2019 16:19:55 +0000 (18:19 +0200)]
dpif-netdev: Trigger parallel pmd reloads.

pmd reloads are currently serialised in each step that calls
reload_affected_pmds.
Any pmd that is processing packets, waiting on a mutex, etc. makes other
pmd threads wait for a delay that can be non-deterministic when syscalls
add up.

Switch to a little busy loop on the control thread using the existing
per-pmd reload boolean.

The memory order on this atomic is rel-acq to have an explicit
synchronisation between the pmd threads and the control thread.

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
David Marchand [Tue, 9 Jul 2019 16:19:54 +0000 (18:19 +0200)]
dpif-netdev: Convert exit latch to flag.

No need for a latch here since we don't have to wait.
A simple boolean flag is enough.

The memory order on the reload flag is changed to rel-acq ordering to
serve as a synchronisation point between the pmd threads and the control
thread that asks for termination.

Fixes: e4cfed38b159 ("dpif-netdev: Add poll-mode-device thread.")
Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Ian Stokes <ian.stokes@intel.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Ben Pfaff [Fri, 10 May 2019 22:02:43 +0000 (15:02 -0700)]
dist-docs: Fix bugs in text to HTML conversion.

This fixes two bugs.  First, & has a special meaning in the replacement
text for a sed "s" command, so this escapes it.  Second, this code
misprocessed bold or underlined &<>: >^H> would become &gt;^H&gt; which
would display as &gt&gt; in most browsers.

Finally, this improves the HTML output so that bold ABC becomes <b>ABC</b>
instead of <b>A</b><b>B</b><b>C</b>.
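
The intended conversion can be sketched in Python. This is an illustrative
reimplementation under assumptions, not the sed-based code the patch fixes:
the overstrike_to_html helper is hypothetical, and it decodes overstrikes
before escaping so that bold special characters are handled correctly and
adjacent characters with the same style share one tag.

```python
import html
from itertools import groupby

def overstrike_to_html(text):
    """Convert overstrikes (X BS X = bold, _ BS X = underline) to HTML."""
    styled = []  # list of (style, character) pairs
    i = 0
    while i < len(text):
        if i + 2 < len(text) and text[i + 1] == "\b":
            first, second = text[i], text[i + 2]
            if first == second:
                styled.append(("b", second))
            elif first == "_":
                styled.append(("u", second))
            else:
                styled.append((None, second))
            i += 3
        else:
            styled.append((None, text[i]))
            i += 1
    out = []
    # Group runs with the same style so that ABC becomes one <b> element.
    for style, run in groupby(styled, key=lambda item: item[0]):
        chunk = html.escape("".join(ch for _, ch in run), quote=False)
        out.append(f"<{style}>{chunk}</{style}>" if style else chunk)
    return "".join(out)

print(overstrike_to_html("A\bAB\bBC\bC"))
print(overstrike_to_html(">\b> x"))
```

Escaping after decoding means a bold ">" becomes one escaped character inside
a single bold element rather than two escaped entities.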

Reported-by: Nicolas Bouliane <nbouliane@digitalocean.com>
Reported-at: https://twitter.com/nicboul/status/1126959264772259842
Signed-off-by: Ben Pfaff <blp@ovn.org>
Lorenzo Bianconi [Sat, 6 Jul 2019 10:45:00 +0000 (12:45 +0200)]
OVN: run local logical flows first in S_ROUTER_OUT_SNAT table

Run local logical flows first if the gw router port is scheduled
on the local chassis in order to properly manage snat traffic

Tested-by: Eran Kuris <ekuris@redhat.com>
Acked-by: Numan Siddique <nusiddiq@redhat.com>
Acked-by: Mark Michelson <mmichels@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Ashish Varma [Mon, 8 Jul 2019 16:51:29 +0000 (09:51 -0700)]
rhel: Fixed a bug for checking the correct major version and revision.

Fixed a bug where checking for major version 3.10 and major revision not
equal to 327 or 693 or 957 should have gone to the default else at the end.
In the current code, the default else condition will not get executed
for kernel with major version 3.10 and major revision not equal
to 327/693/957 resulting in failure to load the kernel module.
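
The control-flow problem described above can be modelled as follows. This is
a Python sketch with hypothetical function names and return values; the real
logic lives in the shell script ovs-kmod-manage.sh and is not reproduced here.

```python
KNOWN_REVISIONS = {327, 693, 957}

def pick_rule_buggy(version, revision):
    if version == "3.10":
        if revision in KNOWN_REVISIONS:
            return f"rule-{revision}"
        # Leaves the 3.10 branch without selecting anything.
        return None
    return "default"

def pick_rule_fixed(version, revision):
    if version == "3.10" and revision in KNOWN_REVISIONS:
        return f"rule-{revision}"
    # Every other combination reaches the default case.
    return "default"

print(pick_rule_buggy("3.10", 862), pick_rule_fixed("3.10", 862))
```

Folding the revision test into the outer condition lets an unlisted 3.10
revision fall through to the default branch.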

Fixes: 402efbe4e176 ("rhel: Add 4.12 kernel support in ovs-kmod-manage.sh")
Signed-off-by: Ashish Varma <ashishvarma.ovs@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Dumitru Ceara [Mon, 8 Jul 2019 10:07:26 +0000 (12:07 +0200)]
ovn-controller: Update stale chassis entry at init

The first time ovn-controller initializes the Chassis entry (shortly
after start up), we first check whether there is a stale Chassis record in the
OVN_Southbound DB by checking if any of the old Encap entries associated
with the Chassis record match the new tunnel configuration. If one is found, it
means that ovn-controller didn't shut down gracefully the last time it was
run, so it didn't clean up the Chassis table. The OVS system-id may also
have been changed in the meantime. We then update the stale entry with
the new configuration and store the last configured chassis-id in memory
to avoid walking the Chassis table every time.
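
The stale-record lookup can be sketched as below. The dictionaries, the
(type, ip) tuples and the find_stale_chassis helper are assumptions made for
illustration; they are not the ovn-controller data structures.

```python
def find_stale_chassis(chassis_table, new_encaps):
    """Return the first chassis whose old encaps overlap the new tunnel config.

    Encaps are modelled as (type, ip) tuples.
    """
    wanted = set(new_encaps)
    for chassis in chassis_table:
        if wanted & set(chassis["encaps"]):
            return chassis
    return None

chassis_table = [
    {"name": "old-system-id", "encaps": [("geneve", "192.0.2.10")]},
    {"name": "other-host", "encaps": [("geneve", "192.0.2.20")]},
]
stale = find_stale_chassis(chassis_table, [("geneve", "192.0.2.10")])
```

Here the record named old-system-id is treated as stale because its encap
matches the new tunnel configuration even though the system-id differs.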

Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Dumitru Ceara [Mon, 8 Jul 2019 10:07:12 +0000 (12:07 +0200)]
ovn-controller: Refactor chassis.c to abstract the string parsing

Abstract out the chassis config string processing and use library data
structures (e.g., sset).
Rename the get_chassis_id function in ovn-controller.c to
get_ovs_chassis_id to avoid confusion with the newly added
chassis_get_id function from chassis.c which returns the last
successfully configured chassis-id.

Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Dumitru Ceara [Mon, 8 Jul 2019 10:07:00 +0000 (12:07 +0200)]
ovn-controller: Fix chassis ovn-sbdb record init

The chassis_run code didn't take into account the scenario when the
system-id was changed in the Open_vSwitch table. Due to this the code
was trying to insert a new Chassis record in the OVN_Southbound DB with
the same Encaps as the previous Chassis record. The transaction used
to insert the new records was aborting due to the ["type", "ip"]
index constraint violation as we were creating new Encap entries with
the same "type" and "ip" as the old ones.

In order to fix this issue the flow is now:
1. the first time ovn-controller initializes the Chassis (shortly after
start up) we store the chassis-id.
2. for subsequent chassis_run calls we use the last configured
chassis-id stored at the previous step to look up the old Chassis record.
3. when ovn-controller shuts down gracefully we look up the Chassis
record based on the chassis-id stored in memory at steps 1 and 2 above.
This is to avoid failing to clean up the Chassis record in OVN_Southbound
DB if the OVS system-id changes between the last call to chassis_run and
chassis_cleanup.

Reported-at: https://bugzilla.redhat.com/1708146
Reported-by: Haidong Li <haili@redhat.com>
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Kevin Traynor [Tue, 2 Jul 2019 00:32:30 +0000 (01:32 +0100)]
netdev-dpdk: Enable tx-retries-max config.

vhost tx retries can provide some mitigation against
dropped packets due to a temporarily slow guest/limited queue
size for an interface, but on the other hand when a system
is fully loaded those extra cycles retrying could mean
packets are dropped elsewhere.

Up to now max vhost tx retries have been hardcoded, which meant
no tuning and no way to disable for debugging to see if extra
cycles spent retrying resulted in rx drops on some other
interface.

Add an option to change the max retries, with a value of
0 effectively disabling vhost tx retries.
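
A bounded retry loop of this kind might look like the sketch below. It is a
simplified model: send_with_retries and the try_send callback are
hypothetical and do not reflect the netdev-dpdk implementation.

```python
def send_with_retries(burst, try_send, max_retries):
    """Send `burst`, retrying the unsent remainder up to `max_retries` times.

    `try_send(pkts)` is a hypothetical callback that returns how many
    packets it accepted. Returns (sent, dropped, retries).
    """
    remaining = list(burst)
    retries = 0
    while True:
        accepted = try_send(remaining)
        remaining = remaining[accepted:]
        if not remaining or retries >= max_retries:
            break
        retries += 1
    return len(burst) - len(remaining), len(remaining), retries

# A slow receiver that accepts at most 2 packets per call.
slow_guest = lambda pkts: min(2, len(pkts))
print(send_with_retries(range(5), slow_guest, max_retries=0))
print(send_with_retries(range(5), slow_guest, max_retries=8))
```

With max_retries set to 0 the remainder is dropped after the first attempt,
which corresponds to retries being disabled.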

Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Kevin Traynor [Tue, 2 Jul 2019 00:32:29 +0000 (01:32 +0100)]
netdev-dpdk: Add custom stat for vhost tx retries.

vhost tx retries may occur, and it can be a sign that
the guest is not optimally configured.

Add a custom stat so a user will know if vhost tx retries are
occurring and hence give a hint that guest config should be
examined.

Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
5 years agodoc: Move vhost tx retry info to separate section.
Kevin Traynor [Tue, 2 Jul 2019 00:32:28 +0000 (01:32 +0100)]
doc: Move vhost tx retry info to separate section.

vhost tx retry is applicable to both vhost-user and vhost-user-client,
but was documented in the section that compares them. Also, move it further
down the doc, as it is preferable to have more fundamental info about vhost
nearer the top.

Fixes: 6d6513bfc657 ("doc: Add info on vhost tx retries.")
Reported-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
5 years agoOVN: do not distribute traffic for local FIP
Lorenzo Bianconi [Thu, 13 Jun 2019 17:47:59 +0000 (19:47 +0200)]
OVN: do not distribute traffic for local FIP

Do not send traffic for local FIP through the overlay tunnels but
manage it in the local hypervisor

Acked-by: Numan Siddique <nusiddiq@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agoDocumentation: Clarify connection tracking tutorial
Greg Rose [Wed, 19 Jun 2019 21:56:54 +0000 (14:56 -0700)]
Documentation: Clarify connection tracking tutorial

The current documentation states that "all packets entering OVS for
the first time are "untracked"".  However there is a minor exception
to this in the case where a packet (re)enters the same datapath and
the namespace has not changed.  In that case there is no need to
scrub the packet and in this case the connection may already be
in the "tracked" state.

Reported-by: Quan Tian <qtian@vmware.com>
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agorconn: Increase precision of timers.
Ben Pfaff [Tue, 11 Jun 2019 16:55:16 +0000 (09:55 -0700)]
rconn: Increase precision of timers.

Until now, the rconn timers have been precise only to the nearest second.
This increases them to millisecond precision, which seems cleaner these
days.

Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agorconn: Remove write-only struct members.
Ben Pfaff [Tue, 11 Jun 2019 16:55:15 +0000 (09:55 -0700)]
rconn: Remove write-only struct members.

Acked-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agosat-math: Add functions for saturating arithmetic on "long long int".
Ben Pfaff [Tue, 11 Jun 2019 16:55:14 +0000 (09:55 -0700)]
sat-math: Add functions for saturating arithmetic on "long long int".

The first users will be added in an upcoming commit.

Also add tests.
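
For readers unfamiliar with the term, saturating arithmetic clamps a result to the representable range instead of overflowing. The sketch below illustrates the idea in Python, assuming a 64-bit "long long int"; the function names are hypothetical and are not claimed to match the names added to sat-math.

```python
# Assumed range of a 64-bit "long long int".
LLONG_MAX = 2**63 - 1
LLONG_MIN = -(2**63)


def _clamp_ll(value):
    # Python integers do not overflow, so compute exactly and then clamp.
    return max(LLONG_MIN, min(LLONG_MAX, value))


def sat_add_ll(x, y):
    """Saturating x + y."""
    return _clamp_ll(x + y)


def sat_sub_ll(x, y):
    """Saturating x - y."""
    return _clamp_ll(x - y)


def sat_mul_ll(x, y):
    """Saturating x * y."""
    return _clamp_ll(x * y)
```

A C implementation cannot compute the exact result first, so it would have to detect overflow before or during the operation; that part is not shown here.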

Acked-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agoofproto-dpif-upcall: Remove unused macro MAX_QUEUE_LENGTH.
Yunjian Wang [Wed, 19 Jun 2019 04:22:45 +0000 (12:22 +0800)]
ofproto-dpif-upcall: Remove unused macro MAX_QUEUE_LENGTH.

Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agoovn-controller: Omit tracking external_ids columns
Numan Siddique [Fri, 28 Jun 2019 10:44:04 +0000 (16:14 +0530)]
ovn-controller: Omit tracking external_ids columns

Running the command "ovn-nbctl set logical_switch_port foo external_ids:foo=bar"
causes the incremental processing engine to recompute the flows on the
chassis where the logical port 'foo' is claimed.

This patch avoids this unnecessary recomputation by omitting the tracking of
the external_ids column of all the Southbound DB tables except the DNS, Chassis
and Datapath_Binding tables. ovn-controller refers to the external_ids
column of these tables.
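
The selection rule can be sketched as below. This is only an illustration of the rule stated above: the function name is hypothetical, and the input table list in the usage note is an example, not the full Southbound schema.

```python
# Tables whose external_ids column ovn-controller does refer to,
# as listed in the commit message.
TABLES_NEEDING_EXTERNAL_IDS = {"DNS", "Chassis", "Datapath_Binding"}


def external_ids_columns_to_omit(tables):
    """Return (table, column) pairs whose change tracking can be omitted."""
    return sorted(
        (table, "external_ids")
        for table in tables
        if table not in TABLES_NEEDING_EXTERNAL_IDS
    )
```

For example, given `["Port_Binding", "Chassis", "DNS", "Logical_Flow"]`, only the Port_Binding and Logical_Flow pairs are returned.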

Signed-off-by: Numan Siddique <nusiddiq@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agoOVN: Enable E-W Traffic, Vlan backed DVR
Ankur Sharma [Thu, 20 Jun 2019 01:36:46 +0000 (01:36 +0000)]
OVN: Enable E-W Traffic, Vlan backed DVR

Background:
[1] https://mail.openvswitch.org/pipermail/ovs-dev/2018-October/353066.html
[2] https://docs.google.com/document/d/1uoQH478wM1OZ16HrxzbOUvk5LvFnfNEWbkPT6Zmm9OU/edit?usp=sharing

Key difference between an overlay logical switch and
vlan backed logical switch is that for vlan logical switches
packets are not encapsulated.

Hence, if a distributed router port is connected to vlan backed
logical switch, then router port mac as source mac could be
seen from multiple hypervisors. Same <mac,vlan> pairs coming
from multiple ports from a top of the rack switch (TOR) perspective
could be seen as a security threat and it could send alarms, drop
the packets or block the ports etc.

This patch addresses this problem by introducing the concept of chassis mac.
A chassis mac is CMS provisioned unique mac per chassis. For any routed packet
(i.e source mac is router port mac) going on the wire on a vlan type
logical switch, we will replace its source mac with chassis mac.

This replacing of source mac with chassis mac will happen in table=65
of the logical switch datapath. A flow is added at priority 150, which
matches the source mac and replaces it with chassis mac if the value
is a router port mac.

Example flow:
cookie=0x0, duration=67765.830s, table=65, n_packets=0, n_bytes=0,
idle_age=65534, hard_age=65534, priority=150,reg15=0x1,metadata=0x4,
dl_src=00:00:01:01:02:03 actions=mod_dl_src:aa:bb:cc:dd:ee:ff,
mod_vlan_vid:1000,output:16

Here, 00:00:01:01:02:03 is router port mac and aa:bb:cc:dd:ee:ff
is chassis mac.
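
The effect of the priority-150 flow can be sketched as a plain function. The packet is modelled as a dict and the function name is hypothetical; the MAC addresses are the ones from the example flow above.

```python
def rewrite_source_mac(packet, router_port_macs, chassis_mac):
    """Replace the source MAC with the chassis MAC for routed packets.

    packet is a dict with a "dl_src" key; a packet counts as routed when
    its source MAC is one of router_port_macs.
    """
    if packet["dl_src"] in router_port_macs:
        # Return a modified copy so the caller's packet is left untouched.
        return dict(packet, dl_src=chassis_mac)
    return packet
```

Packets whose source MAC is not a router port MAC pass through unchanged, so only routed traffic on the vlan backed logical switch carries the chassis mac on the wire.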

Acked-by: Numan Siddique <nusiddiq@redhat.com>
Signed-off-by: Ankur Sharma <ankur.sharma@nutanix.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agoovn-controller: Provide the option to configure inactivity probe interval for OpenFlo...
Numan Siddique [Mon, 1 Jul 2019 07:42:08 +0000 (13:12 +0530)]
ovn-controller: Provide the option to configure inactivity probe interval for OpenFlow conn

If the ovn-controller main loop takes more than 5 seconds (if there are lots of logical
flows) before it calls poll_block(), poll_block() wakes up immediately,
since the rconn module has to send an echo request. With incremental processing, this is
not an issue as ovn-controller will not recompute again. But for older versions, this
is an issue as it causes flow recomputations, which results in 100% CPU usage all the
time.

With this patch, the CMS can configure a higher value depending on the workload.

The main intention of this patch is to fix this recomputation issue for older versions
(thereby requesting a backport), but it would still be beneficial with the
incremental processing engine.

Signed-off-by: Numan Siddique <nusiddiq@redhat.com>
Tested-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agodb-ctl-base: fix memory leak in cmd-get() function
Damijan Skvarc [Fri, 5 Jul 2019 11:38:47 +0000 (13:38 +0200)]
db-ctl-base: fix memory leak in cmd-get() function

A memory leak occurred when the specified key was not found in the table
record.

Signed-off-by: Damijan Skvarc <damjan.skvarc@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agoovn: Send GARP for router port IPs of a router port connected to bridged logical...
Numan Siddique [Mon, 1 Jul 2019 07:43:39 +0000 (13:13 +0530)]
ovn: Send GARP for router port IPs of a router port connected to bridged logical switch

This patch handles sending GARPs for

 - router port IPs of a distributed router port

 - router port IPs of a router port which belongs to gateway router
   (with the option - redirect-chassis set in Logical_Router.options)

Acked-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Numan Siddique <nusiddiq@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agoovn: Send GARP for the router ports with reside-on-redirect-chassis options set
Numan Siddique [Mon, 1 Jul 2019 07:43:20 +0000 (13:13 +0530)]
ovn: Send GARP for the router ports with reside-on-redirect-chassis options set

With the commit [1], the routing for the provider logical switches
connected to a router is centralized on the master gateway chassis
(if the option - reside-on-redirect-chassis - is set). When the
failover happens and a standby gateway chassis becomes master,
it should send GARPs for the router port macs. Without this, the
physical switch doesn't learn the new location of the router port macs
immediately and this could result in traffic disruption.

This patch addresses this issue so that the ovn-controller which claims the
distributed gateway router port sends out the GARPs.

ovn-controller sends the GARPs if the Port_Binding.nat_addresses column
is set. This patch makes use of this column, instead of adding a new column
even though the name - nat_addresses - seems a bit of a misnomer. The documentation is
updated to highlight the usage of this column.

This patch doesn't handle sending the GARPs for the gateway router port IPs.
This will be handled in a separate patch.

[1] - 85706c34d53d ("ovn: Avoid tunneling for VLAN packets redirected to a gateway chassis")

Acked-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Numan Siddique <nusiddiq@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agoovn-northd: Refactor the code which sets nat_addresses
Numan Siddique [Mon, 1 Jul 2019 07:43:11 +0000 (13:13 +0530)]
ovn-northd: Refactor the code which sets nat_addresses

The present code which sets the Port_Binding.nat_addresses
can be simplified. This patch does this. This would help in
upcoming commits to set the nat_addresses column with the
mac and IPs of distributed logical router ports and logical
router ports with 'reside-on-redirect-chassis' set.

Acked-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Numan Siddique <nusiddiq@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agotunnel: Add layer 2 IPv6 GRE encapsulation support.
William Tu [Mon, 1 Jul 2019 19:45:22 +0000 (12:45 -0700)]
tunnel: Add layer 2 IPv6 GRE encapsulation support.

The patch adds ip6gre support. Tunnel type 'ip6gre' with packet_type=
legacy_l2 is a layer 2 GRE tunnel over IPv6, carrying inner ethernet packets
encapsulated with a GRE header and an outer IPv6 header.  Encapsulation of layer 3
packets over IPv6 GRE, ip6gre, is not supported yet.  I tested it by running:
  # make check-kernel TESTSUITEFLAGS='-k ip6gre'
under kernel 5.2 and for userspace:
  # make check TESTSUITEFLAGS='-k ip6gre'

Tested-by: Greg Rose <gvrose8192@gmail.com>
Tested-at: https://travis-ci.org/gvrose8192/ovs-experimental/builds/552977116
Reviewed-by: Greg Rose <gvrose8192@gmail.com>
Reviewed-by: Eli Britstein <elibr@mellanox.com>
Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agocompat: Clean up tunnel_id_to_key
Greg Rose [Wed, 3 Jul 2019 17:04:55 +0000 (10:04 -0700)]
compat: Clean up tunnel_id_to_key

This function was just a duplicate of tunnel_id_to_key32 - I'm not sure
why it was ever needed but let's dump it now.

Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Acked-by: William Tu <u9012063@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agocompat: Clean up gre_calc_hlen
Greg Rose [Wed, 3 Jul 2019 17:04:54 +0000 (10:04 -0700)]
compat: Clean up gre_calc_hlen

It's proliferated throughout three .c files so let's pull them all
together in gre.h where the inline function belongs. This requires
some adjustments to the compat layer so that the various iterations
of gre_calc_hlen and ip_gre_calc_hlen since the 3.10 kernel are
handled correctly.

Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Acked-by: William Tu <u9012063@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agocompat: Remove duplicate metadata destination code
Greg Rose [Wed, 3 Jul 2019 17:04:53 +0000 (10:04 -0700)]
compat: Remove duplicate metadata destination code

ip_gre.c and ip6_gre.c both had duplicate code for handling the tunnel
metadata destinations.  Move the duplicate code over into the right
header file, dst_metadata.h.

Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Acked-by: William Tu <u9012063@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agoossfuzz: Remove duplicate tcp flags parsing in flow extract target
Bhargava Shastry [Fri, 21 Jun 2019 12:50:35 +0000 (14:50 +0200)]
ossfuzz: Remove duplicate tcp flags parsing in flow extract target

During a code audit, the flow extraction fuzzer target was seen to be
parsing tcp flags from the fuzzer-supplied input twice. This is
probably a typo since the second call to `parse_tcp_flags()` is
identical to the first.
Since a call to `parse_tcp_flags()` parses the Ethernet and IP headers
contained in the packet, the second (buggy) call to `parse_tcp_flags()`
creates an expectation that there is a second set of Ethernet and IP
headers beyond the first which is incorrect. This patch fixes this
problem by removing the duplicate code in question.

Signed-off-by: Bhargava Shastry <bshas3@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agoossfuzz: Add documentation
Bhargava Shastry [Fri, 21 Jun 2019 14:21:02 +0000 (16:21 +0200)]
ossfuzz: Add documentation

Documents OvS fuzzing effort and performs a rudimentary security
analysis of existing OvS fuzzing harnesses.

Feedback on the documentation and analysis appreciated.

Signed-off-by: Bhargava Shastry <bshas3@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agoovsdb-idl: Improve comments.
Ben Pfaff [Wed, 26 Jun 2019 21:02:14 +0000 (14:02 -0700)]
ovsdb-idl: Improve comments.

Acked-by: Numan Siddique <nusiddiq@redhat.com>
Suggested-by: Numan Siddique <nusiddiq@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agofaq: Correct supported kernel versions for OVS 2.11.x.
Ben Pfaff [Thu, 27 Jun 2019 13:51:43 +0000 (06:51 -0700)]
faq: Correct supported kernel versions for OVS 2.11.x.

I don't think we're planning to backport 5.0 support to OVS 2.11.x, because
that would be counter to our usual practice.

Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
Fixes: 2adada0e3db2 ("datapath: Support kernel version 5.0.x")
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agoovn-nbctl: fix memory leak
Damijan Skvarc [Wed, 3 Jul 2019 11:50:40 +0000 (13:50 +0200)]
ovn-nbctl: fix memory leak

This patch is mostly intended to prevent valgrind from reporting memory leaks
while running unit tests. Otherwise it brings no benefit, since
the application exits immediately after freeing the memory.

Signed-off-by: Damijan Skvarc <damjan.skvarc@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agovswitchd: Always cleanup userspace datapath.
Ilya Maximets [Mon, 24 Jun 2019 14:20:17 +0000 (17:20 +0300)]
vswitchd: Always cleanup userspace datapath.

'netdev' datapath is implemented within ovs-vswitchd process and can
not exist without it, so it should be gracefully terminated with a
full cleanup of resources upon ovs-vswitchd exit.

This change forces dpif cleanup for 'netdev' datapath regardless of
passing '--cleanup' to 'ovs-appctl exit'. This solution removes the need to
pass this additional option every time for userspace datapath
installations and also avoids terminating the system datapath in
setups where both datapaths run at the same time.

The main part is that dpif_port_del() will lead to netdev_close()
and subsequent netdev_class->destroy(dev) which will stop HW NICs
and free their resources. For vhost-user interfaces it will invoke
vhost driver unregistering with a properly closed vhost-user
connection. For the upcoming AF_XDP netdev this will make it possible to gracefully
destroy xdp sockets and unload xdp programs from linux interfaces.
Another important thing is that port deletion will also trigger
flushing of flows offloaded to HW NICs.

An exception is made for 'internal' ports that could have user ip/route
configuration. These ports will not be removed without '--cleanup'.

This change fixes OVS disappearing from the DPDK point of view
(keeping HW NICs improperly configured, sudden closing of vhost-user
connections) and will help with linux devices clearing with upcoming
AF_XDP netdev support.

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Tested-by: William Tu <u9012063@gmail.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Ben Pfaff <blp@ovn.org>
5 years agoNEWS: Update regarding dumping HW offloaded flows.
Ilya Maximets [Mon, 1 Jul 2019 10:20:55 +0000 (13:20 +0300)]
NEWS: Update regarding dumping HW offloaded flows.

NEWS update was missed while updating docs for dynamic Flow API.
Since this is a user visible change, it should be mentioned here.

Fixes: d74ca2269e36 ("dpctl: Update docs about dump-flows and HW offloading.")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Roi Dayan <roid@mellanox.com>
Acked-by: Eli Britstein <elibr@mellanox.com>
5 years agonetdev-offload-tc: Fix requesting match on wildcarded vlan tpid.
Ilya Maximets [Wed, 19 Jun 2019 08:05:38 +0000 (11:05 +0300)]
netdev-offload-tc: Fix requesting match on wildcarded vlan tpid.

'mask' must be checked first before configuring key in flower.

CC: Eli Britstein <elibr@mellanox.com>
Fixes: 0b0a84783cd6 ("netdev-tc-offloads: Support match on priority tags")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Eli Britstein <elibr@mellanox.com>
5 years agoovsdb-idl: memory leak while destroying database
Damjan Skvarc [Mon, 1 Jul 2019 10:24:38 +0000 (12:24 +0200)]
ovsdb-idl: memory leak while destroying database

While checking unit tests with valgrind option (make check-valgrind) I have
noticed several memory leaks of the following format:

.....
==20019== 13,883 (296 direct, 13,587 indirect) bytes in 1 blocks are definitely lost in loss record 346 of 346
==20019==    at 0x4C2FB55: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==20019==    by 0x530F52: xcalloc (util.c:121)
==20019==    by 0x5037A1: ovsdb_idl_row_create__ (ovsdb-idl.c:3120)
==20019==    by 0x5045A3: ovsdb_idl_row_create (ovsdb-idl.c:3133)
==20019==    by 0x507240: ovsdb_idl_process_update2 (ovsdb-idl.c:2478)
==20019==    by 0x507240: ovsdb_idl_db_parse_update__ (ovsdb-idl.c:2328)
==20019==    by 0x507240: ovsdb_idl_db_parse_update (ovsdb-idl.c:2380)
==20019==    by 0x508128: ovsdb_idl_process_response (ovsdb-idl.c:742)
==20019==    by 0x508128: ovsdb_idl_process_msg (ovsdb-idl.c:831)
==20019==    by 0x508128: ovsdb_idl_run (ovsdb-idl.c:915)
==20019==    by 0x4106D9: bridge_run (bridge.c:2977)
==20019==    by 0x40719C: main (ovs-vswitchd.c:127)
==20019==
==20019== LEAK SUMMARY:
==20019==    definitely lost: 296 bytes in 1 blocks
==20019==    indirectly lost: 13,587 bytes in 10 blocks
==20019==      possibly lost: 0 bytes in 0 blocks
==20019==    still reachable: 43,563 bytes in 440 blocks
==20019==         suppressed: 288 bytes in 1 blocks
....

The problem is that the table records maintained by a database that is being
destroyed with ovsdb_idl_db_destroy() are not themselves destroyed.

Signed-off-by: Damijan Skvarc <damjan.skvarc@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agoOVN: add the possibility to specify tunnel dst port
Lorenzo Bianconi [Tue, 25 Jun 2019 10:35:26 +0000 (12:35 +0200)]
OVN: add the possibility to specify tunnel dst port

Introduce dst_port in options column of Encap table in order to add the
capability to configure destination port used for tunnel encapsulation

Acked-by: Numan Siddique <nusiddiq@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agodoc: Add info on vhost tx retries.
Kevin Traynor [Thu, 27 Jun 2019 11:12:30 +0000 (12:12 +0100)]
doc: Add info on vhost tx retries.

Add documentation about vhost tx retries and external
configuration that can help reduce/avoid them.

Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
5 years agostream-ssl: Fix crash on NULL private key and valid certificate.
Ilya Maximets [Tue, 25 Jun 2019 14:28:02 +0000 (17:28 +0300)]
stream-ssl: Fix crash on NULL private key and valid certificate.

Running ovsdb-server with an empty private-key and a non-empty certificate
(or vice versa) causes a crash:

 # ovsdb-tool create ./etc/openvswitch/conf.db ./vswitch.ovsschema
 # ovsdb-server --remote=punix:./db.sock \
                --remote=db:Open_vSwitch,Open_vSwitch,manager_options \
                --private-key=db:Open_vSwitch,SSL,private_key \
                --certificate=db:Open_vSwitch,SSL,certificate \
                --bootstrap-ca-cert=db:Open_vSwitch,SSL,ca_cert

 # ovs-vsctl --no-wait init
 # ovs-vsctl --no-wait set-ssl pkey.key cert.cert ca.cert
 # ovs-vsctl --no-wait set SSL . private_key='""'
 # ovs-vsctl --no-wait set SSL . certificate='cert.new'

 ==25513==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000
 ==25513==The signal is caused by a READ memory access.
 ==25513==Hint: address points to the zero page.
    #0 0x7ff7582aa0a9 in __GI___strlen_sse2
    #1 0x7ff759bdde81  (/lib64/libasan.so.5+0xace81)
    #2 0x7ff759479932  (/lib64/libcrypto.so.1.1+0xb3932)
    #3 0x7ff759473c5a in BIO_ctrl (/lib64/libcrypto.so.1.1+0xadc5a)
    #4 0x7ff7598decc1 in SSL_CTX_use_certificate_file (/lib64/libssl.so.1.1+0x40cc1)
    #5 0x4dbaa7 in stream_ssl_set_certificate_file__ lib/stream-ssl.c:1170
    #6 0x4dca2e in stream_ssl_set_key_and_cert lib/stream-ssl.c:1216
    #7 0x4146b2 in reconfigure_ssl ovsdb/ovsdb-server.c:1254
    #8 0x409c83 in main ovsdb/ovsdb-server.c:368
    #9 0x7ff758233812 in __libc_start_main
    #10 0x40f6bd in _start (ovsdb-server+0x40f6bd)

 AddressSanitizer can not provide additional info.
 SUMMARY: AddressSanitizer: SEGV (/lib64/libc.so.6+0x9a0a9) in __GI___strlen_sse2
 ==25513==ABORTING

Another way to reproduce is to use non-initialized DB entry for
private-key and a file for certificate in ovsdb-server cmdline.

The root cause is that stream_ssl_set_key_and_cert() triggers
configuration for both key and cert if any of them is valid, keeping
it possible for one of them to be NULL.
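
The difference between the faulty condition and one possible guard can be sketched as follows. Both function names are hypothetical, `None` stands in for a NULL file name, and the guard shown is an assumption about the shape of a fix rather than a copy of the actual change to stream-ssl.c.

```python
def should_configure_any_valid(private_key_file, certificate_file):
    # Condition described above: configuration is triggered as soon as
    # either value is set, so the other one may still be None (NULL).
    return private_key_file is not None or certificate_file is not None


def should_configure_both_valid(private_key_file, certificate_file):
    # Assumed guard: only configure when both values are present, so a
    # NULL file name never reaches the SSL library.
    return private_key_file is not None and certificate_file is not None
```

With the reproduction above (no private key, certificate 'cert.new'), the first condition is true and the second is false.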

Fixes: 6f1e91b1d7c0 ("stream-ssl: Make changing keys and certificate at runtime reliable.")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Ben Pfaff <blp@ovn.org>
5 years agonetdev-dpdk: Fix additional vhost tx retry.
Kevin Traynor [Thu, 27 Jun 2019 11:12:29 +0000 (12:12 +0100)]
netdev-dpdk: Fix additional vhost tx retry.

Fix minor issue of one possible additional retry.

Fixes: c6ec9d176dbf ("netdev-dpdk: Fix vHost stats.")
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
5 years agonetdev-dpdk: Reset queue number for vhost devices on vm shutdown.
David Marchand [Thu, 27 Jun 2019 09:43:36 +0000 (11:43 +0200)]
netdev-dpdk: Reset queue number for vhost devices on vm shutdown.

Rather than poll all disabled queues and waste some memory for vms that
have been shut down, we can reconfigure when receiving a destroy
connection notification from the vhost library.

$ while true; do
  ovs-appctl dpif-netdev/pmd-rxq-show |awk '
  /port: / {
    tot++;
    if ($5 == "(enabled)") {
      en++;
    }
  }
  END {
    print "total: " tot ", enabled: " en
  }'
  sleep 1
done

total: 66, enabled: 66
total: 6, enabled: 2

This change requires a fix in the DPDK vhost library, so bump the minimal
required version to 18.11.2.

Co-authored-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
5 years agodpdk: Use DPDK 18.11.2 release.
Ian Stokes [Wed, 26 Jun 2019 21:06:05 +0000 (22:06 +0100)]
dpdk: Use DPDK 18.11.2 release.

Modify travis linux build script to use the latest DPDK stable release
18.11.2. Update docs for latest DPDK stable releases.

Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
5 years agovswitchd: Separate disable system and route.
William Tu [Tue, 25 Jun 2019 21:52:38 +0000 (14:52 -0700)]
vswitchd: Separate disable system and route.

Previously, '--disable-system' disables both system dp and the system
routing table.  The patch makes '--disable-system' only disable system
dp and adds '--disable-system-route' for disabling the route table.
This fixes failures when 'make check-system-userspace' for tunnel cases.

As a consequence, we hit errors because OVS userspace parses the IGMP packet
but its datapaths do not, so odp_flow_key_to_flow() returns ODP_FIT_TOO_LITTLE;
see commit c645550bb249 ("odp-util: Always report ODP_FIT_TOO_LITTLE for IGMP.").
Fix it by filtering out the IGMP-related error message.

Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Co-authored-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agoovs-atomic-c++.h: Fix for 64 bit atomics.
Gurucharan Shetty [Wed, 12 Jun 2019 10:57:29 +0000 (03:57 -0700)]
ovs-atomic-c++.h: Fix for 64 bit atomics.

Commit e981a45a6cae4 (ovs-atomic: Add 64 bit apis.)
added a few 64 bit apis (e.g: atomic_count_inc64).  For C++,
this invokes std::atomic_fetch_*_explicit() functions in
lib/ovs-atomic-c++.h.

The function overloading for the 64 bit functions fails without
specifying something like: std::atomic_fetch_*_explicit<std::uint64_t>().
But it looks tricky to do this with macros.

This patch tries to fix the compilation failures by calling atomic
functions on the variables themselves.

Signed-off-by: Gurucharan Shetty <guru@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
5 years agonetdev-dpdk: Avoid reconfiguration on VIRTIO_NET_F_MQ changes.
David Marchand [Thu, 25 Apr 2019 15:22:09 +0000 (17:22 +0200)]
netdev-dpdk: Avoid reconfiguration on VIRTIO_NET_F_MQ changes.

At the moment, a malicious guest might negotiate VIRTIO_NET_F_MQ and
!VIRTIO_NET_F_MQ in a loop which would be seen as qp_num going from 1 to
n and n to 1 continuously, triggering datapath reconfigurations at each
transition.

Limit this by only reconfiguring on increased qp_num.
The previous patch reduced the observed cost of polling disabled queues,
so the only cost is memory.
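
A small simulation of the "only reconfigure on increase" policy, under the assumption that each qp_num change from the guest arrives as one update. The function name and the initial value of 1 are illustrative.

```python
def count_reconfigurations(qp_num_updates, initial_qp_num=1):
    """Count datapath reconfigurations when only increases trigger one."""
    current = initial_qp_num
    reconfigurations = 0
    for requested in qp_num_updates:
        if requested > current:
            # Growing the number of queue pairs needs a reconfiguration.
            current = requested
            reconfigurations += 1
        # A decrease is ignored: the extra queues stay allocated but
        # disabled, which costs memory only.
    return reconfigurations
```

A guest that flips between 8 and 1 queue pairs five times triggers a single reconfiguration under this policy, instead of one per transition.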

Co-authored-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
5 years agodpif-netdev: Only poll enabled vhost queues.
David Marchand [Thu, 25 Apr 2019 15:22:08 +0000 (17:22 +0200)]
dpif-netdev: Only poll enabled vhost queues.

We currently poll all available queues based on the max queue count
exchanged with the vhost peer and rely on the vhost library in DPDK to
check the vring status beneath.
This can lead to some overhead when we have a lot of unused queues.

To improve the situation, we can skip the disabled queues.
On rxq notifications, we make use of the netdev's change_seq number so
that the pmd thread main loop can cache the queue state periodically.
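
The caching scheme can be sketched with two small classes. `FakeNetdev` and `FakePmd` are hypothetical models, not dpif-netdev structures; the sketch only shows how a sequence number lets the polling loop refresh its view cheaply.

```python
class FakeNetdev:
    """Hypothetical netdev with per-rxq enabled flags and a change counter."""

    def __init__(self, n_rxq):
        self.enabled = [True] * n_rxq
        self.change_seq = 0

    def set_rxq_enabled(self, qid, enabled):
        # Models an rxq state notification: bump the sequence number only
        # when the state really changes.
        if self.enabled[qid] != enabled:
            self.enabled[qid] = enabled
            self.change_seq += 1


class FakePmd:
    """Hypothetical pmd thread that caches the queue state."""

    def __init__(self, netdev):
        self.netdev = netdev
        self.seen_seq = None
        self.cached_enabled = []

    def refresh_cache(self):
        # Called periodically from the main loop; a single comparison
        # when nothing has changed since the last refresh.
        if self.seen_seq != self.netdev.change_seq:
            self.cached_enabled = list(self.netdev.enabled)
            self.seen_seq = self.netdev.change_seq

    def queues_to_poll(self):
        # Only queues that were enabled at the last refresh are polled.
        return [q for q, on in enumerate(self.cached_enabled) if on]
```

Between refreshes the pmd works from a possibly stale cache, which is the trade-off implied by caching the state "periodically".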

$ ovs-appctl dpif-netdev/pmd-rxq-show
pmd thread numa_id 0 core_id 1:
  isolated : true
  port: dpdk0             queue-id:  0 (enabled)   pmd usage:  0 %
pmd thread numa_id 0 core_id 2:
  isolated : true
  port: vhost1            queue-id:  0 (enabled)   pmd usage:  0 %
  port: vhost3            queue-id:  0 (enabled)   pmd usage:  0 %
pmd thread numa_id 0 core_id 15:
  isolated : true
  port: dpdk1             queue-id:  0 (enabled)   pmd usage:  0 %
pmd thread numa_id 0 core_id 16:
  isolated : true
  port: vhost0            queue-id:  0 (enabled)   pmd usage:  0 %
  port: vhost2            queue-id:  0 (enabled)   pmd usage:  0 %

$ while true; do
  ovs-appctl dpif-netdev/pmd-rxq-show |awk '
  /port: / {
    tot++;
    if ($5 == "(enabled)") {
      en++;
    }
  }
  END {
    print "total: " tot ", enabled: " en
  }'
  sleep 1
done

total: 6, enabled: 2
total: 6, enabled: 2
...

 # Started vm, virtio devices are bound to kernel driver which enables
 # F_MQ + all queue pairs
total: 6, enabled: 2
total: 66, enabled: 66
...

 # Unbound vhost0 and vhost1 from the kernel driver
total: 66, enabled: 66
total: 66, enabled: 34
...

 # Configured kernel bound devices to use only 1 queue pair
total: 66, enabled: 34
total: 66, enabled: 19
total: 66, enabled: 4
...

 # While rebooting the vm
total: 66, enabled: 4
total: 66, enabled: 2
...
total: 66, enabled: 66
...

 # After shutting down the vm
total: 66, enabled: 66
total: 66, enabled: 2

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
5 years agocompat: Fix compilation error on CentOS 7.6
Yi-Hung Wei [Tue, 25 Jun 2019 18:09:07 +0000 (11:09 -0700)]
compat: Fix compilation error on CentOS 7.6

This fixes the compilation issue on CentOS 7.6 kernel
(3.10.0-957.21.3.el7.x86_64).

Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2019-June/360013.html
Reported-by: Fred Neubauer <fred.neubauer@gmail.com>
Fixes: 6660a9597a49 ("datapath: compat: Introduce static key support")
Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agorhel: Fix upgrade path
Greg Rose [Tue, 25 Jun 2019 18:45:52 +0000 (11:45 -0700)]
rhel: Fix upgrade path

There is a bug in the upgrade path from the old kmod-openvswitch SysV
based RPM to the new openvswitch-kmod systemd based RPM. Since the
name of the package is changed it is not possible to use the yum
or rpm upgrade options.  This prevents passing in a 1 or 2 to the
%postun scriptlet section of the older RPM and that causes the section
to be treated as an 'erase'.  The old kmod-openvswitch %postun section
proceeds to erase the symlinks in ../weak-updates/openvwswitch that
the installation of the new package had just created.

Fix this by adding a %posttrans tag to the systemd spec file.  This
scriptlet is called after the symlinks have just been erased and
it calls the ovs-kmod-manage.sh script to recreate the symlinks and
run depmod -a again so that the correct kernel modules will be
found and loaded.

VMware-BZ: #236987

Cc: Aaron Conole <aconole@redhat.com>
Cc: Flavio Leitner <fbl@redhat.com>
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Acked-by: Aaron Conole <aconole@redhat.com>
5 years agoofproto-dpif: Fix continuation with patch port
Yi-Hung Wei [Fri, 21 Jun 2019 17:51:23 +0000 (10:51 -0700)]
ofproto-dpif: Fix continuation with patch port

This patch fixes the ofp_port to odp_port translation issue on patch
port with nxt_resume.  When OVS resumes processing a packet from
nxt_resume, OVS does not translate the ofp in_port to odp in_port
correctly if the packet is originally received from a patch port.
Currently, OVS sets the odp in_port for this resumed packet to ODPP_NONE
and pushes the resumed packet back to the datapath. Later on, if the packet
goes through a recirc, OVS will generate the following message since it
cannot translate odp in_port (ODPP_NONE) back to ofp in_port during upcall,
and push down a datapath rule to drop the packet.

    ofproto_dpif_upcall(handler16)|INFO|received packet on unassociated
        datapath port 4294967295

When OVS revalidates the drop datapath flow with ODPP_NONE in_port, we
will see the following warning.
    ofproto_dpif_upcall(revalidator18)|WARN|Failed to acquire udpif_key
        corresponding to unexpected flow (Invalid argument): ufid:....

This patch resolves this issue by storing the odp in_port in the
continuation messages, and restoring the odp in_port before pushing the
packet back to the datapath.

VMWare-BZ: 2364696
Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agoOpenFlow: Enable OpenFlow 1.5 by default.
Ben Pfaff [Mon, 24 Apr 2017 18:49:59 +0000 (11:49 -0700)]
OpenFlow: Enable OpenFlow 1.5 by default.

Open vSwitch now supports all OpenFlow 1.5 required features, so enable
it by default.

Acked-by: Numan Siddique <nusiddiq@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agoofp-actions: Support OF1.5 meter action.
Ben Pfaff [Tue, 30 Apr 2019 16:19:27 +0000 (09:19 -0700)]
ofp-actions: Support OF1.5 meter action.

OpenFlow 1.5 changed "meter" from an instruction to an action.  This commit
supports it properly.

Acked-by: Numan Siddique <nusiddiq@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agotravis: Make it possible to build against a dpdk branch.
David Marchand [Wed, 19 Jun 2019 07:26:29 +0000 (09:26 +0200)]
travis: Make it possible to build against a dpdk branch.

Rework the build script so that we can pass branches and tags.

With this, DPDK_VER can be passed as:
- a string starting with refs/ which is understood as a git reference.
  This triggers a git clone on DPDK_GIT (default value points to
  https://dpdk.org/git/dpdk) for a single branch pointing to this
  reference (to save some disk),
- else, any other string which is understood as an official release.
  This triggers a tarball download on dpdk.org.
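
The DPDK_VER dispatch described above could be modelled roughly as follows.
This is an illustrative Python sketch, not the Travis build script itself;
the function name classify_dpdk_ver and the returned tuples are hypothetical.

```python
def classify_dpdk_ver(dpdk_ver, dpdk_git="https://dpdk.org/git/dpdk"):
    """Decide how DPDK sources would be fetched (hypothetical helper)."""
    if dpdk_ver.startswith("refs/"):
        # A git reference: clone only the single branch that points to it,
        # which saves some disk space.
        return ("git-clone", dpdk_git, dpdk_ver)
    # Any other string is understood as an official release, which is
    # downloaded as a tarball from dpdk.org.
    return ("tarball", dpdk_ver)
```

For example, "refs/heads/master" would select the git path, while a plain
release string such as "18.11.2" would select the tarball path.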

Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
5 years agotravis: Do not patch dpdk sources.
David Marchand [Wed, 19 Jun 2019 07:26:28 +0000 (09:26 +0200)]
travis: Do not patch dpdk sources.

Rather than patch the dpdk makefile and a template config file, we can
pass the -fPIC flag via EXTRA_CFLAGS.
This is more reliable than expecting the dpdk file names to be kept
unchanged.

Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
5 years agoAUTHORS: Add Yanqin Wei and Malvika Gupta.
Ben Pfaff [Thu, 13 Jun 2019 17:52:51 +0000 (10:52 -0700)]
AUTHORS: Add Yanqin Wei and Malvika Gupta.

Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agoutil: implement count_1bits with Neon intrinsics or gcc built-in for aarch64.
Yanqin Wei [Thu, 13 Jun 2019 10:38:07 +0000 (18:38 +0800)]
util: implement count_1bits with Neon intrinsics or gcc built-in for aarch64.

The userspace datapath needs to traverse miniflow values many times. In
this process, the 'count_1bits' operation for 'Flowmap' significantly impacts
performance. On arm, this function was defined by the portable implementation
because gcc for arm does not support the popcnt feature.
On aarch64, however, the VCNT neon instruction can accelerate "count_1bits".
Starting with gcc-7, the built-in function is implemented with the neon
instruction. In this patch, count_1bits is implemented with the gcc built-in
from gcc-7 on, and with neon intrinsics in gcc-6.
A performance test was run on two aarch64 machines. In the NIC2NIC test, the
one-tuple dpcls lookup case achieves around 4% throughput improvement and
the 10-tuple (average) case achieves around 5% improvement.
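
For readers unfamiliar with the operation, the following Python sketch shows
what a 64-bit population count computes, using a generic bit-twiddling
("SWAR") approach. It is only an illustration of the kind of portable
fallback that hardware popcount instructions replace; it is not the OVS
implementation, and the function name is hypothetical.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def count_1bits_portable(x):
    """Count set bits in a 64-bit value without a popcount instruction."""
    x &= MASK64
    # Sum adjacent bits into 2-bit fields.
    x = x - ((x >> 1) & 0x5555555555555555)
    # Sum adjacent 2-bit fields into 4-bit fields.
    x = (x & 0x3333333333333333) + ((x >> 2) & 0x3333333333333333)
    # Sum adjacent 4-bit fields into bytes.
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0F
    # Add all bytes together; the total ends up in the top byte.
    return ((x * 0x0101010101010101) & MASK64) >> 56
```

A hardware instruction or compiler built-in produces the same result in far
fewer operations, which is where the reported speedup would come from.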

Tested-by: Malvika Gupta <malvika.gupta@arm.com>
Signed-off-by: Yanqin Wei <Yanqin.Wei@arm.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agodatapath: Support kernel version 5.0.x
Yifeng Sun [Wed, 12 Jun 2019 22:35:29 +0000 (15:35 -0700)]
datapath: Support kernel version 5.0.x

This patch updated acinclude.m4 so that OVS can be compiled on
5.0.x kernels.
This patch also updated travis files so that 5.0.x kernel versions
are used during travis test builds.
Besides, NEWS and releases.rst are also updated to reflect this
new support.

Acked-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agonet: core: dev: Add extack argument to dev_change_flags()
Petr Machata [Wed, 12 Jun 2019 22:35:28 +0000 (15:35 -0700)]
net: core: dev: Add extack argument to dev_change_flags()

Upstream commit:
    commit 567c5e13be5cc74d24f5eb54cf353c2e2277189b
    Author: Petr Machata <petrm@mellanox.com>
    Date:   Thu Dec 6 17:05:42 2018 +0000

    net: core: dev: Add extack argument to dev_change_flags()

    In order to pass extack together with NETDEV_PRE_UP notifications, it's
    necessary to route the extack to __dev_open() from diverse (possibly
    indirect) callers. One prominent API through which the notification is
    invoked is dev_change_flags().

    Therefore extend dev_change_flags() with an extra extack argument and
    update all users. Most of the calls end up just encoding NULL, but
    several sites (VLAN, ipvlan, VRF, rtnetlink) do have extack available.

    Since the function declaration line is changed anyway, name the other
    function arguments to placate checkpatch.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch backports the above upstream patch and also adds fixes
in compat code.

Cc: Petr Machata <petrm@mellanox.com>
Acked-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agodatapath: Backport the removal of __tcp_checksum_complete()
Yifeng Sun [Wed, 12 Jun 2019 22:35:27 +0000 (15:35 -0700)]
datapath: Backport the removal of __tcp_checksum_complete()

Upstream commit 6ab6dfa6bb500f5cbb9b7a0f23a1613417ca2d12 ("net: get
rid of __tcp_checksum_complete()") deleted __tcp_checksum_complete()
and caused a compilation failure for OVS on newer kernels.

This patch fixes it by using __skb_checksum_complete(), which is
identical to __tcp_checksum_complete().

Acked-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agoOVS: remove use of VLAN_TAG_PRESENT
Michał Mirosław [Wed, 12 Jun 2019 22:35:26 +0000 (15:35 -0700)]
OVS: remove use of VLAN_TAG_PRESENT

Upstream commits:
    (1) commit 9df46aefafa6dee81a27c2a9d8ba360abd8c5fe3
    Author: Michał Mirosław <mirq-linux@rere.qmqm.pl>
    Date:   Thu Nov 8 18:44:50 2018 +0100

    OVS: remove use of VLAN_TAG_PRESENT

    This is a minimal change to allow removing of VLAN_TAG_PRESENT.
    It leaves OVS unable to use CFI bit, as fixing this would need
    a deeper surgery involving userspace interface.

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
    (2) commit 6083e28aa02d7c9e6b87f8b944e92793094ae047
    Author: Michał Mirosław <mirq-linux@rere.qmqm.pl>
    Date:   Sat Nov 10 19:55:34 2018 +0100

    OVS: remove VLAN_TAG_PRESENT - fixup

    It turns out I missed one VLAN_TAG_PRESENT in OVS code while rebasing.
    This fixes it.

Fixes: 9df46aefafa6 ("OVS: remove use of VLAN_TAG_PRESENT")
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch backports the above upstream patch to OVS and adds
extra checking in kernel module's compat code.

Cc: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Acked-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agodatapath: Check extack argument of rtnl_create_link()
Yifeng Sun [Wed, 12 Jun 2019 22:35:25 +0000 (15:35 -0700)]
datapath: Check extack argument of rtnl_create_link()

Upstream commit d0522f1cd25edb796548f91e04766fa3cbc3b6df ("net:
Add extack argument to rtnl_create_link") added a new argument
to rtnl_create_link(). This introduced compilation errors in
the kernel datapath code.

This patch fixes this issue.

Acked-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agonetdev-tc-offloads: Use correct hook qdisc at init tc flow
Raed Salem [Mon, 10 Jun 2019 11:58:40 +0000 (14:58 +0300)]
netdev-tc-offloads: Use correct hook qdisc at init tc flow

A preliminary netdev qdisc cleanup is done during init tc flow.
The cited commit allows creating egress hook qdiscs on internal
ports. This breaks the netdev qdisc cleanup, as currently only the ingress
hook qdisc type is deleted. As a consequence, the check for tc ingress
shared block support fails when the check is done on an internal port.

Issue can be reproduced by the following steps:
- start openvswitch service
- create ovs bridge
- restart openvswitch service

Fix by using the correct hook qdisc type at netdev hook qdisc cleanup.

Fixes: 608ff46aaf0d ("ovs-tc: offload datapath rules matching on internal ports")
Signed-off-by: Raed Salem <raeds@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Acked-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
5 years agoovn-controller: Fix parsing of OVN tunnel IDs
Dumitru Ceara [Wed, 12 Jun 2019 15:59:02 +0000 (17:59 +0200)]
ovn-controller: Fix parsing of OVN tunnel IDs

Encap tunnel-ids are of the form:
<chassis-id><OVN_MVTEP_CHASSISID_DELIM><encap-ip>.
In physical_run we were checking if a tunnel-id corresponds
to the local chassis-id by searching if the chassis-id string
is included in the tunnel-id (strstr). This can break quite
easily, for example, if the local chassis-id is a substring
of a remote chassis-id. In that case we were wrongfully
skipping the tunnel creation.

To fix that, new tunnel-id creation and parsing functions are added in
encaps.[ch]. These functions are now used wherever applicable.
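
The difference between the substring check and proper parsing can be
illustrated with a small Python sketch. It is a model of the idea only, not
the code in encaps.[ch]: the function names are hypothetical, and the
delimiter value "@" is an assumption standing in for
OVN_MVTEP_CHASSISID_DELIM.

```python
DELIM = "@"  # assumed stand-in for OVN_MVTEP_CHASSISID_DELIM


def make_tunnel_id(chassis_id, encap_ip):
    """Build a tunnel-id of the form <chassis-id><delim><encap-ip>."""
    return f"{chassis_id}{DELIM}{encap_ip}"


def parse_tunnel_id(tunnel_id):
    """Split a tunnel-id at the first delimiter; None if malformed."""
    chassis_id, sep, encap_ip = tunnel_id.partition(DELIM)
    if not sep:
        return None
    return chassis_id, encap_ip


def is_local_by_substring(tunnel_id, local_chassis_id):
    """Old behaviour: strstr-like test, wrong for prefix-like chassis ids."""
    return local_chassis_id in tunnel_id


def is_local_by_parsing(tunnel_id, local_chassis_id):
    """Fixed behaviour: compare the parsed chassis-id exactly."""
    parsed = parse_tunnel_id(tunnel_id)
    return parsed is not None and parsed[0] == local_chassis_id
```

With a local chassis-id "hv1" and a remote tunnel-id built from "hv10", the
substring test wrongly reports a match, while the parsed comparison does not.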

Acked-by: Venu Iyer <iyervl@ymail.com>
Reported-at: https://bugzilla.redhat.com/1708131
Reported-by: Haidong Li <haili@redhat.com>
Fixes: b520ca7 ("Support for multiple VTEP in OVN")
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agotravis: Don't install kernel for DPDK checks.
Ilya Maximets [Tue, 11 Jun 2019 15:31:21 +0000 (18:31 +0300)]
travis: Don't install kernel for DPDK checks.

We don't need to build DPDK kernel modules to test build with OVS.
And we don't need to build OVS datapath modules for checking
userspace with DPDK.

Removed 'max-inline-insns-single' changes that were only needed for
DPDK kernel modules. Config modifications changed to update the
generated build/.config instead of changing sources.

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Tested-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
5 years agoovn-controller: Cleanup memory in binding_evaluate_port_binding_changes
Dumitru Ceara [Tue, 11 Jun 2019 14:55:34 +0000 (16:55 +0200)]
ovn-controller: Cleanup memory in binding_evaluate_port_binding_changes

The 'lport_to_iface' and 'egress_ifaces' hashtables were not cleaned up
when checking if port bindings require a recompute.

Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2019-June/048822.html
Reported-by: Daniel Alvarez Sanchez <dalvarez@redhat.com>
Fixes: 9d0b504abdee ("ovn-controller: runtime_data change handler for SB port-binding")
Acked-by: Han Zhou <hzhou8@ebay.com>
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agonetdev-offload: Rename offload providers.
Ilya Maximets [Tue, 7 May 2019 09:24:09 +0000 (12:24 +0300)]
netdev-offload: Rename offload providers.

Flow API providers renamed to be consistent with parent module
'netdev-offload' and look more like each other.

'_rte_' replaced with more convenient '_dpdk_'.

We'll have the following structure:

  Common code:
    lib/netdev-offload-provider.h
    lib/netdev-offload.c
    lib/netdev-offload.h

  Providers:
    lib/netdev-offload-tc.c
    lib/netdev-offload-dpdk.c

'netdev-offload-dummy' still resides inside netdev-dummy, but it
makes little sense to move it out of there.

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Ben Pfaff <blp@ovn.org>
Acked-by: Roi Dayan <roid@mellanox.com>
5 years agonetdev: Split up netdev offloading to separate module.
Ilya Maximets [Tue, 7 May 2019 09:24:08 +0000 (12:24 +0300)]
netdev: Split up netdev offloading to separate module.

New module 'netdev-offload' created to manage different flow API
implementations. All the generic and provider independent code moved
there from the 'netdev' module.

Flow API providers further encapsulated.

The only function that was changed is 'netdev_any_oor'.
Now it uses offloading related hmap instead of common 'netdev_shash'.

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Ben Pfaff <blp@ovn.org>
Acked-by: Roi Dayan <roid@mellanox.com>
5 years agodpctl: Update docs about dump-flows and HW offloading.
Ilya Maximets [Wed, 15 May 2019 14:32:32 +0000 (17:32 +0300)]
dpctl: Update docs about dump-flows and HW offloading.

Since introduction of dynamic flow API for netdevs, tricky
accesses to uninitialized flow API are no longer possible.
So, ovs-dpctl doesn't support dumping HW offloaded flows now.
State this in docs and man pages. Additionally, forbid the
'type' argument for 'ovs-dpctl dump-flows'.

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Roi Dayan <roid@mellanox.com>
5 years agonetdev: Dynamic per-port Flow API.
Ilya Maximets [Tue, 7 May 2019 09:24:07 +0000 (12:24 +0300)]
netdev: Dynamic per-port Flow API.

Current issues with Flow API:

* OVS calls offloading functions regardless of successful
  flow API initialization. (ex. on init_flow_api failure)
* Static initialization of Flow API for a netdev_class forbids
  having different offloading types for different instances
  of netdev with the same netdev_class. (ex. different vports in
  'system' and 'netdev' datapaths at the same time)

Solution:

* Move Flow API from the netdev_class to netdev instance.
* Make Flow API dynamic, i.e. probe the APIs and choose the
  suitable one.

Side effects:

* Flow API providers are localized as much as possible in their modules.
* Now we have an ability to make runtime checks. For example,
  we could check if particular device supports features we
  need, like if dpdk device supports RSS+MARK action.
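
The per-instance probing idea can be sketched in Python as follows. This is
a simplified model under stated assumptions, not the OVS C code: the class
names, the 'kind' attribute and the helper functions are all hypothetical,
and each provider's init_flow_api() is assumed to return 0 on success or an
errno value on failure.

```python
import errno


class FlowApiProvider:
    """Hypothetical provider that supports only some kinds of netdev."""

    def __init__(self, type_name, supported_kinds):
        self.type_name = type_name
        self.supported_kinds = set(supported_kinds)

    def init_flow_api(self, netdev):
        # Runtime check: succeed only if this device is usable.
        return 0 if netdev.kind in self.supported_kinds else errno.EOPNOTSUPP


class Netdev:
    """Hypothetical netdev instance that carries its own Flow API."""

    def __init__(self, name, kind):
        self.name = name
        self.kind = kind
        self.flow_api = None  # chosen per instance, not per class


def assign_flow_api(netdev, providers):
    """Probe the providers in order and keep the first suitable one."""
    for provider in providers:
        if provider.init_flow_api(netdev) == 0:
            netdev.flow_api = provider
            return True
    netdev.flow_api = None
    return False


def flow_put(netdev):
    """Offload calls are refused unless initialization succeeded."""
    if netdev.flow_api is None:
        return errno.EOPNOTSUPP
    return 0
```

In this model, two netdevs of the same class can end up with different
providers, and a netdev whose probing failed never reaches an offloading
function.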

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Roi Dayan <roid@mellanox.com>
5 years agorhel: let *-ctl handle runtime directory
Jaime Caamaño Ruiz [Mon, 10 Jun 2019 13:55:31 +0000 (15:55 +0200)]
rhel: let *-ctl handle runtime directory

Recent versions of systemd restore RuntimeDirectory ownership to the
unit's User in between execution of *Exec directives (see [1]). Using
ExecStartPre to reset RuntimeDirectory ownership to OVS_USER no longer
works as expected.

The ctl scripts already handle creation of the runtime directory with
correct ownership and permissions, so we can simply remove
RuntimeDirectory from the systemd unit file. There is still a need to handle
ownership to cover some upgrade scenarios, but success of that will be
optional as the directory itself won't exist on the first run.

[1] https://github.com/systemd/systemd/issues/12713

Signed-off-by: Jaime Caamaño Ruiz <jcaamano@suse.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agorhel: Fix ovn database dir optional on first run
Jaime Caamaño Ruiz [Mon, 10 Jun 2019 16:58:12 +0000 (18:58 +0200)]
rhel: Fix ovn database dir optional on first run

The OVN database directory is created on first run, so make ownership
handling optional.

Fixes: 94e1e8be3187 ("rhel: run ovn with the same user as ovs")
Signed-off-by: Jaime Caamaño Ruiz <jcaamano@suse.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agorhel: set useropts optional for ovsdb-server
Jaime Caamaño Ruiz [Mon, 10 Jun 2019 16:58:11 +0000 (18:58 +0200)]
rhel: set useropts optional for ovsdb-server

systemd assesses the presence of all EnvironmentFile entries before execution
of Exec* directives, thus useropts needs to be optional even though it
will always be created at ExecStartPre.

Fixes: 94e1e8be3187 ("rhel: run ovn with the same user as ovs")
Signed-off-by: Jaime Caamaño Ruiz <jcaamano@suse.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agorhel: useropts should be owned by package
Jaime Caamaño Ruiz [Mon, 10 Jun 2019 13:55:45 +0000 (15:55 +0200)]
rhel: useropts should be owned by package

So that it is properly cleaned up after the package is uninstalled.

Signed-off-by: Jaime Caamaño Ruiz <jcaamano@suse.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agolacp: Don't send or receive PDUs when carrier state of slave is down
Nitin Katiyar [Sun, 9 Jun 2019 14:18:10 +0000 (14:18 +0000)]
lacp: Don't send or receive PDUs when carrier state of slave is down

Fortville NICs (or their drivers) can get into an inconsistent state,
in which the NIC can actually transmit and receive packets even
though they report "PHY down". In such a state, OVS can exchange and
process LACP messages and enable a LACP slave. However, further packet
exchange over the slave fails because OVS sees that the PHY is down.

This commit fixes the problem by making OVS ignore received LACP PDUs
and suppress transmitting LACP PDUs when carrier is down. In addition,
when a LACP PDU is received with carrier down, this commit triggers
rechecking the carrier status (by incrementing the connectivity sequence
number) to ensure that it is updated as quickly as possible.
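
The behaviour described above can be modelled with a short Python sketch.
It is an illustration only, with hypothetical class and function names, and
a plain counter stands in for the connectivity sequence number.

```python
class ConnectivitySeq:
    """Hypothetical stand-in for the connectivity sequence number."""

    def __init__(self):
        self.value = 0

    def change(self):
        # Bumping the sequence asks for the carrier status to be rechecked.
        self.value += 1


class LacpSlave:
    """Hypothetical LACP slave with a carrier flag."""

    def __init__(self, carrier_up):
        self.carrier_up = carrier_up
        self.last_pdu = None


def process_received_pdu(slave, pdu, seq):
    """Return True if the PDU was processed, False if it was ignored."""
    if not slave.carrier_up:
        # Carrier is reported down: ignore the PDU, but trigger a recheck
        # so that a stale carrier state is refreshed quickly.
        seq.change()
        return False
    slave.last_pdu = pdu
    return True


def may_send_pdu(slave):
    """PDU transmission is suppressed while the carrier is down."""
    return slave.carrier_up
```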

Signed-off-by: Manohar Krishnappa Chidambaraswamy <manukc@gmail.com>
Co-authored-by: Manohar Krishnappa Chidambaraswamy <manukc@gmail.com>
Signed-off-by: Nitin Katiyar <nitin.katiyar@ericsson.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agolacp: Avoid packet drop on LACP bond after link up
Nitin Katiyar [Sun, 9 Jun 2019 14:17:45 +0000 (14:17 +0000)]
lacp: Avoid packet drop on LACP bond after link up

Problem:
========
The OVS state machine that enables and disables bond slaves runs in
the OVS main thread. The OVS code that processes received LACP packets
runs in a different thread. Until now, when the latter processes a LACP
PDU that should enable a slave, the slave was only enabled when the
main thread was able to run the state machine. In some cases this led
to delays of up to 350ms when the main thread was busy or not scheduled,
which led to corresponding delays in which packets were dropped due to
the bond-admissibility check.

Fix:
====
When a LACP PDU is received, evaluate whether LACP slave can be enabled
(slave_may_enable()) and set LACP slave's may_enable from the datapath
thread itself. When may_enable = TRUE, it means L1 state is UP and
LACP-SYNC is done and it is waiting for the main thread to enable the
slave. Relax the check in bond_check_admissibility() to check for both
"enable" and "may_enable" of the LACP slave. This would avoid dropping
of packets until the main thread enables the slave from bundle_run().
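
A minimal Python sketch of the relaxed check follows. It models the logic
described in the fix and is not the OVS code; the class, its fields and the
function names are hypothetical simplifications.

```python
class BondSlave:
    """Hypothetical bond slave state."""

    def __init__(self):
        self.enabled = False      # set by the main thread's state machine
        self.may_enable = False   # set as soon as a LACP PDU allows it


def on_lacp_pdu(slave, carrier_up, lacp_in_sync):
    """Runs in the thread that receives the PDU (the datapath thread)."""
    # may_enable means L1 is up and LACP sync is done; the slave is only
    # waiting for the main thread to enable it.
    slave.may_enable = carrier_up and lacp_in_sync


def main_thread_run(slave):
    """Runs later, whenever the main thread gets scheduled."""
    slave.enabled = slave.may_enable


def check_admissibility(slave):
    """Relaxed check: accept packets if enabled OR merely may_enable."""
    return slave.enabled or slave.may_enable
```

In this model, packets arriving between on_lacp_pdu() and main_thread_run()
are admitted instead of being dropped.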

Signed-off-by: Manohar Krishnappa Chidambaraswamy <manukc@gmail.com>
Co-authored-by: Manohar Krishnappa Chidambaraswamy <manukc@gmail.com>
Signed-off-by: Nitin Katiyar <nitin.katiyar@ericsson.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
5 years agotravis: Test with latest stable kernel releases.
Ilya Maximets [Thu, 16 May 2019 16:39:21 +0000 (19:39 +0300)]
travis: Test with latest stable kernel releases.

Instead of managing kernel minor versions manually we could always test
with the most recent stable release of the desired branch.

With this patch applied, Travis will always check with the most recent
kernels, so we'll be notified about changes in upstream kernels that
break the build of our kernel module. However, this will also break
Travis checks on patches that don't touch the kernel parts until
we fix the module.
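
The selection of "the most recent stable release of the desired branch" can
be sketched in Python as below. This is an assumption-laden illustration,
not the Travis script: the function name is hypothetical, and the list of
available releases is passed in rather than fetched from anywhere.

```python
def latest_stable(branch, releases):
    """Pick the highest patch release of 'branch' (e.g. "4.14")."""
    prefix = branch + "."
    # Keep only releases of the form <branch>.<number>.
    candidates = [r for r in releases
                  if r.startswith(prefix) and r[len(prefix):].isdigit()]
    if not candidates:
        return None
    # Compare patch levels numerically, so that 125 sorts above 63.
    return max(candidates, key=lambda r: int(r[len(prefix):]))
```

Comparing the patch level as an integer matters: a plain string comparison
would rank "4.14.9" above "4.14.125".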

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Tested-by: Yifeng Sun <pkusunyifeng@gmail.com>
Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
Acked-by: Ben Pfaff <blp@ovn.org>
5 years agoAUTHORS: Add Damijan Skvarc and Jaime Caamaño Ruiz.
Ben Pfaff [Mon, 10 Jun 2019 00:28:05 +0000 (17:28 -0700)]
AUTHORS: Add Damijan Skvarc and Jaime Caamaño Ruiz.

Signed-off-by: Ben Pfaff <blp@ovn.org>