]> git.proxmox.com Git - mirror_ovs.git/log
mirror_ovs.git
4 years agolib/tc: only update the stats for non-empty counter
wenxu [Mon, 29 Jun 2020 09:31:18 +0000 (17:31 +0800)]
lib/tc: only update the stats for non-empty counter

A packet with first frag and execute act_ct action.
The packet will stole by defrag. So the stats counter
for "gact action goto chain" will always 0. The openvswitch
update each action in order. So the flower stats finally
alway be zero. The rule will be delete adter max-idle time
even there are packet executing the action.

ovs-appctl dpctl/dump-flows
recirc_id(0),in_port(1),eth_type(0x0800),ipv4(dst=11.0.0.7,frag=first), packets:0, bytes:0, used:5.390s, actions:ct(zone=1,nat),recirc(0x4)

filter protocol ip pref 2 flower chain 0 handle 0x2
  eth_type ipv4
  dst_ip 1.1.1.1
  ip_flags frag/firstfrag
  skip_hw
  not_in_hw
 action order 1: ct zone 1 nat pipe
  index 2 ref 1 bind 1 installed 11 sec used 1 sec
 Action statistics:
 Sent 15000 bytes 11 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
 cookie e04106c2ac41769b278edaa9b5309960

 action order 2: gact action goto chain 1
  random type none pass val 0
  index 2 ref 1 bind 1 installed 11 sec used 11 sec
 Action statistics:
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
 cookie e04106c2ac41769b278edaa9b5309960

Signed-off-by: wenxu <wenxu@ucloud.cn>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
4 years agodatapath-windows: Add CTA_HELP and CTA_TUPLE_MASTER
Jinjun Gao [Tue, 30 Jun 2020 11:47:57 +0000 (19:47 +0800)]
datapath-windows: Add CTA_HELP and CTA_TUPLE_MASTER

Add helper and master if existing to a conntrack entry:
1, For CTA_HELP, only support FTP/TFTP;
2, For CTA_TUPLE_MASTER, only support FTP.

Signed-off-by: Jinjun Gao <jinjung@vmware.com>
Signed-off-by: Alin Gabriel Serdean <aserdean@ovn.org>
4 years agojsonrpc: Don't assert for 0 remotes in jsonrpc_session_open_multiple().
Ben Pfaff [Fri, 26 Jun 2020 19:46:10 +0000 (12:46 -0700)]
jsonrpc: Don't assert for 0 remotes in jsonrpc_session_open_multiple().

It's pretty easy to get 0 remotes here from ovn-northd if you specify
--ovnnb-db='' or --ovnnb-db='   ' on the command line.  The internals
of jsonrpc_session aren't equipped to cope with that, so just add a
dummy remote instead.

Acked-by: Numan Siddique <numans@ovn.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agodatapath-windows, conntrack: Fix conntrack new state
Rui Cao [Tue, 23 Jun 2020 06:46:22 +0000 (06:46 +0000)]
datapath-windows, conntrack: Fix conntrack new state

On windows, if we send a connection setup packet in one direction
twice, it will make the connection to be in established state. The
same issue happened in Linux userspace conntrack module and has
been fixed.

This patch port the following previous fixes to windows datapath to
fix the issue:
a867c010ee9183885ee9d3eb76a0005c075c4d2e
ac23d20fc90da3b1c9b2117d1e22102e99fba006

Acked-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: Rui Cao <rcao@vmware.com>
Signed-off-by: William Tu <u9012063@gmail.com>
4 years agobridge: Fix null dereference on ct_timeout_policy record
Yi-Hung Wei [Fri, 26 Jun 2020 18:21:06 +0000 (11:21 -0700)]
bridge: Fix null dereference on ct_timeout_policy record

Accoridng to vswitch.ovsschema, each CT_Zone record may have
zero or one associcated CT_Timeout_policy.  Thus, this patch
checks if ovsrec_ct_timeout_policy exist before accesses the
record.

VMWare-BZ: 2585825
Fixes: 45339539f69d ("ovs-vsctl: Add conntrack zone commands.")
Fixes: 993cae678bca ("ofproto-dpif: Consume CT_Zone, and CT_Timeout_Policy tables")
Reported-by: Yang Song <yangsong@vmware.com>
Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: William Tu <u9012063@gmail.com>
4 years agorhel: Fix syntax error when matching version.
William Tu [Mon, 22 Jun 2020 15:54:13 +0000 (08:54 -0700)]
rhel: Fix syntax error when matching version.

Remove the extra 'fi' in the script.

VMware-BZ: #2582834
Fixed: fecb28051b35 ("rhel: Support RHEL 7.8 kernel module rpm build.")
Reported-by: Abhijeet Malawade <amalawade@vmware.com>
Acked-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: William Tu <u9012063@gmail.com>
4 years agoAUTHORS: Add Sriharsha Basavapatna.
Ilya Maximets [Mon, 22 Jun 2020 12:07:33 +0000 (14:07 +0200)]
AUTHORS: Add Sriharsha Basavapatna.

Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
4 years agonetdev-offload-dpdk: Support offload of VLAN PUSH/POP actions.
Sriharsha Basavapatna [Fri, 29 May 2020 06:33:05 +0000 (02:33 -0400)]
netdev-offload-dpdk: Support offload of VLAN PUSH/POP actions.

Parse VLAN PUSH/POP OVS datapath actions and add respective RTE actions.

Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Acked-by: Eli Britstein <elibr@mellanox.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
4 years agoofproto-dpif.at: Add unit test for lb_output action.
Matteo Croce [Mon, 25 May 2020 18:05:15 +0000 (20:05 +0200)]
ofproto-dpif.at: Add unit test for lb_output action.

Extend the balance-tcp one so it tests lb-output action too.
The test checks that that the option is shown in bond/show,
and that the lb_output action is programmed in the datapath.

Signed-off-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
4 years agouserspace: Avoid dp_hash recirculation for balance-tcp bond mode.
Vishal Deep Ajmera [Fri, 22 May 2020 08:50:05 +0000 (10:50 +0200)]
userspace: Avoid dp_hash recirculation for balance-tcp bond mode.

Problem:

In OVS, flows with output over a bond interface of type “balance-tcp”
gets translated by the ofproto layer into "HASH" and "RECIRC" datapath
actions. After recirculation, the packet is forwarded to the bond
member port based on 8-bits of the datapath hash value computed through
dp_hash. This causes performance degradation in the following ways:

1. The recirculation of the packet implies another lookup of the
packet’s flow key in the exact match cache (EMC) and potentially
Megaflow classifier (DPCLS). This is the biggest cost factor.

2. The recirculated packets have a new “RSS” hash and compete with the
original packets for the scarce number of EMC slots. This implies more
EMC misses and potentially EMC thrashing causing costly DPCLS lookups.

3. The 256 extra megaflow entries per bond for dp_hash bond selection
put additional load on the revalidation threads.

Owing to this performance degradation, deployments stick to “balance-slb”
bond mode even though it does not do active-active load balancing for
VXLAN- and GRE-tunnelled traffic because all tunnel packet have the
same source MAC address.

Proposed optimization:

This proposal introduces a new load-balancing output action instead of
recirculation.

Maintain one table per-bond (could just be an array of uint16's) and
program it the same way internal flows are created today for each
possible hash value (256 entries) from ofproto layer. Use this table to
load-balance flows as part of output action processing.

Currently xlate_normal() -> output_normal() ->
bond_update_post_recirc_rules() -> bond_may_recirc() and
compose_output_action__() generate 'dp_hash(hash_l4(0))' and
'recirc(<RecircID>)' actions. In this case the RecircID identifies the
bond. For the recirculated packets the ofproto layer installs megaflow
entries that match on RecircID and masked dp_hash and send them to the
corresponding output port.

Instead, we will now generate action as
    'lb_output(<bond id>)'

This combines hash computation (only if needed, else re-use RSS hash)
and inline load-balancing over the bond. This action is used *only* for
balance-tcp bonds in userspace datapath (the OVS kernel datapath
remains unchanged).

Example:
Current scheme:

With 8 UDP flows (with random UDP src port):

  flow-dump from pmd on cpu core: 2
  recirc_id(0),in_port(7),<...> actions:hash(hash_l4(0)),recirc(0x1)

  recirc_id(0x1),dp_hash(0xf8e02b7e/0xff),<...> actions:2
  recirc_id(0x1),dp_hash(0xb236c260/0xff),<...> actions:1
  recirc_id(0x1),dp_hash(0x7d89eb18/0xff),<...> actions:1
  recirc_id(0x1),dp_hash(0xa78d75df/0xff),<...> actions:2
  recirc_id(0x1),dp_hash(0xb58d846f/0xff),<...> actions:2
  recirc_id(0x1),dp_hash(0x24534406/0xff),<...> actions:1
  recirc_id(0x1),dp_hash(0x3cf32550/0xff),<...> actions:1

New scheme:
We can do with a single flow entry (for any number of new flows):

  in_port(7),<...> actions:lb_output(1)

A new CLI has been added to dump datapath bond cache as given below.

 # ovs-appctl dpif-netdev/bond-show [dp]

   Bond cache:
     bond-id 1 :
       bucket 0 - slave 2
       bucket 1 - slave 1
       bucket 2 - slave 2
       bucket 3 - slave 1

Co-authored-by: Manohar Krishnappa Chidambaraswamy <manukc@gmail.com>
Signed-off-by: Manohar Krishnappa Chidambaraswamy <manukc@gmail.com>
Signed-off-by: Vishal Deep Ajmera <vishal.deep.ajmera@ericsson.com>
Tested-by: Matteo Croce <mcroce@redhat.com>
Tested-by: Adrian Moreno <amorenoz@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
4 years agonetdev-offload-tc: Revert tunnel src/dst port masks handling
Roi Dayan [Tue, 16 Jun 2020 13:03:57 +0000 (16:03 +0300)]
netdev-offload-tc: Revert tunnel src/dst port masks handling

The cited commit intended to add tc support for masking tunnel src/dst
ips and ports. It's not possible to do tunnel ports masking with
openflow rules and the default mask for tunnel ports set to 0 in
tnl_wc_init(), unlike tunnel ports default mask which is full mask.
So instead of never passing tunnel ports to tc, revert the changes
to tunnel ports to always pass the tunnel port.
In sw classification is done by the kernel, but for hw we must match
the tunnel dst port.

Fixes: 5f568d049130 ("netdev-offload-tc: Allow to match the IP and port mask of tunnel")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Eli Britstein <elibr@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
4 years agorhel: Support RHEL 7.8 kernel module rpm build.
William Tu [Wed, 17 Jun 2020 15:38:40 +0000 (08:38 -0700)]
rhel: Support RHEL 7.8 kernel module rpm build.

Add support for RHEL7.8 GA release with kernel 3.10.0-1127.

VMware-BZ: #2582834
Acked-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: William Tu <u9012063@gmail.com>
4 years agodocs: Add note for AF_XDP installation
Yi-Hung Wei [Tue, 9 Jun 2020 17:42:12 +0000 (10:42 -0700)]
docs: Add note for AF_XDP installation

Add notes about some configuration issues when enabling AF_XDP
support.

Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: William Tu <u9012063@gmail.com>
4 years agoofproto-dpif-trace: Improve NAT tracing.
Dumitru Ceara [Fri, 10 Jan 2020 09:34:43 +0000 (10:34 +0100)]
ofproto-dpif-trace: Improve NAT tracing.

When ofproto/trace detects a recirc action it resumes execution at the
specified next table. However, if the ct action performs SNAT/DNAT,
e.g., ct(commit,nat(src=1.1.1.1:4000),table=42), the src/dst IPs and
ports in the oftrace_recirc_node->flow field are not updated. This leads
to misleading outputs from ofproto/trace as real packets would actually
first get NATed and might match different flows when recirculated.

Assume the first IP/port from the NAT src/dst action will be used by
conntrack for the translation and update the oftrace_recirc_node->flow
accordingly. This is not entirely correct as conntrack might choose a
different IP/port but the result is more realistic than before.

This fix covers new connections. However, for reply traffic that executes
actions of the form ct(nat, table=42) we still don't update the flow as
we don't have any information about conntrack state when tracing.

Also move the oftrace_recirc_node processing out of ofproto_trace()
and to its own function, ofproto_trace_recirc_node() for better
readability/

Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agoodp-util.c: Fix dp_hash execution with slowpath actions.
Han Zhou [Fri, 15 May 2020 07:17:47 +0000 (00:17 -0700)]
odp-util.c: Fix dp_hash execution with slowpath actions.

When dp_hash is executed with slowpath actions, it results in endless
recirc loop in kernel datapath, and finally drops the packet, with
kernel logs:

openvswitch: ovs-system: deferred action limit reached, drop recirc action

The root cause is that the dp_hash value calculated by slowpath is not
passed to datapath when executing the recirc action, thus when the recirced
packet miss upcall comes to userspace again, it generates the dp_hash
and recirc action again, with same recirc_id, which in turn generates
a megaflow with recirc action with the recird_id same as the recirc_id in
its match condition, which causes a loop in datapath.

For example, this can be reproduced with below setup of OVN environment:

                         LS1            LS2
                          |              |
                          |------R1------|
        VIF--LS0---R0-----|              |------R3
                          |------R2------|

Assume there is a route from the VIF to R3: R0 -> R1 -> R3, and there are two
routes (ECMP) from R3 to the VIF:
R3 -> R1 -> R0
R3 -> R2 -> R0

Now if we ping from the VIF to R3, the OVS flow execution on the HV of the VIF
will hit the R3's datapath which has flows that responds to the ICMP packet
by setting ICMP fields, which requires slowpath actions, and in later flow
tables it will hit the "group" action that selects between the ECMP routes.

By default OVN uses "dp_hash" method for the "group" action.

For the first miss upcall packet, dp_hash value is empty, so the group action
will be translated to "dp_hash" and "recirc".

During action execution, because of the previous actions that sets ICMP fields,
the whole execution requires slowpath, so it tries to execute all actions in
userspace in odp_execute_actions(), including dp_hash action, except the
recirc action, which can only be executed in datapath. So the dp_hash value
is calculated in userspace, and then the packet is injected to datapath for
recirc action execution.

However, the dp_hash calculated by the userspace is not passed to datapath.

Because of this, the packet recirc in datapath doesn't have dp_hash value,
and the miss upcall for the recirced packet hits the same flow tables and
triggers same "dp_hash" and "recirc" action again, with exactly same recirc_id!

This time, the new upcall doesn't require any slowpath execution, so both
the dp_hash and recirc actions are executed in datapath, after creating a
datapath megaflow like:

recirc_id(XYZ),..., actions:hash(l4(0)),recirc(XYZ)

with match recirc_id equals the recirc id in the action, thus creating a loop.

This patch fixes the problem by passing the calculated dp_hash value to
datapath in odp_key_from_dp_packet().

Fixes: 572f732ab078 ("dpif-netdev: user space datapath recirculation")
Signed-off-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agoovsdb idl: Try committing the pending txn in ovsdb_idl_loop_run.
Numan Siddique [Fri, 5 Jun 2020 08:30:29 +0000 (14:00 +0530)]
ovsdb idl: Try committing the pending txn in ovsdb_idl_loop_run.

The function ovsdb_idl_loop_run(), after calling ovsdb_idl_run(),
returns a transaction object (of type 'struct ovsdb_idl_txn').
The returned transaction object can be NULL if there is a pending
transaction (loop->committing_txn) in the idl loop object.

Normally the clients of idl library, first call ovsdb_idl_loop_run(),
then do their own processing and create any idl transactions during
this processing and then finally call ovsdb_idl_loop_commit_and_wait().

If ovsdb_idl_loop_run() returns NULL transaction object, then much
of the processing done by the client gets wasted as in the case
of ovn-controller.

The client (in this case ovn-controller), can skip the processing
and instead call ovsdb_idl_loop_commit_and_wait() if the transaction
oject is NULL. But ovn-controller uses IDL tracking and it may
loose the tracked changes in that run.

This patch tries to improve this scenario, by checking if the
pending transaction can be committed in the ovsdb_idl_loop_run()
itself and if the pending transaction is cleared (because of the
response messages from ovsdb-server due to a transaction message
in the previous run), ovsdb_idl_loop_run() can return a valid
transaction object.

CC: Han Zhou <hzhou@ovn.org>
Signed-off-by: Numan Siddique <numans@ovn.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agodpif-netlink: Fix Windows incompatibility when setting new feature
Rui Cao [Mon, 15 Jun 2020 06:05:13 +0000 (14:05 +0800)]
dpif-netlink: Fix Windows incompatibility when setting new feature

OVS_DP_ATTR_NAME field is required when sending OVS_DP_CMD_SET to
windows kernel driver. The function "dpif_netlink_set_features"
dose not set the OVS_DP_ATTR_NAME field which will cause set feature
failure and ovs-vswitchd will exist.

This patch fixes the issue by setting "request.name" in request.

Reported-at: https://github.com/openvswitch/ovs-issues/issues/187
Submitted-at: https://github.com/openvswitch/ovs/pull/319
Signed-off-by: Rui Cao <rcao@vmware.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agoovsdb-idl: Avoid inconsistent IDL state with OVSDB_MONITOR_V3.
Dumitru Ceara [Thu, 28 May 2020 12:32:31 +0000 (14:32 +0200)]
ovsdb-idl: Avoid inconsistent IDL state with OVSDB_MONITOR_V3.

Assuming an ovsdb client connected to a database using OVSDB_MONITOR_V3
(i.e., "monitor_cond_since" method) with the initial monitor condition
MC1.

Assuming the following two transactions are executed on the
ovsdb-server:
TXN1: "insert record R1 in table T1"
TXN2: "insert record R2 in table T2"

If the client's monitor condition MC1 for table T2 matches R2 then the
client will receive the following update3 message:
method="update3", "insert record R2 in table T2", last-txn-id=TXN2

At this point, if the presence of the new record R2 in the IDL triggers
the client to update its monitor condition to MC2 and add a clause for
table T1 which matches R1, a monitor_cond_change message is sent to the
server:
method="monitor_cond_change", "clauses from MC2"

In normal operation the ovsdb-server will reply with a new update3
message of the form:
method="update3", "insert record R1 in table T1", last-txn-id=TXN2

However, if the connection drops in the meantime, this last update might
get lost.

It might happen that during the reconnect a new transaction happens
that modifies the original record R1:
TXN3: "modify record R1 in table T1"

When the client reconnects, it will try to perform a fast resync by
sending:
method="monitor_cond_since", "clauses from MC2", last-txn-id=TXN2

Because TXN2 is still in the ovsdb-server transaction history, the
server replies with the changes from the most recent transactions only,
i.e., TXN3:
result="true", last-txbb-id=TXN3, "modify record R1 in table T1"

This causes the IDL on the client in to end up in an inconsistent
state because it has never seen the update that created R1.

Such a scenario is described in:
https://bugzilla.redhat.com/show_bug.cgi?id=1808580#c22

To avoid this issue, the IDL will now maintain (up to) 3 different
types of conditions for each DB table:
- new_cond: condition that has been set by the IDL client but has
  not yet been sent to the server through monitor_cond_change.
- req_cond: condition that has been sent to the server but the reply
  acknowledging the change hasn't been received yet.
- ack_cond: condition that has been acknowledged by the server.

Whenever the IDL FSM is restarted (e.g., voluntary or involuntary
disconnect):
- if there is a known last_id txn-id the code ensures that new_cond
  will contain the most recent condition set by the IDL client
  (either req_cond if there was a request in flight, or new_cond
  if the IDL client set a condition while the IDL was disconnected)
- if there is no known last_id txn-id the code ensures that ack_cond will
  contain the most recent conditions set by the IDL client regardless
  whether they were acked by the server or not.

When monitor_cond_since/monitor_cond requests are sent they will
always include ack_cond and if new_cond is not NULL a follow up
monitor_cond_change will be generated afterwards.

On the other hand ovsdb_idl_db_set_condition() will always modify new_cond.

This ensures that updates of type "insert" that happened before the last
transaction known by the IDL but didn't match old monitor conditions are
sent upon reconnect if the monitor condition has changed to include them
in the meantime.

Fixes: 403a6a0cb003 ("ovsdb-idl: Fast resync from server when connection reset.")
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Acked-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
4 years agoovsdb-server.7: Mention update3 as replies to monitor_cond_change.
Dumitru Ceara [Thu, 28 May 2020 12:32:17 +0000 (14:32 +0200)]
ovsdb-server.7: Mention update3 as replies to monitor_cond_change.

Monitor_cond_change might trigger updates to be sent to clients as results
to condition changes. These updates can be either update2 (for monitor_cond
monitors) or update3 (for monitor_cond_since monitors). The documentation
used to mention only update2.

Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Acked-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
4 years agoovs-rcu: Avoid flushing callbacks during postponing.
Ilya Maximets [Wed, 3 Jun 2020 14:58:16 +0000 (16:58 +0200)]
ovs-rcu: Avoid flushing callbacks during postponing.

ovsrcu_flush_cbset() call during ovsrcu_postpone() could cause
use after free in case the caller sets new pointer only after
postponing free for the old one:

 ------------------  ------------------  -------------------
 Thread 1            Thread 2            RCU Thread
 ------------------  ------------------  -------------------
 pointer = A

 ovsrcu_quiesce():
  thread->seqno = 30
  global_seqno = 31
  quiesced

 read pointer A
 postpone(free(A)):
   flush cbset
                                         pop flushed_cbsets
                                         ovsrcu_synchronize:
                                           target_seqno = 31
                     ovsrcu_quiesce():
                      thread->seqno = 31
                      global_seqno = 32
                      quiesced

                     read pointer A
                     use pointer A

                     ovsrcu_quiesce():
                      thread->seqno = 32
                      global_seqno = 33
                      quiesced

                     read pointer A
 pointer = B

 ovsrcu_quiesce():
  thread->seqno = 33
  global_seqno = 34
  quiesced

                                         target_seqno exceeded
                                         by all threads
                                         call cbs to free A
                     use pointer A
                     (use after free)
 -----------------------------------------------------------

Fix that by using dynamically re-allocated array without flushing
to the global flushed_cbsets until writer enters quiescent state.

Fixes: 0f2ea84841e1 ("ovs-rcu: New library.")
Reported-by: Linhaifeng <haifeng.lin@huawei.com>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2020-June/371265.html
Acked-by: Ben Pfaff <blp@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
4 years agoovs-actions.xml: Fix a typo in the description of check_pkt_larger.
Numan Siddique [Fri, 29 May 2020 09:39:12 +0000 (15:09 +0530)]
ovs-actions.xml: Fix a typo in the description of check_pkt_larger.

Signed-off-by: Numan Siddique <numans@ovn.org>
Acked-by: William Tu <u9012063@gmail.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
4 years agonetdev-offload-tc: Allow installing arp rules to TC dp.
Tonghao Zhang [Fri, 5 Jun 2020 13:17:29 +0000 (21:17 +0800)]
netdev-offload-tc: Allow installing arp rules to TC dp.

This patch allows to install arp rules to tc dp.
In the future, arp will be offloaded to hardware to
be processed. So OvS enable this now.

$ ovs-appctl dpctl/add-flow 'recirc_id(0),in_port(3),eth(),\
  eth_type(0x0806),arp(op=2,tha=00:50:56:e1:4b:ab,tip=10.255.1.116)' 2

$ ovs-appctl dpctl/dump-flows
  ... arp(tip=10.255.1.116,op=2,tha=00:50:56:e1:4b:ab) ...

$ tc filter show dev <ethx> ingress
  ...
  eth_type arp
  arp_tip 10.255.1.116
  arp_op reply
  arp_tha 00:50:56:e1:4b:ab
  not_in_hw
    action order 1: mirred (Egress Redirect to device <ethy>) stolen
    ...

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
4 years agotc: Support new terse dump kernel API
Vlad Buslov [Thu, 4 Jun 2020 10:47:01 +0000 (13:47 +0300)]
tc: Support new terse dump kernel API

When dumping flows in terse mode set TCA_DUMP_FLAGS attribute to
TCA_DUMP_FLAGS_TERSE flag to prevent unnecessary copying of data between
kernel and user spaces. Only expect kernel to provide cookie, stats and
flags when dumping filters in terse mode.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
4 years agonetdev-offload: Implement terse dump support
Vlad Buslov [Thu, 4 Jun 2020 10:47:00 +0000 (13:47 +0300)]
netdev-offload: Implement terse dump support

In order to improve revalidator performance by minimizing unnecessary
copying of data, extend netdev-offloads to support terse dump mode. Extend
netdev_flow_api->flow_dump_create() with 'terse' bool argument. Implement
support for terse dump in functions that convert netlink to flower and
flower to match. Set flow stats "used" value based on difference in number
of flow packets because lastuse timestamp is not included in TC terse dump.

Kernel API support is implemented in following patch.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
4 years agonetdev-offload-tc: Expand tunnel source IPs masked match
Tonghao Zhang [Tue, 2 Jun 2020 13:50:25 +0000 (21:50 +0800)]
netdev-offload-tc: Expand tunnel source IPs masked match

To support more use case, for example, DDOS, which
packets should be dropped in hardware, this patch
allows users to match only the tunnel source IPs with
masked value.

$ ovs-appctl dpctl/add-flow "tunnel(src=2.2.2.0/255.255.255.0,tp_dst=4789,ttl=64),\
  recirc_id(2),in_port(3),eth(),eth_type(0x0800),ipv4()" ""

$ ovs-appctl dpctl/dump-flows
  tunnel(src=2.2.2.0/255.255.255.0,ttl=64,tp_dst=4789) ... actions:drop
$ tc filter show dev vxlan_sys_4789 ingress
  ...
  eth_type ipv4
  enc_src_ip 2.2.2.0/24
  enc_dst_port 4789
  enc_ttl 64
  in_hw in_hw_count 2
    action order 1: gact action drop
    ...

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Acked-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
4 years agonetdev-offload-tc: Allow to match the IP and port mask of tunnel
Tonghao Zhang [Tue, 2 Jun 2020 13:50:24 +0000 (21:50 +0800)]
netdev-offload-tc: Allow to match the IP and port mask of tunnel

This patch allows users to offload the TC flower rules with
tunnel mask. This patch allows masked match of the following,
where previously supported an exact match was supported:
* Remote (dst) tunnel endpoint address
* Local (src) tunnel endpoint address
* Remote (dst) tunnel endpoint UDP port

And also allows masked match of the following, where previously
no match was supported:
* Local (src) tunnel endpoint UDP port

In some case, mask is useful as wildcards. For example, DDOS,
in that case, we don’t want to allow specified hosts IPs or
only source Ports to access the targeted host. For example:

$ ovs-appctl dpctl/add-flow "tunnel(dst=2.2.2.100,src=2.2.2.0/255.255.255.0,tp_dst=4789),\
  recirc_id(0),in_port(3),eth(),eth_type(0x0800),ipv4()" ""

$ tc filter show dev vxlan_sys_4789 ingress
  ...
  eth_type ipv4
  enc_dst_ip 2.2.2.100
  enc_src_ip 2.2.2.0/24
  enc_dst_port 4789
  enc_ttl 64
  in_hw in_hw_count 2
action order 1: gact action drop
    ...

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Acked-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
4 years agonetdev-offload-tc: Use ipv6_addr_is_set instead of is_all_zeros
Tonghao Zhang [Tue, 2 Jun 2020 13:50:23 +0000 (21:50 +0800)]
netdev-offload-tc: Use ipv6_addr_is_set instead of is_all_zeros

Not bugfix, make the codes more readable.

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Acked-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
4 years agodpif-netlink: Generate ufids for installing TC flowers
Tonghao Zhang [Tue, 2 Jun 2020 13:50:22 +0000 (21:50 +0800)]
dpif-netlink: Generate ufids for installing TC flowers

To support installing the TC flowers to HW, via "ovs-appctl dpctl/add-flow"
command, there should be an ufid. This patch will check whether ufid exists,
if not, generate an ufid. Should to know that when processing upcall packets,
ufid is generated in parse_odp_packet for kernel datapath.

Configuring the max-idle/max-revalidator, may help testing this patch.

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Acked-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
4 years agoovsdb: Fix timeout type for wait operation.
Ilya Maximets [Mon, 25 May 2020 16:21:39 +0000 (18:21 +0200)]
ovsdb: Fix timeout type for wait operation.

According to RFC 7047, 'timeout' is an integer field:

 5.2.6.  Wait
   The "wait" object contains the following members:
      "op": "wait"                        required
      "timeout": <integer>                optional
      ...

For some reason initial implementation treated it as a real number.

This causes a build issue with clang that complains that LLONG_MAX
could not be represented as double:

 ovsdb/execution.c:733:32: error: implicit conversion from 'long long'
                           to 'double' changes value from
                           9223372036854775807 to 9223372036854775808
            timeout_msec = MIN(LLONG_MAX, json_real(timeout));
                           ~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 /usr/include/sys/limits.h:69:19: note: expanded from macro 'LLONG_MAX'
 #define LLONG_MAX       __LLONG_MAX     /* max for a long long */
                        ^~~~~~~~~~~
 /usr/include/x86/_limits.h:74:21: note: expanded from macro '__LLONG_MAX'
 #define __LLONG_MAX     0x7fffffffffffffffLL    /* max value for a long long */
                        ^~~~~~~~~~~~~~~~~~~~
 ./lib/util.h:90:21: note: expanded from macro 'MIN'
 #define MIN(X, Y) ((X) < (Y) ? (X) : (Y))
                     ^  ~

Fix that by changing parser to treat 'timeout' as integer.
Fixes clang build on FreeBSD 12.1 in CirrusCI.

Fixes: f85f8ebbfac9 ("Initial implementation of OVSDB.")
Acked-by: Han Zhou <hzhou@ovn.org>
Acked-by: Numan Siddique <numans@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
4 years agoovsdb-idl: Add function to reset min_index.
Mark Michelson [Fri, 1 May 2020 19:13:08 +0000 (15:13 -0400)]
ovsdb-idl: Add function to reset min_index.

If an administrator removes all of the databases in a cluster from
disk, then ovsdb IDL clients will have a problem. The databases will all
reset their stored indexes to 0, so The IDL client's min_index will be
higher than the indexes of all databases in the cluster. This results in
the client constantly connecting to databases, detecting the data as
"stale", and then attempting to connect to another.

This function provides a way to reset the IDL to an initial state with
min_index of 0. This way, the client will not wrongly detect the
database data as stale and will recover properly.

Notice that this function is not actually used anywhere in this patch.
This will be used by OVN, though, since OVN is the primary user of
clustered OVSDB.

Signed-off-by: Mark Michelson <mmichels@redhat.com>
Acked-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
4 years agodatapath: Add hash info to upcall.
Han Zhou [Tue, 26 May 2020 05:35:44 +0000 (22:35 -0700)]
datapath: Add hash info to upcall.

This patch backports below upstream patches, and add __skb_set_hash
to compat for older kernels.

commit b5ab1f1be6180a2e975eede18731804b5164a05d
Author: Jakub Kicinski <kuba@kernel.org>
Date:   Mon Mar 2 21:05:18 2020 -0800

    openvswitch: add missing attribute validation for hash

    Add missing attribute validation for OVS_PACKET_ATTR_HASH
    to the netlink policy.

Fixes: bd1903b7c459 ("net: openvswitch: add hash info to upcall")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
commit bd1903b7c4596ba6f7677d0dfefd05ba5876707d
Author: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Date:   Wed Nov 13 23:04:49 2019 +0800

    net: openvswitch: add hash info to upcall

    When using the kernel datapath, the upcall don't
    include skb hash info relatived. That will introduce
    some problem, because the hash of skb is important
    in kernel stack. For example, VXLAN module uses
    it to select UDP src port. The tx queue selection
    may also use the hash in stack.

    Hash is computed in different ways. Hash is random
    for a TCP socket, and hash may be computed in hardware,
    or software stack. Recalculation hash is not easy.

    Hash of TCP socket is computed:
    tcp_v4_connect
        -> sk_set_txhash (is random)

    __tcp_transmit_skb
        -> skb_set_hash_from_sk

    There will be one upcall, without information of skb
    hash, to ovs-vswitchd, for the first packet of a TCP
    session. The rest packets will be processed in Open vSwitch
    modules, hash kept. If this tcp session is forward to
    VXLAN module, then the UDP src port of first tcp packet
    is different from rest packets.

    TCP packets may come from the host or dockers, to Open vSwitch.
    To fix it, we store the hash info to upcall, and restore hash
    when packets sent back.

    +---------------+          +-------------------------+
    |   Docker/VMs  |          |     ovs-vswitchd        |
    +----+----------+          +-+--------------------+--+
         |                       ^                    |
         |                       |                    |
         |                       |  upcall            v restore packet hash
(not recalculate)
         |                     +-+--------------------+--+
         |  tap netdev         |                         |   vxlan module
         +--------------->     +-->  Open vSwitch ko     +-->
           or internal type    |                         |
                               +-------------------------+

    Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2019-October/364062.html
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tested-by: Aliasgar Ginwala <aginwala@ebay.com>
Acked-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Signed-off-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
4 years agoAUTHORS: Add Eiichi Tsukata.
Ilya Maximets [Mon, 25 May 2020 12:09:42 +0000 (14:09 +0200)]
AUTHORS: Add Eiichi Tsukata.

Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
4 years agoclassifier: Prevent tries vs n_tries race leading to NULL dereference.
Eiichi Tsukata [Wed, 27 May 2020 02:13:34 +0000 (11:13 +0900)]
classifier: Prevent tries vs n_tries race leading to NULL dereference.

Currently classifier tries and n_tries can be updated not atomically,
there is a race condition which can lead to NULL dereference.
The race can happen when main thread updates a classifier tries and
n_tries in classifier_set_prefix_fields() and at the same time revalidator
or handler thread try to lookup them in classifier_lookup__(). Such race
can be triggered when user changes prefixes of flow_table.

Race(user changes flow_table prefixes: ip_dst,ip_src => none):

  [main thread]             [revalidator/handler thread]
  ===========================================================
                            /* cls->n_tries == 2 */
                            for (int i = 0; i < cls->n_tries; i++) {
  trie_init(cls, i, NULL);
  /* n_tries == 0 */
  cls->n_tries = n_tries;
                            /* cls->tries[i]->feild is NULL */
                            trie_ctx_init(&trie_ctx[i],&cls->tries[i]);
                            /* trie->field is NULL */
                            ctx->be32ofs = trie->field->flow_be32ofs;

To prevent the race, instead of re-introducing internal mutex
implemented in the commit fccd7c092e09 ("classifier: Remove internal
mutex."), this patch makes trie field RCU protected and checks it after
read.

Fixes: fccd7c092e09 ("classifier: Remove internal mutex.")
Signed-off-by: Eiichi Tsukata <eiichi.tsukata@nutanix.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
4 years agoraft: Avoid sending equal snapshots.
Ilya Maximets [Fri, 22 May 2020 20:36:27 +0000 (22:36 +0200)]
raft: Avoid sending equal snapshots.

Snapshots are huge.  In some cases we could receive several outdated
append replies from the remote server.  This could happen in high
scale cases if the remote server is overloaded and not able to process
all the raft requests in time.  As an action to each outdated append
reply we're sending full database snapshot.  While remote server is
already overloaded those snapshots will stuck in jsonrpc backlog for
a long time making it grow up to few GB.  Since remote server wasn't
able to timely process incoming messages it will likely not able to
process snapshots leading to the same situation with low chances to
recover.  Remote server will likely stuck in 'candidate' state, other
servers will grow their memory consumption due to growing jsonrpc
backlogs:

jsonrpc|INFO|excessive sending backlog, jsonrpc: ssl:192.16.0.3:6644,
             num of msgs: 3795, backlog: 8838994624.

This patch is trying to avoid that situation by avoiding sending of
equal snapshot install requests.  This helps maintain reasonable memory
consumption and allows the cluster to recover on a larger scale.

Acked-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
4 years agoovsdb-server: Fix schema leak while reading db.
Ilya Maximets [Thu, 14 May 2020 20:10:45 +0000 (22:10 +0200)]
ovsdb-server: Fix schema leak while reading db.

parse_txn() function doesn't always take ownership of the 'schema'
passed.  So, if the schema of the clustered db has same version as the
one that already in use, parse_txn() will not use it, resulting with a
memory leak:

 7,827 (56 direct, 7,771 indirect) bytes in 1 blocks are definitely lost
    at 0x483BB1A: calloc (vg_replace_malloc.c:762)
    by 0x44AD02: xcalloc (util.c:121)
    by 0x40E70E: ovsdb_schema_create (ovsdb.c:41)
    by 0x40EA6D: ovsdb_schema_from_json (ovsdb.c:217)
    by 0x415EDD: ovsdb_storage_read (storage.c:280)
    by 0x408968: read_db (ovsdb-server.c:607)
    by 0x40733D: main_loop (ovsdb-server.c:227)
    by 0x40733D: main (ovsdb-server.c:469)

While we could put ovsdb_schema_destroy() in a few places inside
'parse_txn()', from the users' point of view it seems better to have a
constant argument and just clone the 'schema' if needed.  The caller
will be responsible for destroying the 'schema' it owns.

Fixes: 1b1d2e6daa56 ("ovsdb: Introduce experimental support for clustered databases.")
Acked-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
4 years agopvector: Document that multiple elements with a given priority are allowed.
Ben Pfaff [Wed, 27 May 2020 19:11:32 +0000 (12:11 -0700)]
pvector: Document that multiple elements with a given priority are allowed.

Acked-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agoovsdb: Add raft memory usage to memory report.
Ilya Maximets [Fri, 22 May 2020 16:31:19 +0000 (18:31 +0200)]
ovsdb: Add raft memory usage to memory report.

Memory reports could be found in logs or by calling 'memory/show'
appctl command.  For ovsdb-server it includes information about db
cells, monitor connections with their backlog size, etc.  But it
doesn't contain any information about memory consumed by raft.
Backlogs of raft connections could be insanely large because of
snapshot installation requests that simply contains the whole database.
In not that healthy clusters where one of ovsdb servers is not able to
timely handle all the incoming raft traffic, backlog on a sender's side
could cause significant memory consumption issues.

Adding new 'raft-connections' and 'raft-backlog' counters to the
memory report to better track such conditions.

Acked-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
4 years agocompat: Backport ipv6_stub change
Greg Rose [Thu, 21 May 2020 21:54:03 +0000 (14:54 -0700)]
compat: Backport ipv6_stub change

A patch backported to the Linux stable 4.14 tree and present in the
latest stable 4.14.181 kernel breaks ipv6_stub usage.

The commit is
8ab8786f78c3 ("net ipv6_stub: use ip6_dst_lookup_flow instead of ip6_dst_lookup").

Create the compat layer define to check for it and fixup usage in vxlan
and geneve modules.

Passes Travis here:
https://travis-ci.org/github/gvrose8192/ovs-experimental/builds/689798733

Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: William Tu <u9012063@gmail.com>
4 years agonetdev-vport: Fix typo in log message.
Ben Pfaff [Thu, 21 May 2020 21:16:48 +0000 (14:16 -0700)]
netdev-vport: Fix typo in log message.

Acked-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agoovsdb-idl: Fix NULL deref reported by Coverity.
William Tu [Fri, 15 May 2020 13:46:55 +0000 (06:46 -0700)]
ovsdb-idl: Fix NULL deref reported by Coverity.

When 'datum.values' or 'datum.keys' is NULL, some code path calling
into ovsdb_idl_txn_write__ triggers NULL deref.

An example:
ovsrec_open_vswitch_set_cur_cfg(const struct ovsrec_open_vswitch
{
    struct ovsdb_datum datum;
    union ovsdb_atom key;

    datum.n = 1;
    datum.keys = &key;

    key.integer = cur_cfg;
//  1. assign_zero: Assigning: datum.values = NULL.
    datum.values = NULL;
//  CID 1421356 (#1 of 1): Explicit null dereferenced (FORWARD_NULL)
//  2. var_deref_model: Passing &datum to ovsdb_idl_txn_write_clone,\
//     which dereferences null datum.values.
    ovsdb_idl_txn_write_clone(&row->header_, &ovsrec_open_vswitch_col
}

And with the following calls:
ovsdb_idl_txn_write_clone
  ovsdb_idl_txn_write__
    6. deref_parm_in_call: Function ovsdb_datum_destroy dereferences
       datum->values
    ovsdb_datum_destroy

Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
Signed-off-by: William Tu <u9012063@gmail.com>
4 years agoovs-bugtool: Add -m option to dump-flows.
William Tu [Thu, 14 May 2020 14:02:43 +0000 (07:02 -0700)]
ovs-bugtool: Add -m option to dump-flows.

This patch adds 'ovs-appctl dpctl/dump-flows -m' to bugtool,
the output will include wildcarded fields and the miniflow bits,
such as 'dp-extra-info:miniflow_bits(4,1)'.

Cc: Emma Finn <emma.finn@intel.com>
Acked-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: William Tu <u9012063@gmail.com>
4 years agoDocumentation: Fix kernel support matrix
Greg Rose [Tue, 19 May 2020 22:01:46 +0000 (15:01 -0700)]
Documentation: Fix kernel support matrix

The documentation matrix for OVS branches and which kernels they support
is out of date.  Update it to show that since 2.10 the lowest kernel
that we test and support is Linux 3.16.

RHEL and CentOS kernels based upon the original 3.10 kernel are still
supported.

Reported-by: Han Zhou <hzhou@ovn.org>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2020-May/370742.html
Acked-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: William Tu <u9012063@gmail.com>
4 years agonetdev-offload-tc: Re-fetch block ID after probing.
Aaron Conole [Fri, 15 May 2020 20:36:19 +0000 (16:36 -0400)]
netdev-offload-tc: Re-fetch block ID after probing.

It's possible that block_id could changes after the probe for block
support.  Therefore, fetch the block_id again after the probe.

Fixes: edc2055a2bf7 ("netdev-offload-tc: Flush rules on ingress block when init tc flow api")
Cc: Dmytro Linkin <dmitrolin@mellanox.com>
Acked-by: Roi Dayan <roid@mellanox.com>
Co-authored-by: Marcelo Leitner <mleitner@redhat.com>
Signed-off-by: Marcelo Leitner <mleitner@redhat.com>
Signed-off-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
4 years agonetdev-linux: Update LAG in all cases.
Aaron Conole [Fri, 15 May 2020 20:36:18 +0000 (16:36 -0400)]
netdev-linux: Update LAG in all cases.

In some cases, when processing a netlink change event, it's possible for
an alternate part of OvS (like the IPv6 endpoint processing) to hold an
active netdev interface.  This creates a race-condition, where sometimes
the OvS change processing will take the normal path.  This doesn't work
because the netdev device object won't actually be enslaved to the
ovs-system (for instance, a linux bond) and ingress qdisc entries will
be missing.

To address this, we update the LAG information in ALL cases where
LAG information could come in.

Fixes: d22f8927c3c9 ("netdev-linux: monitor and offload LAG slaves to TC")
Cc: Marcelo Leitner <mleitner@redhat.com>
Cc: John Hurley <john.hurley@netronome.com>
Acked-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
4 years agodebian: Add python3-sphinx to ovs build dependencies
Ansis Atteka [Fri, 15 May 2020 19:08:13 +0000 (12:08 -0700)]
debian: Add python3-sphinx to ovs build dependencies

python3-sphinx has become mandatory build dependency since patch
39b5e46 ("Documentation: Convert multiple manpages to ReST."), because,
otherwise, without this dependency installed, packaging of OVS debian
packages fails with an error that generated man pages can't be found.

Fixes: 39b5e46312 ("Documentation: Convert multiple manpages to ReST.")
CC: Ben Pfaff <blp@ovn.org>
Signed-off-by: Ansis Atteka <aatteka@ovn.org>
Reported-by: Artem Teleshev <artem.teleshev@gmail.com>
Acked-by: Greg Rose <gvrose8192@gmail.com>
4 years agoofproto: Fix statistics of removed flow.
Ilya Maximets [Thu, 14 May 2020 18:20:56 +0000 (20:20 +0200)]
ofproto: Fix statistics of removed flow.

'fr' is a new variable on the stack.  '+=' here adds the real statistics
to a random stack memory.

Fixes: 164413156cf9 ("Add offload packets statistics")
Acked-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
4 years agodebian: Fix package dependencies
Roi Dayan [Thu, 14 May 2020 13:25:10 +0000 (16:25 +0300)]
debian: Fix package dependencies

In python2 package was python-twisted-conch but it looks like
for python3 it's just python3-twisted.
For zope interface the python3 package name is python3-zope.interface.

Fixes: 1ca0323e7c29 ("Require Python 3 and remove support for Python 2.")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Acked-by: Ansis Atteka <aatteka@ovn.org>
4 years agooss-fuzz: Fix miniflow_target.c.
William Tu [Tue, 12 May 2020 15:22:31 +0000 (08:22 -0700)]
oss-fuzz: Fix miniflow_target.c.

Clang reports:
tests/oss-fuzz/miniflow_target.c:209:26: error: suggest braces around \
initialization of subobject
      [-Werror,-Wmissing-braces]
          struct flow flow2 = {0};

Fix it by using memset.

Cc: Bhargava Shastry <bshastry@sect.tu-berlin.de>
Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
Signed-off-by: William Tu <u9012063@gmail.com>
4 years agometaflow: Fix maskable conntrack orig tuple fields
Yi-Hung Wei [Wed, 13 May 2020 20:11:17 +0000 (13:11 -0700)]
metaflow: Fix maskable conntrack orig tuple fields

From man ovs-fields(7), the conntrack origin tuple fields
ct_nw_src/dst, ct_ipv6_src/dst, and ct_tp_src/dst are supposed
to be bitwise maskable, but they are not.  This patch enables
those fields to be maskable, and adds a regression test.

Fixes: daf4d3c18da4 ("odp: Support conntrack orig tuple key.")
Reported-by: Wenying Dong <wenyingd@vmware.com>
Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: William Tu <u9012063@gmail.com>
4 years agotests: Add tests using tap device.
William Tu [Tue, 24 Mar 2020 22:10:51 +0000 (15:10 -0700)]
tests: Add tests using tap device.

Similar to using veth across namespaces, this patch creates
tap devices, assigns to namespaces, and allows traffic to
go through different test cases.

Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: William Tu <u9012063@gmail.com>
4 years agouserspace: Enable TSO support for non-DPDK.
William Tu [Tue, 24 Mar 2020 22:10:50 +0000 (15:10 -0700)]
userspace: Enable TSO support for non-DPDK.

This patch enables TSO support for non-DPDK use cases, and
also add check-system-tso testsuite. Before TSO, we have to
disable checksum offload, allowing the kernel to calculate the
TCP/UDP packet checsum. With TSO, we can skip the checksum
validation by enabling checksum offload, and with large packet
size, we see better performance.

Consider container to container use cases:
  iperf3 -c (ns0) -> veth peer -> OVS -> veth peer -> iperf3 -s (ns1)
And I got around 6Gbps, similar to TSO with DPDK-enabled.

Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: William Tu <u9012063@gmail.com>
4 years agodebian: Fix broken build after some man pages became generated from RST
Ansis Atteka [Wed, 13 May 2020 17:44:11 +0000 (10:44 -0700)]
debian: Fix broken build after some man pages became generated from RST

As far as I know, the official way to build debian packages is by invoking
following command:

> fakeroot debian/rules binary

However, that command started to fail with these errors:

dh_installman --language=C
dh_installman: Cannot find (any matches for) "utilities/ovs-appctl.8" (tried in .)
dh_installman: Cannot find (any matches for) "utilities/ovs-l3ping.8" (tried in .)
dh_installman: Cannot find (any matches for) "utilities/ovs-tcpdump.8" (tried in .)

because the generated manpages are not part of the source tree anymore.  This
patch updates debian *.manpages files to point to the generted files.

Fixes: 39b5e46312 ("Documentation: Convert multiple manpages to ReST.")
CC: Ben Pfaff <blp@ovn.org>
Signed-off-by: Ansis Atteka <aatteka@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
4 years agoRAFT: Add clarifying note for cluster/leave operation.
Mark Michelson [Fri, 8 May 2020 21:00:27 +0000 (17:00 -0400)]
RAFT: Add clarifying note for cluster/leave operation.

We had a user express confusion about the state of a cluster after using
cluster/leave. The user had a three server cluster and used
cluster/leave to remove two servers from the cluster. The user expected
that the single server left would not function since the quorum of two
servers for a three server cluster was not met.

In actuality, cluster/leave removes the server from the cluster and
alters the cluster size in the process. Thus the single remaining server
continued to function since quorum was reached.

This documentation change makes it a bit more explicit that
cluster/leave alters the size of the cluster and cites the three server
down to one server case as an example.

Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1798158
Acked-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Mark Michelson <mmichels@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
4 years agoraft: Disable RAFT jsonrpc inactivity probe.
Zhen Wang [Tue, 31 Mar 2020 00:21:04 +0000 (17:21 -0700)]
raft: Disable RAFT jsonrpc inactivity probe.

With the scale test of 640 nodes k8s cluster, raft DB nodes' jsonrpc
session got closed due to the timeout of default 5 seconds probe.
It will cause disturbance of the raft cluster. Since we already have
the heartbeat for RAFT, just disable the probe between the servers
to avoid the unnecessary jsonrpc inactivity probe.

Acked-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Zhen Wang <zhewang@nvidia.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
4 years agoovsdb-idl: Fix NULL deref reported by Coverity.
William Tu [Sat, 2 May 2020 16:24:30 +0000 (09:24 -0700)]
ovsdb-idl: Fix NULL deref reported by Coverity.

When 'datum.values' or 'datum.keys' is NULL, some code path calling
into ovsdb_idl_txn_write__ triggers NULL deref.  An example is below:

ovsrec_open_vswitch_set_cur_cfg(const struct ovsrec_open_vswitch
{
    struct ovsdb_datum datum;
    union ovsdb_atom key;

    datum.n = 1;
    datum.keys = &key;

    key.integer = cur_cfg;
//  1. assign_zero: Assigning: datum.values = NULL.
    datum.values = NULL;
//  CID 1421356 (#1 of 1): Explicit null dereferenced (FORWARD_NULL)
//  2. var_deref_model: Passing &datum to ovsdb_idl_txn_write_clone,\
//     which dereferences null datum.values.
    ovsdb_idl_txn_write_clone(&row->header_, &ovsrec_open_vswitch_col
}

And with the following calls:
ovsdb_idl_txn_write_clone
  ovsdb_idl_txn_write__
    6. deref_parm_in_call: Function ovsdb_datum_destroy dereferences
       datum->values
ovsdb_datum_destroy

And another possible NULL deref is at ovsdb_datum_equals(). Fix the
two by adding additional checks.

Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
Signed-off-by: William Tu <u9012063@gmail.com>
4 years agoovsdb-idlc: Fix memory leak reported by Coverity.
William Tu [Sat, 2 May 2020 16:08:26 +0000 (09:08 -0700)]
ovsdb-idlc: Fix memory leak reported by Coverity.

An exmplae pattern shown below:
void
ovsrec_ct_zone_index_set_external_ids(const struct ovsrec_ct_zone...
{
//  1. alloc_fn: Storage is returned from allocation function xmalloc.
//  2. var_assign: Assigning: datum = storage returned from xmalloc(24UL).
    struct ovsdb_datum *datum = xmalloc(sizeof(struct ovsdb_datum));

//  3. Condition external_ids, taking false branch.
    if (external_ids) {
...
    } else {
//  4. noescape: Resource datum is not freed or pointed-to in ovsdb_datum_init_empty.
        ovsdb_datum_init_empty(datum);
    }
//  5. noescape: Resource datum is not freed or pointed-to in ovsdb_idl_index_write.
    ovsdb_idl_index_write(CONST_CAST(struct ovsdb_idl_row *, &row->header_),
                          &ovsrec_ct_zone_columns[OVSREC_CT_ZONE_COL_EXTERNAL_IDS],
                          datum,
                          &ovsrec_table_classes[OVSREC_TABLE_CT_ZONE]);

// CID 1420856 (#1 of 1): Resource leak (RESOURCE_LEAK)
// 6. leaked_storage: Variable datum going out of scope leaks the storage it
      points to.
Fix it by freeing the datum.

Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
Signed-off-by: William Tu <u9012063@gmail.com>
4 years agoovsdb-idlc: Fix memory leak reported by Coverity.
William Tu [Sat, 2 May 2020 16:01:48 +0000 (09:01 -0700)]
ovsdb-idlc: Fix memory leak reported by Coverity.

Coverity shows the following memory leak in this code pattern:

void
ovsrec_ipfix_index_set_obs_domain_id(...
{
    struct ovsdb_datum datum;
//     1. alloc_fn: Storage is returned from allocation function xmalloc.
//     2. var_assign: Assigning: key = storage returned from xmalloc(16UL).
    union ovsdb_atom *key = xmalloc(sizeof(union ovsdb_atom));

//     3. Condition n_obs_domain_id, taking false branch.
    if (n_obs_domain_id) {
        datum.n = 1;
        datum.keys = key;
        key->integer = *obs_domain_id;
    } else {
        datum.n = 0;
        datum.keys = NULL;
    }
    datum.values = NULL;
    ovsdb_idl_index_write(CONST_CAST(struct ovsdb_idl_row *, &row->head...
//     CID 1420891 (#1 of 1): Resource leak (RESOURCE_LEAK)

Fixed it by moving the xmalloc to the true branch.

Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
Signed-off-by: William Tu <u9012063@gmail.com>
4 years agosparse: Fix typo in DPDK endian conversion macros.
David Marchand [Tue, 28 Apr 2020 12:03:53 +0000 (14:03 +0200)]
sparse: Fix typo in DPDK endian conversion macros.

This header is duplicated from the DPDK generic header.
Fix typo identified in DPDK [1].

While at it, RTE_EXEC_ENV_BSDAPP has been replaced with
RTE_EXEC_ENV_FREEBSD in 19.05 [2].

1: https://git.dpdk.org/dpdk/commit/?id=a3e283ed904c
2: https://git.dpdk.org/dpdk/commit/?id=5fbc1d498f54

Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
4 years agoraft: Fix leak of the incomplete command.
Ilya Maximets [Mon, 4 May 2020 19:55:41 +0000 (21:55 +0200)]
raft: Fix leak of the incomplete command.

Function raft_command_initiate() returns correctly referenced command
instance.  'n_ref' equals 1 for complete commands and 2 for incomplete
commands because one more reference is in raft->commands list.
raft_handle_execute_command_request__() leaks the reference by not
returning pointer anywhere and not unreferencing incomplete commands.

 792 bytes in 11 blocks are definitely lost in loss record 258 of 262
    at 0x483BB1A: calloc (vg_replace_malloc.c:762)
    by 0x44BA32: xcalloc (util.c:121)
    by 0x422E5F: raft_command_create_incomplete (raft.c:2038)
    by 0x422E5F: raft_command_initiate (raft.c:2061)
    by 0x428651: raft_handle_execute_command_request__ (raft.c:4161)
    by 0x428651: raft_handle_execute_command_request (raft.c:4177)
    by 0x428651: raft_handle_rpc (raft.c:4230)
    by 0x428651: raft_conn_run (raft.c:1445)
    by 0x428DEA: raft_run (raft.c:1803)
    by 0x407392: main_loop (ovsdb-server.c:226)
    by 0x407392: main (ovsdb-server.c:469)

Fixes: 1b1d2e6daa56 ("ovsdb: Introduce experimental support for clustered databases.")
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: William Tu <u9012063@gmail.com>
4 years agonetdev-afxdp: Fix missing init.
William Tu [Mon, 4 May 2020 16:26:24 +0000 (09:26 -0700)]
netdev-afxdp: Fix missing init.

When introducing the interrupt mode for netdev-afxdp, the netdev
init function is accidentally removed.  Fix it by adding it back.

Fixes: 5bfc519fee499 ("netdev-afxdp: Add interrupt mode netdev class.")
Acked-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: William Tu <u9012063@gmail.com>
4 years agodocs: Document check_pkt_len action.
William Tu [Fri, 1 May 2020 15:42:25 +0000 (08:42 -0700)]
docs: Document check_pkt_len action.

Cc: Numan Siddique <nusiddiq@redhat.com>
Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: Numan Siddique <numans@ovn.org>
4 years agouserspace: Add conntrack timeout policy support.
William Tu [Wed, 29 Apr 2020 19:25:11 +0000 (12:25 -0700)]
userspace: Add conntrack timeout policy support.

Commit 1f1613183733 ("ct-dpif, dpif-netlink: Add conntrack timeout
policy support") adds conntrack timeout policy for kernel datapath.
This patch enables support for the userspace datapath.  I tested
using the 'make check-system-userspace' which checks the timeout
policies for ICMP and UDP cases.

Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: Yi-Hung Wei <yihung.wei@gmail.com>
4 years agocompat: Fix ipv6_dst_lookup build error
Yi-Hung Wei [Wed, 29 Apr 2020 21:25:50 +0000 (14:25 -0700)]
compat: Fix ipv6_dst_lookup build error

The geneve/vxlan compat code base invokes ipv6_dst_lookup() which is
recently replaced by ipv6_dst_lookup_flow() in the stable kernel tree.

This causes travis build failure:
    * https://travis-ci.org/github/openvswitch/ovs/builds/681084038

This patch updates the backport logic to invoke the right function.

Related patch in
    git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git

b9f3e457098e ("net: ipv6_stub: use ip6_dst_lookup_flow instead of
               ip6_dst_lookup")

Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: William Tu <u9012063@gmail.com>
4 years agoAUTHORS: Add Jiang Lidong.
William Tu [Thu, 30 Apr 2020 14:40:36 +0000 (07:40 -0700)]
AUTHORS: Add Jiang Lidong.

Signed-off-by: William Tu <u9012063@gmail.com>
4 years agonetdev-linux: remove sum of vport stats and kernel netdev stats.
Jiang Lidong [Thu, 23 Apr 2020 05:35:14 +0000 (05:35 +0000)]
netdev-linux: remove sum of vport stats and kernel netdev stats.

When using kernel veth as OVS interface, doubled drop counter
value is shown when veth drops packets due to traffic overrun.

In netdev_linux_get_stats, it reads both vport stats and kernel
netdev stats, in case vport stats retrieve failure. If both of
them success, error counters are added to include errors from
different layers. But implementation of ovs_vport_get_stats in
kernel data path has included kernel netdev stats by calling
dev_get_stats. When drop or other error counters is not zero,
its value is doubled by netdev_linux_get_stats.

In this change, adding kernel netdev stats into vport stats
is removed, since vport stats includes all information of
kernel netdev stats.

Signed-off-by: Jiang Lidong <jianglidong3@jd.com>
Signed-off-by: William Tu <u9012063@gmail.com>
4 years agoovs-bugtool: Add ethtool -l for combined channel.
William Tu [Wed, 29 Apr 2020 17:30:26 +0000 (10:30 -0700)]
ovs-bugtool: Add ethtool -l for combined channel.

Users of netdev-afxdp has to setup the combined channel
on physical NIC. This helps debugging related issues.
Example output:
  $ ethtool -l enp3s0f0
  Channel parameters for enp3s0f0:
  Pre-set maximums:
  RX:        0
  TX:        0
  Other:     1
  Combined:  63
  Current hardware settings:
  RX:        0
  TX:        0
  Other:     1
  Combined:  1

Some previous discussion:
https://mail.openvswitch.org/pipermail/ovs-dev/2020-January/366631.html

Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: Yi-Hung Wei <yihung.wei@gmail.com>
4 years agobugtool: Add dump-tlv-map.
William Tu [Wed, 29 Apr 2020 16:14:54 +0000 (09:14 -0700)]
bugtool: Add dump-tlv-map.

This helps debugging the tlv map issues.

Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: Yi-Hung Wei <yihung.wei@gmail.com>
4 years agoofp-actions: Add delete field action
Yi-Hung Wei [Tue, 14 Apr 2020 20:33:28 +0000 (13:33 -0700)]
ofp-actions: Add delete field action

This patch adds a new OpenFlow action, delete field, to delete a
field in packets.  Currently, only the tun_metadata fields are
supported.

One use case to add this action is to support multiple versions
of geneve tunnel metadatas to be exchanged among different versions
of networks.  For example, we may introduce tun_metadata2 to
replace old tun_metadata1, but still want to provide backward
compatibility to the older release.  In this case, in the new
OpenFlow pipeline, we would like to support the case to receive a
packet with tun_metadata1, do some processing.  And if the packet
is going to a switch in the newer release, we would like to delete
the value in tun_metadata1 and set a value into tun_metadata2.

Currently, ovs does not provide an action to remove a value in
tun_metadata if the value is present.  This patch fulfills the gap
by adding the delete_field action.  For example, the OpenFlow
syntax to delete tun_metadata1 is:

    actions=delete_field:tun_metadata1

Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: William Tu <u9012063@gmail.com>
4 years agoAUTHORS: Add Anton Ivanov.
William Tu [Mon, 27 Apr 2020 15:49:13 +0000 (08:49 -0700)]
AUTHORS: Add Anton Ivanov.

Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: Numan Siddique <numans@ovn.org>
4 years agoconntrack: Fix icmp conntrack state.
William Tu [Mon, 27 Apr 2020 15:42:29 +0000 (08:42 -0700)]
conntrack: Fix icmp conntrack state.

ICMP conntrack state should be ICMPS_REPLY after seeing both
side of ICMP traffic.

Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: Yi-Hung Wei <yihung.wei@gmail.com>
4 years agodocs: Fix GTP-U release version.
William Tu [Mon, 27 Apr 2020 15:45:19 +0000 (08:45 -0700)]
docs: Fix GTP-U release version.

GTP-U support should be at OVS-2.14.

Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: Ilya Maximets <i.maximets@ovn.org>
4 years agonetdev-afxdp: Add interrupt mode netdev class.
William Tu [Tue, 14 Apr 2020 13:22:55 +0000 (06:22 -0700)]
netdev-afxdp: Add interrupt mode netdev class.

The patch adds a new netdev class 'afxdp-nonpmd' to enable afxdp
interrupt mode. This is similar to 'type=afxdp', except that the
is_pmd field is set to false. As a result, the packet processing
is handled by main thread, not pmd thread. This avoids burning
the CPU to always 100% when there is no traffic.

Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
4 years agoovsdb: Remove duplicated function defintions
Yi-Hung Wei [Tue, 21 Apr 2020 22:09:05 +0000 (15:09 -0700)]
ovsdb: Remove duplicated function defintions

ovsdb_function_from_string() and ovsdb_function_to_string() are defined
both in ovsdb/condition.c and lib/ovsdb-condidtion.c with the same function
definition.  Remove the one in ovsdb/condition.c to avoid duplication.

This also resolves the following bazel building error.

./libopenvswitch.lo(ovsdb-condition.pic.o): In function `ovsdb_function_from_string':
/lib/ovsdb-condition.c:24: multiple definition of `ovsdb_function_from_string'
./libovsdb.a(condition.pic.o):/proc/self/cwd/external/openvswitch_repo/ovsdb/condition.c:34: first defined here
./libopenvswitch.lo(ovsdb-condition.pic.o): In function `ovsdb_function_from_string':
./lib/ovsdb-condition.c:24: multiple definition of `ovsdb_function_to_string'
./libovsdb.a(condition.pic.o):/proc/self/cwd/external/openvswitch_repo/ovsdb/condition.c:335

Reported-by: Harold Lim <haroldl@vmware.com>
Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: William Tu <u9012063@gmail.com>
4 years agoovsdb: Switch ovsdb log fsync to data only.
Anton Ivanov [Tue, 21 Apr 2020 08:23:57 +0000 (09:23 +0100)]
ovsdb: Switch ovsdb log fsync to data only.

We do not check metadata - mtime, atime, anywhere, so we
do not need to update it every time we sync the log.
if the system supports it, the log update should be
data only

Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Signed-off-by: William Tu <u9012063@gmail.com>
4 years agovlog: Fast path in vlog.
Anton Ivanov [Tue, 21 Apr 2020 08:24:38 +0000 (09:24 +0100)]
vlog: Fast path in vlog.

Avoid grabbing any mutexes if the log levels specify that
no logging is to take place.

Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Signed-off-by: William Tu <u9012063@gmail.com>
4 years agoUtilities: Add the ovs_dump_ofpacts command to gdb
Eelco Chaudron [Fri, 17 Apr 2020 12:51:34 +0000 (14:51 +0200)]
Utilities: Add the ovs_dump_ofpacts command to gdb

This adds the ovs_dump_ifpacts command:

(gdb) help ovs_dump_ofpacts
Dump all actions in an ofpacts set
    Usage: ovs_dump_ofpacts <struct ofpact *> <ofpacts_len>

       <struct ofpact *> : Pointer to set of ofpact structures.
       <ofpacts_len>     : Total length of the set.

    Example dumping all actions when in the clone_xlate_actions() function:

    (gdb) ovs_dump_ofpacts actions actions_len
    (struct ofpact *) 0x561c7be487c8: {type = OFPACT_SET_FIELD, raw = 255 '', len = 24}
    (struct ofpact *) 0x561c7be487e0: {type = OFPACT_SET_FIELD, raw = 255 '', len = 24}
    (struct ofpact *) 0x561c7be487f8: {type = OFPACT_SET_FIELD, raw = 255 '', len = 24}
    (struct ofpact *) 0x561c7be48810: {type = OFPACT_SET_FIELD, raw = 255 '', len = 32}
    (struct ofpact *) 0x561c7be48830: {type = OFPACT_SET_FIELD, raw = 255 '', len = 24}
    (struct ofpact *) 0x561c7be48848: {type = OFPACT_RESUBMIT, raw = 38 '&', len = 16}

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: William Tu <u9012063@gmail.com>
4 years agoUtilities: make print() in gdb script work on all version of Python
Eelco Chaudron [Tue, 21 Apr 2020 11:33:55 +0000 (13:33 +0200)]
Utilities: make print() in gdb script work on all version of Python

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: William Tu <u9012063@gmail.com>
4 years agofatal-signal: Remove snprintf.
William Tu [Tue, 14 Apr 2020 15:17:04 +0000 (08:17 -0700)]
fatal-signal: Remove snprintf.

Function snprintf is not async-signal-safe.  Replace it with
our own implementation.  Example ovs-vswitchd.log output:
  2020-03-25T01:08:19.673Z|00050|memory|INFO|handlers:2 ports:3
  SIGSEGV detected, backtrace:
  0x4872d9         <fatal_signal_handler+0x49>
  0x7f4e2ab974b0   <killpg+0x40>
  0x7f4e2ac5d74d   <__poll+0x2d>
  0x531098         <time_poll+0x108>
  0x51aefc         <poll_block+0x8c>
  0x445ca9         <udpif_revalidator+0x289>
  0x5056fd         <ovsthread_wrapper+0x7d>
  0x7f4e2b65f6ba   <start_thread+0xca>
  0x7f4e2ac6941d   <clone+0x6d>
  0x0              <+0x0>

Tested-at: https://travis-ci.org/github/williamtu/ovs-travis/builds/674901331
Tested-by: Yifeng Sun <pkusunyifeng@gmail.com>
Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
Signed-off-by: William Tu <u9012063@gmail.com>
4 years agoconntrack: Add coverage count for l4csum error.
William Tu [Thu, 16 Apr 2020 19:54:53 +0000 (12:54 -0700)]
conntrack: Add coverage count for l4csum error.

Add a coverage counter when userspace conntrack receives a packet
with invalid l4 checksum.  When using veth for testing, users
often forget to turn off the tx offload on the other side of the
namespace, causing l4 checksum not calculated in packet header,
and at conntrack, return invalid conntrack state.

Suggested-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: Yi-Hung Wei <yihung.wei@gmail.com>
4 years agoacinclude: handle dependencies for DPDK's AF_XDP PMD
Ciara Loftus [Mon, 10 Feb 2020 13:48:54 +0000 (13:48 +0000)]
acinclude: handle dependencies for DPDK's AF_XDP PMD

If RTE_LIBRTE_AF_XDP is enabled in the DPDK build, OVS must link
the libbpf library, otherwise build failures will occur.

Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Signed-off-by: William Tu <u9012063@gmail.com>
4 years agoacinclude: handle dependencies for DPDK's PCAP PMD
Ciara Loftus [Mon, 10 Feb 2020 13:48:53 +0000 (13:48 +0000)]
acinclude: handle dependencies for DPDK's PCAP PMD

If RTE_LIBRTE_PMD_PCAP is enabled in the DPDK build, OVS must link
the pcap library, otherwise build failures will occur.

Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Signed-off-by: William Tu <u9012063@gmail.com>
4 years agocompat: Fix broken partial backport of extack op parameter
Greg Rose [Tue, 14 Apr 2020 18:42:10 +0000 (11:42 -0700)]
compat: Fix broken partial backport of extack op parameter

A series of commits added support for the extended ack
parameter to the newlink, changelink and validate ops in
the rtnl_link_ops structure:
a8b8a889e369d ("net: add netlink_ext_ack argument to rtnl_link_ops.validate")
7a3f4a185169b ("net: add netlink_ext_ack argument to rtnl_link_ops.newlink")
ad744b223c521 ("net: add netlink_ext_ack argument to rtnl_link_ops.changelink")

These commits were all added at the same time and present since the
Linux kernel 4.13 release. In our compatiblity layer we have a
define HAVE_EXT_ACK_IN_RTNL_LINKOPS that indicates the presence of
the extended ack parameter for these three link operations.

At least one distro has only backported two of the three patches,
for newlink and changelink, while not backporting patch a8b8a889e369d
for the validate op.  Our compatibility layer code in acinclude.m4
is able to find the presence of the extack within the rtnl_link_ops
structure so it defines HAVE_EXT_ACK_IN_RTNL_LINKOPS but since the
validate link op does not have the extack parameter the compilation
fails on recent kernels for that particular distro. Other kernel
distributions based upon this distro will presumably also encounter
the compile errors.

Introduce a new function in acinclude.m4 that will find function
op definitions and then search for the required parameter.  Then
use this function to define HAVE_RTNLOP_VALIDATE_WITH_EXTACK so
that we can detect and enable correct compilation on kernels
which have not backported the entire set of patches.  This function
is generic to any function op - it need not be in a structure.

In places where HAVE_EXT_ACK_IN_RTNL_LINKOPS wraps validate functions
replace it with the new HAVE_RTNLOP_VALIDATE_WITH_EXTACK define.

Passes Travis here:
https://travis-ci.org/github/gvrose8192/ovs-experimental/builds/674599698

Passes a kernel check-kmod test on several systems, including
sles12 sp4 4.12.14-95.48-default kernel, without any regressions.

VMWare-BZ: #2544032
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
Signed-off-by: William Tu <u9012063@gmail.com>
4 years agoofp-actions: Fix memory leak on error path.
William Tu [Mon, 13 Apr 2020 15:36:44 +0000 (08:36 -0700)]
ofp-actions: Fix memory leak on error path.

Need to free the memory before return. Detected by gcc10.

Signed-off-by: William Tu <u9012063@gmail.com>
Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
4 years agosystem-traffic: Check frozen state handling with TLV map change
Yifeng Sun [Thu, 9 Apr 2020 18:37:39 +0000 (11:37 -0700)]
system-traffic: Check frozen state handling with TLV map change

This patch enhances a system traffic test to prevent regression on
the tunnel metadata table (tun_table) handling with frozen state.
Without a proper fix this test can crash ovs-vswitchd due to a
use-after-free bug on tun_table.

These are the timed sequence of how this bug is triggered:

- Adds an OpenFlow rule in OVS that matches Geneve tunnel metadata that
contains a controller action.
- When the first packet matches the aforementioned OpenFlow rule,
during the miss upcall, OVS stores a pointer to the tun_table (that
decodes the Geneve tunnel metadata) in a frozen state and pushes down
a datapath flow into kernel datapath.
- Issues a add-tlv-map command to reprogram the tun_table on OVS.
OVS frees the old tun_table and create a new tun_table.
- A subsequent packet hits the kernel datapath flow again. Since
there is a controller action associated with that flow, it triggers
slow path controller upcall.
- In the slow path controller upcall, OVS derives the tun_table
from the frozen state, which points to the old tun_table that is
already being freed at this time point.
- In order to access the tunnel metadata, OVS uses the invalid
pointer that points to the old tun_table and triggers the core dump.

Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com>
Co-authored-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: William Tu <u9012063@gmail.com>
4 years agotun_metadata: Fix coredump caused by use-after-free bug
Yifeng Sun [Thu, 9 Apr 2020 18:37:38 +0000 (11:37 -0700)]
tun_metadata: Fix coredump caused by use-after-free bug

Tun_metadata can be referened by flow and frozen_state at the same
time. When ovs-vswitchd handles TLV table mod message, the involved
tun_metadata gets freed. The call trace to free tun_metadata is
shown as below:

ofproto_run
- handle_openflow
  - handle_single_part_openflow
    - handle_tlv_table_mod
      - tun_metadata_table_mod
        - tun_metadata_postpone_free

Unfortunately, this tun_metadata can be still used by some frozen_state,
and later on when frozen_state tries to access its tun_metadata table,
ovs-vswitchd crashes. The call trace to access tun_metadata from
frozen_state is shown as below:

udpif_upcall_handler
- recv_upcalls
  - process_upcall
    - frozen_metadata_to_flow

It is unsafe for frozen_state to reference tun_table because tun_table
is protected by RCU while the lifecycle of frozen_state can span several
RCU quiesce states. Current code violates OVS's RCU protection mechanism.

This patch fixes it by simply stopping frozen_state from referencing
tun_table. If frozen_state needs tun_table, the latest valid tun_table
can be found through ofproto_get_tun_tab() efficiently.

A previous commit seems fixing the samiliar issue:
254878c18874f6 (ofproto-dpif-xlate: Fix segmentation fault caused by tun_table)

VMware-BZ: #2526222
Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com>
Signed-off-by: William Tu <u9012063@gmail.com>
4 years agobugtool: Fix for Python3.
Timothy Redaelli [Thu, 19 Mar 2020 19:05:39 +0000 (20:05 +0100)]
bugtool: Fix for Python3.

Currently ovs-bugtool tool doesn't start on Python 3.
This commit fixes ovs-bugtool to make it works on Python 3.

Replaced StringIO.StringIO with io.BytesIO since the script is
processing binary data.

Reported-at: https://bugzilla.redhat.com/1809241
Reported-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Timothy Redaelli <tredaelli@redhat.com>
Co-authored-by: William Tu <u9012063@gmail.com>
Signed-off-by: William Tu <u9012063@gmail.com>
4 years agodpif-netdev: includes microsecond delta in meter bucket calculation
Jiang Lidong [Tue, 7 Apr 2020 03:28:06 +0000 (03:28 +0000)]
dpif-netdev: includes microsecond delta in meter bucket calculation

When dp-netdev meter rate is higher than 200Mbps, observe
more than 10% bias from configured rate value with UDP traffic.

In dp-netdev meter, millisecond delta between now and last used
is taken into bucket size calcualtion, while sub-millisecond part
is truncated.

If traffic rate is pretty high, time delta can be few milliseconds,
its ratio to truncated part is less than 10:1, the loss of bucket
size caused by truncated can be observed obviously by commited
traffic rate.

In this patch, microsend delta part is included in calculation
of meter bucket to make it more precise.

Signed-off-by: Jiang Lidong <jianglidong3@jd.com>
Signed-off-by: William Tu <u9012063@gmail.com>
4 years agoTravis: Enable clang compiler and unit test for arm CI
Lance Yang [Mon, 30 Mar 2020 12:54:03 +0000 (20:54 +0800)]
Travis: Enable clang compiler and unit test for arm CI

Enable testsuite and clang compiler for arm CI. In order not to increase
the CI jobs, selectively enable them in the existing jobs instead of
adding extra jobs.

Successful travis job build report:
https://travis-ci.org/github/yzyuestc/ovs/builds/667539360

Reviewed-by: Yanqin Wei <Yanqin.Wei@arm.com>
Reviewed-by: Malvika Gupta <Malvika.Gupta@arm.com>
Signed-off-by: Lance Yang <Lance.Yang@arm.com>
Signed-off-by: William Tu <u9012063@gmail.com>
4 years agotests/testsuite: Skip failing UT cases on aarch64
Malvika Gupta [Mon, 30 Mar 2020 12:54:02 +0000 (20:54 +0800)]
tests/testsuite: Skip failing UT cases on aarch64

The following test cases are failing inconsistently on aarch64 platforms and
have been skipped until further investigation can be made on how to fix them:

20: bfd.at:268           bfd - bfd decay
2104: ovsdb-idl.at:1815  Check Python IDL connects to leader - Python3 (leader only)
2105: ovsdb-idl.at:1816  Check Python IDL reconnects to leader - Python3 (leader only)

Suggested-by: Yanqin Wei <Yanqin.Wei@arm.com>
Suggested-by: Lance Yang <Lance.Yang@arm.com>
Signed-off-by: Malvika Gupta <malvika.gupta@arm.com>
Signed-off-by: William Tu <u9012063@gmail.com>
4 years agotests/atlocal.in: Add check for aarch64 Architecture
Malvika Gupta [Mon, 30 Mar 2020 12:54:01 +0000 (20:54 +0800)]
tests/atlocal.in: Add check for aarch64 Architecture

This patch adds a condition to check if the CPU architecture is aarch64. If the
condition evaluates to true, $IS_ARM64 variable is set to 'yes'. For all other
architectures, this variable is set to 'no'.

Reviewed-by: Yanqin Wei <Yanqin.wei@arm.com>
Signed-off-by: Malvika Gupta <malvika.gupta@arm.com>
Signed-off-by: William Tu <u9012063@gmail.com>
4 years agoutil: Update OVS_TYPEOF macro for C++ enabled applications.
Archana Holla [Tue, 7 Apr 2020 18:09:33 +0000 (11:09 -0700)]
util: Update OVS_TYPEOF macro for C++ enabled applications.

OVS_TYPEOF macro doesn’t return the type of object for non __GNUC__ platforms.
Updating it to use "decltype" keyword when used from C++ code.

Signed-off-by: Archana Holla <harchana@vmware.com>
Signed-off-by: William Tu <u9012063@gmail.com>
4 years agoovs-vswitchd: Fix icmp reply timeout description.
William Tu [Mon, 6 Apr 2020 23:59:01 +0000 (16:59 -0700)]
ovs-vswitchd: Fix icmp reply timeout description.

Currently the userspace datapath implements conntrack ICMP reply state
as when ICMP packets have been seen on both directions.  However, the
description is defined as timeout of the connection after an ICMP error
is replied in response to an ICMP packet.

Fixes: 61a5264d60d0c ("ovs-vswitchd: Add Datapath, CT_Zone, and CT_Zone_Policy tables.")
Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: Greg Rose <gvrose8192@gmail.com>
4 years agonetdev-linux.c: Fix coverity unreachable code warning
Usman Ansari [Wed, 1 Apr 2020 22:33:32 +0000 (15:33 -0700)]
netdev-linux.c: Fix coverity unreachable code warning

Coverity reports unreachable code in "?" statement
Fixed by removing code segment and unused variables & defines

Signed-off-by: Usman Ansari <ua1422@gmail.com>
Signed-off-by: William Tu <u9012063@gmail.com>
4 years agocirrus: Force pkg update on FreeBSD.
Ilya Maximets [Fri, 27 Mar 2020 08:51:51 +0000 (09:51 +0100)]
cirrus: Force pkg update on FreeBSD.

Seems like FreeBSD ports/images are not well maintained and frequently
causes package installation failures like this:

 [1/40] Fetching automake-1.16.1_2.txz: .......... done
 pkg: cached package automake-1.16.1_2: size mismatch, fetching from remote
 [2/40] Fetching automake-1.16.1_2.txz: .......... done
 pkg: cached package automake-1.16.1_2: size mismatch, cannot continue
 Consider running 'pkg update -f'

Forced update doesn't increase build time significantly, but helps
to solve at least this one kind of issues.

Acked-by: William Tu <u9012063@gmail.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
4 years agoRevert "ovsdb-idl: Avoid sending redundant conditional monitoring updates"
Dumitru Ceara [Wed, 25 Mar 2020 20:15:23 +0000 (21:15 +0100)]
Revert "ovsdb-idl: Avoid sending redundant conditional monitoring updates"

This reverts commit 5351980b047f4dd40be7a59a1e4b910df21eca0a.

If the ovsdb-server reply to "monitor_cond_since" requests has
"found" == false then ovsdb_idl_db_parse_monitor_reply() calls
ovsdb_idl_db_clear() which iterates through all tables and
unconditionally sets table->cond_changed to false.

However, if the client had already set a new condition for some of the
tables, this new condition request will never be sent to ovsdb-server
until the condition is reset to a different value. This is due to the
check in ovsdb_idl_db_set_condition().

One way to replicate the issue is described in the bugzilla reporting
the bug, when ovn-controller is configured to use "ovn-monitor-all":
https://bugzilla.redhat.com/show_bug.cgi?id=1808125#c6

Commit 5351980b047f tried to optimize sending redundant conditional
monitoring updates but the chances that this scenario happens with the
latest code is quite low since commit 403a6a0cb003 ("ovsdb-idl: Fast
resync from server when connection reset.") changed the behavior of
ovsdb_idl_db_parse_monitor_reply() to avoid calling ovsdb_idl_db_clear()
in most cases.

Reported-by: Dan Williams <dcbw@redhat.com>
Reported-at: https://bugzilla.redhat.com/1808125
CC: Andy Zhou <azhou@ovn.org>
Fixes: 5351980b047f ("ovsdb-idl: Avoid sending redundant conditional monitoring updates")
Acked-by: Han Zhou <hzhou@ovn.org>
Acked-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
4 years agouserspace: Add GTP-U support.
William Tu [Mon, 25 Nov 2019 19:19:23 +0000 (11:19 -0800)]
userspace: Add GTP-U support.

GTP, GPRS Tunneling Protocol, is a group of IP-based communications
protocols used to carry general packet radio service (GPRS) within
GSM, UMTS and LTE networks.  GTP protocol has two parts: Signalling
(GTP-Control, GTP-C) and User data (GTP-User, GTP-U). GTP-C is used
for setting up GTP-U protocol, which is an IP-in-UDP tunneling
protocol. Usually GTP is used in connecting between base station for
radio, Serving Gateway (S-GW), and PDN Gateway (P-GW).

This patch implements GTP-U protocol for userspace datapath,
supporting only required header fields and G-PDU message type.
See spec in:
https://tools.ietf.org/html/draft-hmm-dmm-5g-uplane-analysis-00

Tested-at: https://travis-ci.org/github/williamtu/ovs-travis/builds/666518784
Signed-off-by: Feng Yang <yangfengee04@gmail.com>
Co-authored-by: Feng Yang <yangfengee04@gmail.com>
Signed-off-by: Yi Yang <yangyi01@inspur.com>
Co-authored-by: Yi Yang <yangyi01@inspur.com>
Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: Ben Pfaff <blp@ovn.org>
4 years agodpif-netdev: Force port reconfiguration to change dynamic_txqs.
Ilya Maximets [Tue, 24 Mar 2020 23:50:45 +0000 (00:50 +0100)]
dpif-netdev: Force port reconfiguration to change dynamic_txqs.

In case number of polling threads goes from exact number of Tx queues
in port to higher value while set_tx_multiq() not implemented or not
requesting reconfiguration, port will not be reconfigured and datapath
will continue using static Tx queue ids leading to crash.

Ex.:
 Assuming that port p0 supports up to 4 Tx queues and doesn't support
 set_tx_multiq() method.  For example, netdev-afxdp could be the case,
 because it could have multiple Tx queues, but doesn't have
 set_tx_multiq() implementation because number of Tx queues always
 equals to number of Rx queues.

 1. Configuring pmd-cpu-mask to have 3 pmd threads.

 2. Adding port p0 to OVS.
    At this point wanted_txqs = 4 (3 for pmd threads + 1 for non-pmd).
    Port reconfigured to have 4 Tx queues successfully.
    dynamic_txqs = (4 < 4) = false;

 3. Configuring pmd-cpu-mask to have 10 pmd threads.
    At this point wanted_txqs = 11 (10 for pmd threads + 1 for non-pmd).
    Since set_tx_multiq() is not implemented, netdev doesn't request
    reconfiguration and 'dynamic_txqs' remains in 'false' state.

 4. Since 'dynamic_txqs == false', dpif-netdev uses static Tx queue
    ids that are in range [0, 10] while device only supports 4 leading
    to unwanted behavior and crashes.

Fix that by marking for reconfiguration all the ports that will likely
change their 'dynamic_txqs' value.

It looks like the issue could be reproduced only with afxdp ports,
because all other non-dpdk ports ignores Tx queue ids and dpdk ports
requests for reconfiguration on set_tx_multiq().

Reported-by: William Tu <u9012063@gmail.com>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2020-March/368364.html
Fixes: e32971b8ddb4 ("dpif-netdev: Centralized threads and queues handling code.")
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: William Tu <u9012063@gmail.com>
4 years agonetdev-offload-tc: Flush rules on ingress block when init tc flow api
Dmytro Linkin [Thu, 27 Feb 2020 15:22:32 +0000 (17:22 +0200)]
netdev-offload-tc: Flush rules on ingress block when init tc flow api

OVS can fail to attach ingress block on iface when init tc flow api,
if block already exist with rules on it and is shared with other iface.
Fix by flush all existing rules on the ingress block prior to deleting
it.

Fixes: 093c9458fb02 ("tc: allow offloading of block ids")
Signed-off-by: Dmytro Linkin <dmitrolin@mellanox.com>
Acked-by: Raed Salem <raeds@mellanox.com>
Acked-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
4 years agotravis: Enable OvS Travis CI for arm
Lance Yang [Tue, 24 Mar 2020 07:00:37 +0000 (15:00 +0800)]
travis: Enable OvS Travis CI for arm

Enable part of travis jobs with gcc compiler for arm64 architecture

1. Add arm jobs into the matrix in .travis.yml configuration file
2. To enable OVS-DPDK jobs, set the build target according to
different CPU architectures
3. Temporarily disable sparse checker because of static code checking
failure on arm64

Considering the balance of the CI coverage and running time, some kernel
and DPDK jobs are removed from Arm CI.

Successful travis build jobs report:
https://travis-ci.org/github/yzyuestc/ovs/builds/666129448

Reviewed-by: Yanqin Wei <Yanqin.Wei@arm.com>
Reviewed-by: Ruifeng Wang <Ruifeng.Wang@arm.com>
Reviewed-by: JingZhao Ni <JingZhao.Ni@arm.com>
Reviewed-by: Gavin Hu <Gavin.Hu@arm.com>
Signed-off-by: Lance Yang <Lance.Yang@arm.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
4 years agovlog: Fix OVS_REQUIRES macro.
William Tu [Tue, 24 Mar 2020 19:10:57 +0000 (12:10 -0700)]
vlog: Fix OVS_REQUIRES macro.

Pass lock objects, not their addresses, to the annotation macros.

Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: Ben Pfaff <blp@ovn.org>