Numan Siddique [Tue, 20 Mar 2018 11:29:42 +0000 (16:59 +0530)]
ovn-controller: Handle Port_Binding's "requested-chassis" option in physical.c
When a Logical_Switch_Port P's options is set with 'requested-chassis=hv1'
and if the user has bound this logical port to two OVS interfaces each in
different host (eg. hv1 and hv2), then ovn-controller in hv1 sets the
P's Port_Binding.chassis to hv1 which is as expected. But on hv2, ovn-controller
is adding OF flows in table 0 and table 65 for the OVS interface instead of
considering 'P' as a remote port. When another logical port bound on hv2,
pings to the logical port 'P', the packet gets delivered to hv2 OVS interface
instead of hv1 OVS interface, which is wrong.
This scenario is most likely to happen when requested-chassis option is used
by CMS during migration of a VM from one chassis to another.
This patch fixes this issue by checking the Port_Binding's "requested-chassis"
option in physical.c before adding the flows in table 0 an 65.
Reported-by: Marcin Mirecki <mmirecki@redhat.com>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2018-March/345266.html Signed-off-by: Numan Siddique <nusiddiq@redhat.com> Tested-by: Marcin Mirecki <mmirecki@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Chris Mi [Tue, 10 Apr 2018 05:18:09 +0000 (14:18 +0900)]
netdev-tc-offloads: Add offloading of multiple outputs
Currently, we support offloading of one output port. Remove that
limitation by use of mirred mirror action for all output ports,
except that the last one is mirred redirect action.
Signed-off-by: Chris Mi <chrism@mellanox.com> Reviewed-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Simon Horman <simon.horman@netronome.com>
Chris Mi [Tue, 10 Apr 2018 05:18:08 +0000 (14:18 +0900)]
tc: Make the actions order consistent
When OVS DP passes the actions to TC library, we save all the
actions in data structure tc_flower and each action type has its
own field in tc_flower. So when TC library passes the actions to
kernel, actually the actions order is lost.
We add an actions array in tc_flower to keep the actions order
in this patch.
Signed-off-by: Chris Mi <chrism@mellanox.com> Reviewed-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Simon Horman <simon.horman@netronome.com>
The commit 9afc6f14ee7b2622703d98689acb0044d4a5492e added a new author
which name was too long that broke the column size. Runinng "make
docs-checks" was failing because of that.
All this patch does is to enlarge the "Name" column to fit the new
author's name.
Signed-off-by: Lucas Alvares Gomes <lucasagomes@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Daniel Alvarez <dalvarez@redhat.com>
Roi Dayan [Wed, 11 Apr 2018 07:57:30 +0000 (10:57 +0300)]
tc: Change filter error to debug once
Also update the message to be more correct.
Before this commit if there were tc rules that are not of type
flower the log was getting filled quickyl with errors about it
and always appeared to the user when dumping flows from user space.
This commit moves the error to debug and logs it only once.
Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Paul Blakey <paulb@mellanox.com> Signed-off-by: Simon Horman <simon.horman@netronome.com>
icmp6 action is used to replace the IPv6 packet been processed with
an ICMPv6 packet initialized based on incoming IPv6 one.
Ethernet and IPv6 fields not listed are not changed:
- ip.proto = 58 (ICMPv6)
- ip.ttl = 255
- icmp6.type = 1 (destination unreachable)
- icmp6.code = 1 (communication administratively prohibited)
Prerequisite: ip6
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Handle gratuitous ARP requests and replies in tnl_arp_snoop()
Problem:
========
In user-space tunneling implementation, tnl_arp_snoop() snoops only ARP
*reply* packets to resolve tunnel nexthop IP addresses to MAC addresses.
Normally the ARP requests are periodically sent by the local host IP stack,
so that the ARP cache in OVS is refreshed and entries do not time out.
However, if the remote tunnel nexthop is a VRRP IP, and the gateway
periodically sends gratuitous ARP *requests* to announce itself,
tnl_arp_snoop() treats them as INVALID. Consequently, the ARP cache in OVS
expires after 10 minutes, which results in dropping of the next packet(s)
until a new ARP request is responded to.
Fix:
====
Enhance the tunnel neighbor resolution logic in OVS to not only snoop on
ARP replies but also on gratuitous ARP requests.
Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com>
From: Manohar K C <manohar.krishnappa.chidambaraswamy@ericsson.com> CC: Jan Scheurich <jan.scheurich@ericsson.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Mark Michelson [Mon, 9 Apr 2018 17:07:20 +0000 (12:07 -0500)]
stopwatch: Fix Windows incompatibility
Stopwatch was implemented using a Unix-only pipe structure. This commit
changes to using a guarded list and latch in order to pass data between
threads.
Signed-off-by: Mark Michelson <mmichels@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
json.at: Avoid to run $PYTHON when python is not available
This commit is needed since autotest tries to run AT_XFAIL_IF when
AT_SKIP_IF condition is true too.
Currently HAVE_PYTHON is required, but this may change in the future
since many distributions are migrating to python3 as default python
(like Arch Linux or Fedora) and so it can be a good idea to permit to
build OVS using python3.
Signed-off-by: Timothy Redaelli <tredaelli@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
In some environments, builds would fail with the following error:
lib/stopwatch.c: In function ‘stopwatch_exit’:
lib/stopwatch.c:448:5: error: ignoring return value of ‘write’, declared
with attribute warn_unused_result [-Werror=unused-result]
write(stopwatch_pipe[1], &pkt, sizeof pkt);
This patch explicitly ignores the return value of write().
Signed-off-by: Justin Pettit <jpettit@ovn.org> Acked-by: Mark Michelson <mmichels@redhat.com>
Jan Scheurich [Thu, 5 Apr 2018 14:11:04 +0000 (16:11 +0200)]
nsh: Add unit test for double NSH encap and decap
The added test verifies that OVS correctly encapsulates an Ethernet
packet with two NSH (MD1) headers, sends it with an Ethernet header
over a patch port and decaps the Ethernet and the two NSH headers on
the receiving bridge to reveal the original packet.
The test case performs the encap() operations in a sequence of three
chained groups to test the correct handling of encap() actions in
group buckets recently fixed in commit ce4a16ac0.
Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Jan Scheurich [Thu, 5 Apr 2018 14:11:03 +0000 (16:11 +0200)]
xlate: Correct handling of double encap() actions
When the same encap() header was pushed twice onto a packet (e.g in the
case of NSH in NSH), the translation logic only generated a datapath push
action for the first encap() action. The second encap() did not emit a
push action because the packet type was unchanged.
commit_encap_decap_action() (renamed from commit_packet_type_change) must
solely rely on ctx->pending_encap to generate an datapath push action.
Similarly, the first decap() action on a double header packet does not
change the packet_type either. Add a corresponding ctx->pending_decap
flag and use that to trigger emitting a datapath pop action.
Fixes: f839892a2 ("OF support and translation of generic encap and decap") Fixes: 1fc11c594 ("Generic encap and decap support for NSH") Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Numan Siddique [Mon, 12 Mar 2018 06:50:57 +0000 (12:20 +0530)]
Fix possible timing issues in OVN test cases
This patch fixes timing related failures in some test cases when run with
-j option. It uses OVS_WAIT_* whereever appropriate. It also fixes the
test case "ovn -- IPv6 periodic RA" wherein the logical port - sw-p1 is
bound on 2 chassis and this causes both the chassis to fight for the port.
Signed-off-by: Numan Siddique <nusiddiq@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Yi-Hung Wei [Wed, 14 Mar 2018 18:16:41 +0000 (11:16 -0700)]
ofproto-dpif-xlate: Report ct fields changes in ofproto/trace
With commit f6fabcc6 ("ofproto-dpif: Mark packets as "untracked" after
call to ct()", after the ct() action, the packet conntrack state is set
to an untracked state, and all the conntrack fields are cleared.
This patch updates ofproto/trace report to reflect this change, so that
it would be easier to debug OpenFlow pipeline with conntrack.
Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
This patch adds the options to start clustered OVN db servers in ovn-ctl.
To support this, following options are added - '--db-(nb/sb)-cluster-local-addr',
'--db-(nb/sb)-cluster-local-port', '--db-(nb/sb)-cluster-local-proto',
'--db-(nb/sb)-cluster-remote-addr', '--db-(nb/sb)-cluster-remote-port' and
'--db-(nb/sb)-cluster-remote-proto'.
If only '--db-(nb/sb)-cluster-local-addr' is defined then clustered db is created
(using ovsdb-tool create-cluster). If both are defined, then the db is added to
the cluster (using ovsdb-tool join-cluster)
This patch also adds the support to configure ovn-northd to point to all the servers
in the cluster using the options - '--ovn-northd-nb-db' and 'ovn-northd-sb-db'.
Presently this patch doesn't handle the schema update scenario when restarting the
clustered ovsdb-servers. This will be handled in a separate patch.
The initial versions of these commands are tested by Aliasgar <aginwala <aginwala@asu.edu>
and the discussion on this can be found here -
https://mail.openvswitch.org/pipermail/ovs-discuss/2018-March/046470.html
(There are 4 checkpatch warnings 'Line length is >79-characters long' in ovn-ctl.8.xml
which I couldn't resolve without losing proper rendering when "man ovn-ctl" is run.)
Co-authored-by: Ben Pfaff <blp@ovn.org> Signed-off-by: Numan Siddique <nusiddiq@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
By default, ovs-ofctl can accept and display port
names in place of numbers. ovs-ofctl tool exports
only the option --names, but not --no-names in
command help information.
Fixes: 50f96b10e1c8 ("Support accepting and displaying port names in OVS tools.") Cc: Ben Pfaff <blp@ovn.org> Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Yifeng Sun [Tue, 3 Apr 2018 17:12:59 +0000 (10:12 -0700)]
ofp-flow: minimatch is initialized twice.
It is possible that 'fm->match' gets initialized twice in this function,
which makes the first one leak because its pointer is overwritten by the
second initialization.
This patch fixes this issue.
Fixes: 6a6b7060655e ("ofp-flow: Reduce memory consumption for ofputil_flow_mod, using minimatch.") Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Yifeng Sun [Tue, 3 Apr 2018 17:12:58 +0000 (10:12 -0700)]
ovsdb-idl: properly destroy ovsdb_idl.server
This patch fixes the memory leak reported by valgrind in testing
"learning action - TCPv6 port learning"
150 (40 direct, 110 indirect) bytes in 1 blocks are definitely lost in loss record 329 of 363
at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
by 0x51D0D4: xmalloc (util.c:120)
by 0x572E17: json_create (json.c:1442)
by 0x572E17: json_array_create (json.c:217)
by 0x572E17: json_array_create_2 (json.c:238)
by 0x4F69EA: ovsdb_idl_db_init (ovsdb-idl.c:424)
by 0x4F6A58: ovsdb_idl_create (ovsdb-idl.c:454)
by 0x40FAC7: bridge_init (bridge.c:396)
by 0x406F93: main (ovs-vswitchd.c:106)
3,727 (40 direct, 3,687 indirect) bytes in 1 blocks are definitely lost in loss record 358 of 363
at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
by 0x51D0D4: xmalloc (util.c:120)
by 0x572EBA: json_create (json.c:1442)
by 0x572EBA: json_object_create (json.c:254)
by 0x573254: json_parser_push_object (json.c:1264)
by 0x573254: json_parse_value.isra.12 (json.c:1293)
by 0x57339F: json_parser_input (json.c:1398)
by 0x5742C1: json_lex_input (json.c:982)
by 0x5748EB: json_parser_feed (json.c:1140)
by 0x57597A: jsonrpc_recv.part.7 (jsonrpc.c:332)
by 0x5768A7: jsonrpc_recv (jsonrpc.c:1140)
by 0x5768A7: jsonrpc_session_recv (jsonrpc.c:1113)
by 0x4F4E5C: ovsdb_idl_run (ovsdb-idl.c:818)
by 0x4100F9: bridge_run (bridge.c:2949)
by 0x406FB4: main (ovs-vswitchd.c:121)
Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Jan Scheurich [Wed, 4 Apr 2018 11:26:02 +0000 (13:26 +0200)]
ofproto-dpif: Init ukey->dump_seq to zero
In the current implementation the dump_seq of a new datapath flow ukey
is set to seq_read(udpif->dump_seq). This implies that any revalidation
during the current dump_seq period (up to 500 ms) is skipped.
This can trigger incorrect behavior, for example when the the creation of
datapath flow triggers a PACKET_IN to the controller, which which course
the controller installs a new flow entry that should invalidate the
original datapath flow.
Initializing ukey->dump_seq to zero implies that the first dump of the
flow, be it for revalidation or dumping statistics, will always be
executed as zero is not a valid value of the ovs_seq.
Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
tests: Allow only 1 job at once during recheck (RECHECK=yes)
Currently some tests fail frequently if run with multiple jobs at once, so
this commit disables parallel jobs during the recheck phase.
Some tests fails often than other tests (when run in multiple jobs), so
rarely (on a big machine, with 32 cores and by using -j32) a test
fails twice and so make check RECHECK=yes fails.
This patch only avoid this rare failure that can be annoying if you are
releasing a new package (on Fedora or RHEL for example) since you need
to re-build all the architectures only for a false positive on a
single architecture.
Signed-off-by: Timothy Redaelli <tredaelli@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
OVN: add tcp_reset action to ovn acl reject support
Whenever the acl reject rule is hit by a TCP segment send back
a TCP RST packet to close the connection using the tcp_reset action.
Moreover add add tcp_reset test case to 'ACL reject rule test'
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Mark Michelson [Wed, 28 Mar 2018 21:35:05 +0000 (16:35 -0500)]
Add stopwatch timing API
This is similar to the existing coverage and perf-counter APIs in OVS.
However, rather than keeping counters, this is aimed at timing how long
operations take to perform. "Operations" in this case can be anything
from a loop iteration, to a function, to something more complex.
The library allows for named stopwatches to be created. From there, the
stopwatch can be started and stopped via stopwatch_start() and
stopwatch_stop(). After each run, statistics for the stopwatch will be
calculated.
Statistics for a particular stopwatch can be queried from the command
line by using ovs-appctl -t <target> stopwatch/show <stopwatch name>.
Statistics can be reset from the command line using
ovs-appctl -t <target> stopwatch/reset <stopwatch name>
Signed-off-by: Mark Michelson <mmichels@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Jan Scheurich [Mon, 26 Mar 2018 07:36:27 +0000 (09:36 +0200)]
ofp-actions: Correct execution of encap/decap actions in action set
The actions encap, decap and dec_nsh_ttl were wrongly flagged as set_field
actions in ofpact_is_set_or_move_action(). This caused them to be executed
twice in the action set or a group bucket, once explicitly in
ofpacts_execute_action_set() and once again as part of the list of
set_field or move actions.
Fixes: f839892a ("OF support and translation of generic encap and decap") Fixes: 491e05c2 ("nsh: add dec_nsh_ttl action") Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Ben Pfaff [Sun, 1 Apr 2018 17:01:24 +0000 (10:01 -0700)]
tests: Fix sed usage in pmd test.
SUSv7 2016 Edition says:
[2addr] {editing command
editing command
...
}
Execute a list of sed editing commands only when the pattern space is
selected. The list of sed editing commands shall be surrounded by
braces. The braces can be preceded or followed by <blank> characters.
The <right-brace> shall be preceded by a <newline> or <semicolon>
(before any optional <blank> characters preceding the <right-brace>).
This usage in pmd.at omitted the semicolon before the right brace. This
commit fixes the problem, which was rejected by the sed utility on Alpine
Linux (which presumably comes from some version of busybox, but BusyBox
v1.22.1 (Debian 1:1.22.0-9+b1) on my system accepts the form without
semicolon).
Reported-by: Stuart Cardall <developer@it-offshore.co.uk>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2018-March/046460.html Signed-off-by: Ben Pfaff <blp@ovn.org> Tested-by: Stuart Cardall <developer@it-offshore.co.uk>
Ben Pfaff [Sun, 1 Apr 2018 00:12:55 +0000 (17:12 -0700)]
ovs-vswitchd: Do not use system routing table with --disable-system.
The --disable-system option indicates that the user wants to avoid using
the host's datapath. This is also a good indication that the user does
not want to use other host resources such as the routing table, so this
commit implements that.
This fixes a failure in the test "ptap - recirculate after packet_type
change" when the host routing table contained an entry that affected the
generated flow. Without this patch, the commands:
Ben Pfaff [Sun, 1 Apr 2018 00:12:54 +0000 (17:12 -0700)]
vswitchd: Allow user to directly specify sFlow agent address.
At least for testing purposes, and perhaps in production, it is useful to
be able to fix the agent IP address directly, rather that indirecting it
through a device name or the routing table.
This commit uses this feature to fix the agent IP address used in the unit
tests. This will be particularly useful in an upcoming commit that
disables the use of the system routing table in the unit tests, to make
the tests' results independent of the host's routes.
CC: Neil McKee <neil.mckee@inmon.com> Tested-By: Timothy Redaelli <tredaelli@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Ben Pfaff [Mon, 2 Apr 2018 17:14:49 +0000 (10:14 -0700)]
tests: Skip tests that need "normal" diff if not available.
busybox diff does not support "normal" diff format, only the unified
format. A few OVS tests rely on "normal" format, so those would fail.
This commit avoids the problem by skipping tests that require "normal"
format if it is not available.
Reported-by: Stuart Cardall <developer@it-offshore.co.uk>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2018-March/046460.html Acked-by: Alin Gabriel Serdean <aserdean@ovn.org> Signed-off-by: Ben Pfaff <blp@ovn.org>
Skip network namespace id check on windows since we lack support
and integration for their equivalent at the moment.
Signed-off-by: Alin Gabriel Serdean <aserdean@ovn.org> Co-authored-by: Ben Pfaff <blp@ovn.org> Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Ben Pfaff <blp@ovn.org>
---
v2: Integrate comments as suggested by Ben and add him as co author
Windows: Fix broken build caused by a bad file extension
The compiler (cl) complains:
`ovsdb/ovsdb-server.c(689) : fatal error C1083:
Cannot open include file: 'ovsdb/_server.ovsschema.inc':
No such file or directory`
(https://ci.appveyor.com/project/blp/ovs/build/1.0.4079#L2586)
Generated compiler objects have the extension `.obj` on Windows.
This patch switches to `$(OBJEXT)` instead, so the schema will be generated.
Signed-off-by: Alin Gabriel Serdean aserdean@ovn.org Acked-by: Ben Pfaff <blp@ovn.org>
Grow a new opt-in feature to check comments for possible spelling
mistakes. Uses the 'enchant' library to provide a default link to
aspell/ispell as the backend.
Additionally, a custom set of kewords is included inline to match what
would be possibly encountered in 'the wild'. The list is fairly
comprehensive at this point.
Signed-off-by: Aaron Conole <aconole@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
For the infix operator whitespace checks, some of these operators are
used within comments. In those cases, it probably doesn't make sense
to warn about whitespacing.
There may be other checks that could use this kind of filter, but
that can wait for a future commit (and someone ambitious enough to
test each case).
Signed-off-by: Aaron Conole <aconole@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Flavio Leitner [Fri, 30 Mar 2018 02:05:29 +0000 (23:05 -0300)]
netlink linux: enable listening to all nsids
Internal ports may be moved to another network namespace
and when that happens, the vswitch stops receiving netlink
notifications.
This patch enables the vswitch to listen to all network
namespaces that have a nsid assigned into the network
namespace where the socket has been opened.
It requires kernel 4.2 or newer.
Signed-off-by: Flavio Leitner <fbl@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Flavio Leitner [Fri, 30 Mar 2018 02:05:27 +0000 (23:05 -0300)]
netnsid: update device only if netnsid matches.
Recent kernels provide the network namespace ID of a port,
so use that to discover where the port currently is.
A network device in another network namespace could have the
same name, so once the socket starts listening to other network
namespaces, it is necessary to confirm the netnsid.
Signed-off-by: Flavio Leitner <fbl@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
aginwala [Fri, 23 Mar 2018 20:44:52 +0000 (13:44 -0700)]
Use new default nb and sb dbs for sandbox northd:
As per new clustering change, ovn-northd sandbox should use nb1.ovsdb and
sb1.ovsdb. It was updated in ovn-northd --help section but missed for sandbox.
This commit fixes the same
Reported-by: Mark Michelson <mmichels@redhat.com>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2018-March/345535.html Signed-off-by: aginwala <aginwala@ebay.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Ben Pfaff [Fri, 23 Mar 2018 22:46:58 +0000 (15:46 -0700)]
json: Avoid extra memory allocation and string copy parsing object members.
Until now, every time the JSON parser added an object member, it made an
extra copy of the member name and then freed the original copy. This is
wasteful, so this commit eliminates the extra copy.
Signed-off-by: Ben Pfaff <blp@ovn.org> Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
Ben Pfaff [Fri, 23 Mar 2018 22:46:49 +0000 (15:46 -0700)]
seq: Avoid some "possible leak" warnings from valgrind.
valgrind regards a block to be "possibly" leaked when no pointers exist to
the beginning of the block but some pointers do point to the middle of the
block. By moving the hmap_node in struct seq_waiter from the middle of the
struct to the beginning, as this commit does, the pointers to the node from
the hmap in struct seq point to the beginning of the block, which reassures
valgrind.
Signed-off-by: Ben Pfaff <blp@ovn.org> Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
Ben Pfaff [Tue, 20 Mar 2018 05:01:47 +0000 (22:01 -0700)]
ofp-flow: Reduce memory consumption for ofputil_flow_mod, using minimatch.
Until now, struct ofputil_flow_mod, which represents an OpenFlow flow table
modification request, has incorporated a struct match, which made the
overall ofputil_flow_mod about 2.5 kB. This is OK for a small number of
flows, but absurdly inflates memory requirements when there are hundreds of
thousands of flows. This commit fixes the problem by changing struct match
to struct minimatch inside ofputil_flow_mod, which reduces its size to
about 100 bytes plus the actual size of the flow match (usually a few dozen
bytes).
This affects memory usage of ovs-ofctl (when it adds a large number of
flows) more than ovs-vswitchd.
Reported-by: Michael Ben-Ami <mbenami@digitalocean.com> Signed-off-by: Ben Pfaff <blp@ovn.org> Reviewed-by: Armando Migliaccio <armamig@gmail.com> Tested-by: Armando Migliaccio <armamig@gmail.com> Reviewed-by: Jan Scheurich <jan.scheurich@ericsson.com> Tested-by: Jan Scheurich <jan.scheurich@ericsson.com> Tested-by: Yifeng Sun <pkusunyifeng@gmail.com> Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
Ben Pfaff [Tue, 20 Mar 2018 04:34:26 +0000 (21:34 -0700)]
flow: Improve type-safety of MINIFLOW_GET_TYPE.
Until mow, this macro has blindly read the passed-in type's size, but
that's unnecessarily risky. This commit changes it to verify that the
passed-in type is the same size as the field and, on GCC and Clang, that
the types are compatible. It also adds a version that does not check,
for the one case where (currently) we deliberately read the wrong size,
and updates a few uses to use more precise field names.
Signed-off-by: Ben Pfaff <blp@ovn.org> Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com> Reviewed-by: Armando Migliaccio <armamig@gmail.com>
Ben Pfaff [Mon, 19 Mar 2018 20:11:26 +0000 (13:11 -0700)]
match: Add 'tun_md' member to struct minimatch.
struct match has had a 'tun_md' member for a long time, but struct
minimatch has never had one. This doesn't matter for the purposes for
which minimatch is currently used, but it means that a minimatch is not
completely substitutable for a match and therefore blocks some new uses.
This patch adds the member.
Signed-off-by: Ben Pfaff <blp@ovn.org> Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com> Reviewed-by: Armando Migliaccio <armamig@gmail.com>
Ian Stokes [Wed, 21 Mar 2018 20:11:22 +0000 (20:11 +0000)]
lib/tc: Fix sparse warnings.
"sparse" complains with the warning 'incorrect type in argument 1
(different base types)' in function nl_parse_flower_ip when parsing a key
flag and in function nl_msg_put_flower_options when writing the key
flag. Fix this by using network byte order when reading and writing key
flags to netlink messages.
Fixes: 83e86606 ("netdev-tc-offloads: Add support for IP fragmentation") Signed-off-by: Ian Stokes <ian.stokes@intel.com> Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Roi Dayan <roid@mellanox.com>
Fragment mask (any and later) always exists so we need to test
for FLOW_NW_FRAG_LATER only if the state is FLOW_NW_FRAG_ANY.
Before this fix we could pass frag no and first at the same time to TC
which is also not tested there for bad frag state.
This fix make sure we only pass frag first/later if is frag.
Fixes: 83e866067ea6 ("netdev-tc-offloads: Add support for IP fragmentation") Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Paul Blakey <paulb@mellanox.com> Signed-off-by: Simon Horman <simon.horman@netronome.com>
Aaron Conole [Tue, 13 Feb 2018 21:42:16 +0000 (16:42 -0500)]
rhel: don't drop capabilities when running as root
Currently, regardless of which user is being set as the running user,
Open vSwitch daemons on RHEL systems drop capabilities. This means the
very powerful CAP_SYS_ADMIN is dropped, even when the user is 'root'.
For the majority of use cases this behavior works, as the user can
enable or disable various configurations, regardless of which datapath
functions are desired. However, when using certain DPDK PMDs, the
enablement and configuration calls require CAP_SYS_ADMIN.
Instead of retaining CAP_SYS_ADMIN in all cases, which would practically
nullify the uid/gid and privilege drop, we don't pass the --ovs-user
option to the daemons. This shunts the capability and privilege
dropping code.
Reported-by: Marcos Felipe Schwarz <marcos.f.sch@gmail.com>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2018-January/045955.html Fixes: e3e738a3d058 ("redhat: allow dpdk to also run as non-root user") Signed-off-by: Aaron Conole <aconole@redhat.com> Acked-By: Timothy Redaelli <tredaelli@redhat.com> Signed-off-by: Russell Bryant <russell@ovn.org>
Ben Pfaff [Mon, 1 Jan 2018 05:15:58 +0000 (21:15 -0800)]
ovsdb: Introduce experimental support for clustered databases.
This commit adds support for OVSDB clustering via Raft. Please read
ovsdb(7) for information on how to set up a clustered database. It is
simple and boils down to running "ovsdb-tool create-cluster" on one server
and "ovsdb-tool join-cluster" on each of the others and then starting
ovsdb-server in the usual way on all of them.
One you have a clustered database, you configure ovn-controller and
ovn-northd to use it by pointing them to all of the servers, e.g. where
previously you might have said "tcp:1.2.3.4" was the database server,
now you say that it is "tcp:1.2.3.4,tcp:5.6.7.8,tcp:9.10.11.12".
This also adds support for database clustering to ovs-sandbox.
Ben Pfaff [Wed, 6 Dec 2017 19:37:03 +0000 (11:37 -0800)]
ovsdb-server: Add new RPC "set_db_change_aware".
The _Server database recently added to ovsdb-server can be used to dump out
information about databases, but monitoring updates to _Server is not yet
very useful because for historical reasons ovsdb-server drops all of its
OVSDB connections whenever databases are added or removed or otherwise
change in some major way. It is not a good idea to change this behavior
for all clients, because some of them rely on it, but this commit
introduces a new RPC that allows clients that understand _Server to
suppress the connection-closing behavior.
Ben Pfaff [Fri, 15 Dec 2017 19:14:55 +0000 (11:14 -0800)]
ovsdb-server: Add support for a built-in _Server database.
The _Server database is valuable primarily because it provides database
clients a way to find out the details of changes to databases, schemas,
etc. in a granular, natural way. Until now, the only way that the server
could notify clients about these kinds of changes was to close the session;
when the client reconnects, it is expected to reassess the server's state.
One way to provide this kind of granular information would be to add
specific JSON-RPC requests to obtain notifications for different kinds of
changes, but since ovsdb-server already provides granular and flexible
notification support for databases, using a database for the purpose is
convenient and avoids duplicating functionality.
Initially this database only reports databases' names and schemas, but
when clustering support is added in a later commit it will also report
important aspects of clustering and cluster status. Thus, this database
also reduces the need to add JSON-RPC calls to retrieve information about
new features.
Ben Pfaff [Fri, 1 Sep 2017 22:03:34 +0000 (15:03 -0700)]
ovn-sbctl: Allow retries by default.
Most of the OVS database-manipulation utilities (ovn-sbctl, ovn-nbctl,
ovs-vsctl, vtep-ctl) don't retry their connections by default because
they assume that the database is either up or down and likely to stay
that way. The OVN southbound database, however, is a likely candidate
for high availability clustering, so that even if it appears to be
down for a moment it will be available again soon. So, prepare for
the clustering implementation by enabling retry by default in
ovn-sbctl.
Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Justin Pettit <jpettit@ovn.org>
Ben Pfaff [Fri, 15 Dec 2017 18:59:36 +0000 (10:59 -0800)]
ovsdb-idl: Break out database-specific stuff into new data structure.
Until now, a given ovsdb-idl instances has only monitored a single
database. In an upcoming commit, it will grow to also monitor a second
database that represents the state of the database server itself. Much of
the work is the same for both databases, so this commit breaks the common
code and data out into new data structures and functions.
Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Justin Pettit <jpettit@ovn.org>
Ben Pfaff [Mon, 22 Jan 2018 19:22:57 +0000 (11:22 -0800)]
jsonrpc-server: Separate changing read_only status from reconnecting.
The code in jsonrpc-server conflated two different kinds of functionality.
It makes sense for the client to be able to change whether a particular
server is read-only. It also makes sense for the client to tell a server
to reconnect. The code in jsonrpc-server only provided a single function
that does both, which is weird. This commit breaks these apart.
Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Justin Pettit <jpettit@ovn.org>
Ben Pfaff [Mon, 22 Jan 2018 19:20:47 +0000 (11:20 -0800)]
ovsdb: Drop distinction between monitors and replicas.
Until now, OVSDB distinguished "monitors", which are associated with OVSDB
JSON-RPC client sessions and allow clients to find out about database
changes, from "replicas", which are associated with databases and also find
out about database changes and act on them in some way. Now that
committing to disk has been broken into a separate concept, there is a
one-to-one and "onto" relationship between monitors and replicas: every
monitor M has a replica R and R is associated with M as well. It's easier
if we just treat them as a single entity, and that's what this commit
implements.
Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Justin Pettit <jpettit@ovn.org>
Ben Pfaff [Tue, 12 Sep 2017 23:28:28 +0000 (16:28 -0700)]
ovsdb-server: Distinguish logs from other replicas.
Until now, ovsdb-server has internally chained a list of replicas from each
database. Whenever ovsdb_txn_commit() commits a transaction, it passes the
transaction to each replica. The first replica, which is always the disk
file that stores the database, is special because it is the only replica
that can report an error and thereby abort the transaction. This is a very
special property that genuinely distinguishes this first replica from the
others on the chain. This commit breaks that first replica out as a
separate kind of entity that is not on the list of replicas. When later
commits add support for clustering, there will only be more and more
special cases for the "first replica", so it makes sense to distinguish it
this way.
Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Justin Pettit <jpettit@ovn.org>
Ben Pfaff [Mon, 22 Jan 2018 19:09:40 +0000 (11:09 -0800)]
jsonrpc: Allow jsonrpc_session to have more than one remote.
The implementation cycles through the remotes in random order. This allows
clients to perform some load balancing across alternative implementations
of a service.
Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Russell Bryant <russell@ovn.org> Acked-by: Justin Pettit <jpettit@ovn.org>
Ben Pfaff [Mon, 22 Jan 2018 19:04:58 +0000 (11:04 -0800)]
reconnect: Add ability to do a number of retries without backoff.
This is aimed at an upcoming database clustering implementation, where it's
desirable to try all of the cluster members quickly before backing off to
retry them again in sequence.
Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Russell Bryant <russell@ovn.org> Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Justin Pettit <jpettit@ovn.org>
Ben Pfaff [Fri, 8 Dec 2017 00:01:01 +0000 (16:01 -0800)]
log: Add async commit support.
The OVSDB log code has always had the ability to commit the log to disk and
wait for the commit to finish. This patch introduces a new feature that
allows the client to start a commit in the background and then to determine
asynchronously that the commit has completed. This will be especially
useful later for the distributed database feature.
Signed-off-by: Ben Pfaff <blp@ovn.org> Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
Ilya Maximets [Fri, 23 Mar 2018 09:56:53 +0000 (12:56 +0300)]
netdev-dpdk: Limit rate of DPDK logs.
DPDK could produce huge amount of logs. For example, in case of
exhausting of a mempool in vhost-user port, following message will be
printed on each call to 'rte_vhost_dequeue_burst()':
|ERR|VHOST_DATA: Failed to allocate memory for mbuf.
These messages are increasing ovs-vswitchd.log size extremely fast
making it unreadable and non-parsable by a common linux utils like
grep, less etc. Moreover continuously growing log could exhaust the
HDD space in a few hours breaking normal operation of the whole system.
To avoid such issues, DPDK log rate limited to 600 messages per minute.
This value is high, because we still want to see many big logs like
vhost-user configuration sequence. The debug messages are treated
separately to avoid looss of errors/warnings in case of intensive debug
enabled in DPDK.
Kevin Traynor [Thu, 22 Mar 2018 17:20:58 +0000 (17:20 +0000)]
netdev-dpdk: Remove 'error' from non error log.
Presently, if OVS tries to setup more queues than
are allowed by a specific NIC, OVS will handle
this case by retrying with a lower amount of queues.
Rather than reporting initial failed queue setups
in the logs as ERROR, they are reported as INFO but
contain the word 'error'. Unless a user has detailed
knowledge of OVS-DPDK workings, this is confusing.
Let's remove 'error' from the INFO log.
Signed-off-by: Kevin Traynor <ktraynor@redhat.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Ian Stokes [Thu, 8 Mar 2018 15:46:58 +0000 (15:46 +0000)]
dpdk: Use DPDK 17.11.1 release.
Modify docs and travis linux build script to use the DPDK 17.11.1
release branch to benefit from most recent bug fixes.
There are no new features introduced in the DPDK release, only back
ported bug fixes. For completeness these bug fixes have been documented
under the 17.11.1 section in the link below.
Kevin Traynor [Tue, 6 Mar 2018 12:07:09 +0000 (12:07 +0000)]
Documentation: Add note about dpdkvhostuser and IOMMU.
The docs describe IOMMU support for dpdkvhostuserclient ports,
but it is not mentioned in the section about dpdkvhostuser
ports. Add an explicit note to say IOMMU is not supported for
dpdkvhostuser ports.
CC: Maxime Coquelin <maxime.coquelin@redhat.com> Signed-off-by: Kevin Traynor <ktraynor@redhat.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Roi Dayan [Mon, 12 Mar 2018 12:58:47 +0000 (14:58 +0200)]
netdev-tc-offloads: Add support for IP fragmentation
Add support for frag no, first and later.
Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Shahar Klein <shahark@mellanox.com> Reviewed-by: Paul Blakey <paulb@mellanox.com> Signed-off-by: Simon Horman <simon.horman@netronome.com>
ovsdb-client: Set binary mode when doing backup/restore
Add some needed consistency on Windows for STD_IN/OUT file descriptors
when doing backup and restore.
Reported-at:https://mail.openvswitch.org/pipermail/ovs-dev/2018-January/343518.html Suggested-by: Ben Pfaff <blp@ovn.org> Co-authored-by: Ben Pfaff <blp@ovn.org> Signed-off-by: Ben Pfaff <blp@ovn.org> Signed-off-by: Alin Gabriel Serdean <aserdean@ovn.org> Acked-by: Ben Pfaff <blp@ovn.org>
Ben Pfaff [Wed, 14 Mar 2018 21:57:23 +0000 (14:57 -0700)]
odp-util: Print eth() for Ethernet flows if packet_type is absent.
OVS datapaths have two different ways to indicate what kind of packet a
flow matches. One way, used by the userspace datapath, is
OVS_KEY_ATTR_PACKET_TYPE. Another way, used by the kernel datapath, is
OVS_KEY_ATTR_ETHERTYPE when used in the absence of OVS_KEY_ATTR_ETHERNET;
when the latter is present, the packet is always an Ethernet packet. The
code to print datapath flows wasn't paying attention to this distinction
and always omitted eth() from the output when OVS_KEY_ATTR_ETHERNET was
fully wildcarded, which meant that upon later re-parsing the
OVS_KEY_ATTR_ETHERNET key was omitted, which made it look like a
non-Ethernet match was being described.
This commit makes odp_util_format() add eth() to the output when
OVS_KEY_ATTR_ETHERNET is present and OVS_KEY_ATTR_PACKET_TYPE is absent,
avoiding the problem.
Reported-by: Amar Padmanabhan <amarpadmanabhan@fb.com>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2017-December/045817.html Reported-by: Su Wang <suwang@vmware.com>
VMWare-BZ: #2070488 Signed-off-by: Ben Pfaff <blp@ovn.org> Tested-by: Yi-Hung Wei <yihung.wei@gmail.com> Acked-by: Yi-Hung Wei <yihung.wei@gmail.com>
Miguel Angel Ajo [Mon, 12 Mar 2018 10:31:25 +0000 (10:31 +0000)]
ovs-vsctl: Include bfd_status in "show" output for interfaces
Since OVS 2.8 OVN provides L3HA capabilities via BFD monitoring,
but checking the status of BFD is not obvious, and we provide
a simple way to visualize the status with this simple patch.
Signed-off-by: Miguel Angel Ajo <majopela@redhat.com> Tested-by: Miguel Angel Ajo <majopela@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Daniel Alvarez [Sat, 10 Mar 2018 13:50:14 +0000 (14:50 +0100)]
ovsdb: Fix database compaction check
We want to compact database file if it has been over 24 hours since we
last compacted it and there's more than 100 commits regardless of the
size of the database. This patch fixes the previous comparisson which
checked if 24 hours was elapsed since the next scheduled compaction.
Signed-off-by: Daniel Alvarez <dalvarez@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>