ofproto: update mtu when port is getting removed as well
When we're adding the port into ovs bridge, its mtu is updated to the minimal
mtu of the included port. But when the port is getting removed, no such update
is performed, which leads to bug. For example, when the port with minimal mtu
is removed, bridge's mtu must adapt to new value, but it won't happen.
How to reproduce the problem:
$ ovs-vsctl add-br testing
$ ip link add name gretap11 type gretap local 10.0.0.1 remote 10.0.0.100
$ ip link add name gretap12 type gretap local 10.0.0.1 remote 10.0.0.200
$ ip link set dev gretap12 mtu 1600
$ ovs-vsctl add-port testing gretap11
$ ovs-vsctl add-port testing gretap12
$ ip a sh testing
16: testing: <BROADCAST,MULTICAST> mtu 1462 qdisc noop state DOWN
group default qlen 1
link/ether 7a:42:95:00:96:40 brd ff:ff:ff:ff:ff:ff
$ ovs-vsctl del-port gretap11
$ ip a sh testing
16: testing: <BROADCAST,MULTICAST> mtu 1462 qdisc noop state DOWN
group default qlen 1
link/ether 7a:42:95:00:96:40 brd ff:ff:ff:ff:ff:ff
$## as we can see here, 'testing' bridge mtu is stuck, while it must
adapt to new '1600' value,
$## cause there is only one port 'gretap12' left, and it's mtu is '1600':
$ ip a sh gretap12
19: gretap12@NONE: <BROADCAST,MULTICAST> mtu 1600 qdisc noop master
ovs-system state DOWN group default qlen 1000
link/ether b2:c6:1d:9f:be:0d brd ff:ff:ff:ff:ff:ff
My commit fixes this problem - mtu update is performed on port removal as well.
Signed-off-by: wisd0me <ak47izatool@gmail.com> Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Nithin Raju [Thu, 19 May 2016 22:31:49 +0000 (15:31 -0700)]
datapath-windows: o/p buffer must fit NL error message
OVS_IOCTL_WRITE and OVS_IOCTL_TRANSACT can generate a
netlink error that is represented by a OVS_MESSAGE_ERROR
struct. We want to make sure at the entry point of the
ioctl processing that the output buffer is big enough
to hold the error message. We were earlier checking
for struct OVS_MESSAGE which is smaller.
Since we are ensuring that output buffer can fit
OVS_MESSAGE_ERROR at the top of the ioctl function,
there's no need to check for that later.
Nithin Raju [Thu, 19 May 2016 22:31:48 +0000 (15:31 -0700)]
datapath-windows: don't map output buffer in OVS_IOCTL_WRITE
The contract of OVS_IOCTL_WRITE is that write operations
will not need the output buffer. Only the input buffer
will be used in the IRP. So, better to not map the output
buffer at all.
Nithin Raju [Tue, 17 May 2016 17:15:21 +0000 (10:15 -0700)]
datapath-windows: Use l2 port and tunkey during execute
While testing DFW and recirc code it was found that userspace
was calling into packet execute with the tunnel key and the
vport added as part of the execute structure. We were not passing
this along to the code that executes actions. The right thing is
to contruct the key based on all of the attributes sent down from
userspace.
Ian Stokes [Tue, 24 May 2016 16:36:50 +0000 (17:36 +0100)]
netdev-dpdk.c: Add generic policer functions.
Add generic policer functions to avoid code duplication.
Policing can be implemented on both egress and ingress paths.
Currently the QoS egress-policer implementation uses it's own specific run
and packet handle policer functions. This patch makes the policer functions
generic so that they can be used regardless of whether the policer is egress
or ingress by just requiring a pointer to the rte_meter used for policing
to be passed.
Signed-off-by: Ian Stokes <ian.stokes@intel.com> Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Jarno Rajahalme [Tue, 24 May 2016 07:25:36 +0000 (00:25 -0700)]
ofp-actions: Allow conntrack action in group buckets.
Conntrack action used in group buckets lets
us do simple load-balancing.
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
[guru@ovn.org updated the commit message and made
a small change to the test output format] Signed-off-by: Gurucharan Shetty <guru@ovn.org> Acked-by: Ben Pfaff <blp@ovn.org>
Instead of hardcoding 'kill `cat pid`' on every call to AT_CHECK is
simpler to use on_exit.
This makes sure that we kill every started daemon and fixes a travis
build timeout.
Suggested-by: Ben Pfaff <blp@ovn.org>
Tested-at: https://travis-ci.org/ddiproietto/ovs/builds/132404750 Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Acked-by: Ben Pfaff <blp@ovn.org>
Pravin B Shelar [Tue, 24 May 2016 03:27:14 +0000 (20:27 -0700)]
netdev-native-tnl: Introduce ip_build_header()
The native tunneling build tunnel header code is spread across
two different modules, it makes pretty hard to follow the code.
Following patch refactors the code to move all code to
netdev-ative-tnl module.
Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Jesse Gross <jesse@kernel.org>
Joe Stringer [Tue, 17 May 2016 03:08:01 +0000 (20:08 -0700)]
upcall: Unregister dpif cbs in udpif_destroy().
During udpif_create(), we register callbacks for handling upcalls and
purging the datapath; however, in the corresponding udpif_destroy() we
never did this. This could potentially lead to dereference of
uninitialized memory in the userspace datapath if the main thread
destroys the udpif then executes an OpenFlow packet-out.
Fixes: e4e74c3a2b9a ("dpif-netdev: Purge all ukeys when reconfigure pmd.") Fixes: 623540e4617e ("dpif-netdev: Streamline miss handling.") Reported-by: William Tu <u9012063@gmail.com> Signed-off-by: Joe Stringer <joe@ovn.org> Acked-by: Ben Pfaff <blp@ovn.org>
netdev-dpdk: Use ->reconfigure() call to change rx/tx queues.
This introduces in dpif-netdev and netdev-dpdk the first use for the
newly introduce reconfigure netdev call.
When a request to change the number of queues comes, netdev-dpdk will
remember this and notify the upper layer via
netdev_request_reconfigure().
The datapath, instead of periodically calling netdev_set_multiq(), can
detect this and call reconfigure().
This mechanism can also be used to:
* Automatically match the number of rxq with the one provided by qemu
via the new_device callback.
* Provide a way to change the MTU of dpdk devices at runtime.
* Move a DPDK vhost device to the proper NUMA socket.
A netdev provider, especially a PMD provider (like netdev DPDK) might
not be able to change some of its parameters (such as MTU, or number of
queues) without stopping everything and restarting.
This commit introduces a mechanism that allows a netdev provider to
request a restart (netdev_request_reconfigure()). The upper layer can
be notified via netdev_wait_reconf_required() and
netdev_is_reconf_required(). After closing all the rxqs the upper layer
can finally call netdev_reconfigure(), to make sure that the new
configuration is in place.
This will be used by next commit to reconfigure rx and tx queues in
netdev-dpdk.
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Tested-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
dpif-netdev: Handle errors in reconfigure_pmd_threads().
Errors returned by netdev_set_multiq() and netdev_rxq_open() weren't
handled properly in reconfigure_pmd_threads(). In case of error now we
remove the port from the datapath.
Also, part of the code is moved in a new function, port_reconfigure().
ofproto-dpif: Call dpif_poll_threads_set() before dpif_run().
An upcoming commit will make dpif_poll_threads_set() record the
requested configuration and dpif_run() apply it, so it makes sense to
change the order.
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Tested-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
ovs-thread: Do not quiesce in ovs_mutex_cond_wait().
ovs_mutex_cond_wait() is used in many functions in dpif-netdev to
synchronize with pmd threads, but we can't guarantee that the callers do
not hold RCU references, so it's better to avoid quiescing.
In system_stats_thread_func() the code relied on ovs_mutex_cond_wait()
to introduce a quiescent state, so explicit calls to
ovsrcu_quiesce_start() and ovsrcu_quiesce_end() are added there.
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Tested-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Ben Pfaff <blp@ovn.org>
netdev objects are hard to use with RCU, because it's not possible to
split removal and reclamation. Postponing the removal means that the
port is not removed and cannot be readded immediately. Waiting for
reclamation means introducing a quiescent state, and that may introduce
subtle bugs, due to the RCU model we use in userspace.
This commit changes the port container from cmap to hmap. 'port_mutex'
must be held by readers and writers. This shouldn't have performance
impact, as readers in the fast path use a thread local cache.
dpif-netdev: Add pmd thread local port cache for transmission.
A future commit will stop using RCU for 'dp->ports' and use a mutex for
reading/writing them. To avoid taking a mutex in dp_execute_cb(), which
is called in the fast path, this commit introduces a pmd thread local
cache of ports.
The downside is that every port add/remove now needs to synchronize with
every pmd thread.
Among the advantages, keeping a per thread port mapping could allow
greater control over the txq assigment.
dpif-netdev: Fix race condition in pmd thread initialization.
The pmds and the main threads are synchronized using a condition
variable. The main thread writes a new configuration, then it waits on
the condition variable. A pmd thread reads the new configuration, then
it calls signal() on the condition variable. To make sure that the pmds
and the main thread have a consistent view, each signal() should be
backed by a wait().
Currently the first signal() doesn't have a corresponding wait(). If
the pmd thread takes a long time to start and the signal() is received
by a later wait, the threads will have an inconsistent view.
The commit fixes the problem by removing the first signal() from the
pmd thread.
This is hardly a problem on current master, because the main thread
will call the first wait() a long time after the creation of a pmd
thread. It becomes a problem with the next commits.
dpif-netdev: Initialize packet RSS hash in dpif_netdev_execute().
The datapath code expects the RSS hash to always be initialized. This
is enforced by checking in emc_processing() that the hash is valid, and
eventually by computing a new one.
Unfortunately, there is another entry point to the datapath,
dpif_netdev_execute(). A packet generated by OVS (BFD frame,
packet-out from controller) doesn't have a valid RSS hash and so is
allowed to enter the datapath with an uninitialized hash value.
This commit recomputes the hash (if not valid) in dpif_netdev_execute().
The only place where we would use an invalid hash is netdev-vport, in
push_udp_header(). This caused an uninitialized memory read, and a
random value to be assigned to the outer tunnel header source port.
Reported-by: William Tu <u9012063@gmail.com> Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Acked-by: William Tu <u9012063@gmail.com> Acked-by: Ben Pfaff <blp@ovn.org>
All the callers of the function already have a copy of the extracted
flow in their stack (or a few frames before).
This is useful for different resons:
* It forces the callers to also call flow_extract() on the packet, which
is necessary to initialize the l2,l3,l4 pointers.
* It will be used in the userspace datapath to generate the RSS hash by
a following commit
* It can be used by the userspace connection tracker to avoid extracting
the l3 type again.
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Acked-by: Ben Pfaff <blp@ovn.org>
flow: Fix uninitialized reads in [mini]flow_hash_5tuple().
Almost every caller expects [mini]flow_hash_5tuple() to be able to deal
with all kinds of flows, not only TCP and UDP.
Currently, when dealing with non L4 flows, the function may access
uninitialized memory. This commit changes it to return prematurely with
a partial hash value instead of reading uninitialized memory.
Found by valgrind.
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Acked-by: Ben Pfaff <blp@ovn.org>
Aaron Conole [Fri, 20 May 2016 15:52:59 +0000 (11:52 -0400)]
utilities/checkpatch.py: Check for appropriate bracing
Teach checkpatch.py to understand that if/for/while blocks should always
end with braces on the same line (if possible). This does not address
multi-line if/for/while blocks, but provides a point where such blocks
could be added.
Signed-off-by: Aaron Conole <aconole@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Ben Pfaff [Fri, 20 May 2016 14:49:02 +0000 (07:49 -0700)]
stp: Initialize mutex whenever we register unixctl command.
The stp/tcn command, which locks the mutex, was being registered without
initializing the mutex, so calling stp/tcn before STP was enabled on the
switch caused a crash. This commit fixes the bug by initializing the mutex
at the same time we register the stp/tcn command.
Aaron Conole [Fri, 20 May 2016 14:50:46 +0000 (10:50 -0400)]
utilities/ovs-ctl.in: Only add_managers with vswitchd
The ovs-ctl script was changed recently to have per-service start/stop
control. However, when that change was made the add_managers() call was
overlooked. This results in calls to `ovs-ctl --no-ovs-vswitchd start`
telling the ovsdb-server to connect to the remote controllers.
The fix presented will defer signaling to remote managers until the
following are both true:
1. At least one of OVSDB_SERVER or OVS_VSWITCHD was told to start
2. Both daemons are running.
Fixes: 7fc28c50c012 ("ovs-ctl: Allow selective start for db and switch") Signed-off-by: Aaron Conole <aconole@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
YAMAMOTO Takashi [Fri, 13 May 2016 14:36:15 +0000 (14:36 +0000)]
utilities: Tweak python shebangs to use env
"python" command provided by pkg_alternatives is a shell script.
At least on NetBSD-7, execve can't execute scripts whose interpreter
is another shell script. (While some "rich" shells like zsh seem
to have handle the case by itself, NetBSD's /bin/sh doesn't.)
Workaround the issue by using env command for shebangs for
these scripts.
Noticed with the recent tunnel-push-pop.at tests using ovs-pcap command.
Signed-off-by: YAMAMOTO Takashi <yamamoto@ovn.org> Acked-by: Ben Pfaff <blp@ovn.org>
YAMAMOTO Takashi [Fri, 13 May 2016 14:11:20 +0000 (14:11 +0000)]
ovn-controller-vtep.at: Pre-sort output before feeding to "sort -d"
NetBSD's "sort -d" preserves the order of lines which doesn't have
alphanumeric and blanks. eg. empty lines and [].
It means it sometimes preserve unstable order of the list output.
Also, simply remove -d option where the expected output doesn't
include [].
Signed-off-by: YAMAMOTO Takashi <yamamoto@ovn.org> Acked-by: Ben Pfaff <blp@ovn.org>
YAMAMOTO Takashi [Fri, 13 May 2016 12:57:48 +0000 (12:57 +0000)]
ovsdb-server.at: Fix races
As ovsdb-server creates pid file before unixctl socket, waiting
for pid file creation is not enough. Fix the race by retrying
with "version" command before assuming the server is up.
Signed-off-by: YAMAMOTO Takashi <yamamoto@ovn.org> Acked-by: Ben Pfaff <blp@ovn.org>
YAMAMOTO Takashi [Fri, 13 May 2016 11:42:55 +0000 (11:42 +0000)]
dpif: Remove a warning
Remove "attempted to unregister a datapath provider that is not registered"
warning. It's normal for --enabled-dummy=system with userland-only build.
ovn-controller-vtep.at tests use the flag and fail on the extra warning.
Alternatively, we can make the tests ignore this specific warning.
But currently it doesn't make much sense as dp_unregister_provider
is only used for --enabled-dummy.
Signed-off-by: YAMAMOTO Takashi <yamamoto@ovn.org> Acked-by: Ben Pfaff <blp@ovn.org>
The logical switch lsw0 created in this test has no logical
port corresponding to hv3, so that hypervisor does not have
any bridges created by OVN. With this test change, we are
replacing the 'show br-int' with a check to ensure that
'br-int' is not present.
Fixes: 8dab102238f0 ("ovn: Add more details to test output.") Signed-off-by: Flavio Fernandes <flavio@flaviof.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
William Tu [Sun, 15 May 2016 15:52:33 +0000 (08:52 -0700)]
ovn-nbctl: Fix memory leak reported by Valgrind.
Definitely lost is reported by test 2026: ovn -- 3 HVs, 1 LS, 3 lports/HV.
ds_put_char__ (dynamic-string.c:82)
ds_put_char (dynamic-string.h:88)
process_escape_args (process.c:103)
main (ovn-nbctl.c:92)
Another leak shown at ovn-sbctl.c with similar pattern.
Signed-off-by: William Tu <u9012063@gmail.com> Acked-by: Ryan Moats <rmoats@us.ibm.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Pravin B Shelar [Wed, 18 May 2016 00:32:33 +0000 (17:32 -0700)]
dpif-netdev: create batch object
DPDK datapath operate on batch of packets. To pass the batch of
packets around we use packets array and count. Next patch needs
to associate meta-data with each batch of packets. So Introducing
a batch structure to make handling the metadata easier.
Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Jesse Gross <jesse@kernel.org>
Pravin B Shelar [Wed, 18 May 2016 00:32:06 +0000 (17:32 -0700)]
netdev: Return number of packet from netdev_pop_header()
Current tunnel-pop API does not allow the netdev implementation
retain a packet but STT can keep a packet from batch of packets
during TCP reassembly processing. To return exact count of
valid packet STT need to pass this number of packet parameter
as a reference.
Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Jesse Gross <jesse@kernel.org>
ovs-vtep: Support running multiple ovs-vtep processes
Include ovs-vtep physical switch name as part of logical switch name to
support running multiple ovs-vtep processes sharing the same ovsdb and vswitchd.
Code inserted that autogenerates corresponding map functions to set and
delete elements in map columns.
Inserts description to the functions that are autogenerated.
Signed-off-by: Edward Aymerich <edward.aymerich@hpe.com> Signed-off-by: Arnoldo Lutz <arnoldo.lutz.guevara@hpe.com> Co-authored-by: Arnoldo Lutz <arnoldo.lutz.guevara@hpe.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Edward Aymerich [Mon, 2 May 2016 19:59:44 +0000 (13:59 -0600)]
ovsdb-idl: Add partial map updates functionality.
In the current implementation, every time an element of either a map or set
column has to be modified, the entire content of the column is sent to the
server to be updated. This is not a major problem if the information contained
in the column for the corresponding row is small, but there are cases where
these columns can have a significant amount of elements per row, or these
values are updated frequently, therefore the cost of the modifications becomes
high in terms of time and bandwidth.
In this solution, the ovsdb-idl code is modified to use the RFC 7047 'mutate'
operation, to allow sending partial modifications on map columns to the server.
The functionality is exposed to clients in the vswitch idl. This was
implemented through map operations.
A map operation is defined as an insertion, update or deletion of a key-value
pair inside a map. The idea is to minimize the amount of map operations
that are send to the OVSDB server when a transaction is committed.
In order to keep track of the requested map operations, structs map_op and
map_op_list were defined with accompanying functions to manipulate them. These
functions make sure that only one operation is send to the server for each
key-value that wants to be modified, so multiple operation on a key value are
collapsed into a single operation.
As an example, if a client using the IDL updates several times the value for
the same key, the functions will ensure that only the last value is send to
the server, instead of multiple updates. Or, if the client inserts a key-value,
and later on deletes the key before committing the transaction, then both
actions cancel out and no map operation is send for that key.
To keep track of the desired map operations on each transaction, a list of map
operations (struct map_op_list) is created for every column on the row on which
a map operation is performed. When a new map operation is requested on the same
column, the corresponding map_op_list is checked to verify if a previous
operations was performed on the same key, on the same transaction. If there is
no previous operation, then the new operation is just added into the list. But
if there was a previous operation on the same key, then the previous operation
is collapsed with the new operation into a single operation that preserves the
final result if both operations were to be performed sequentially. This design
keep a small memory footprint during transactions.
When a transaction is committed, the map operations lists are checked and
all map operations that belong to the same map are grouped together into a
single JSON RPC "mutate" operation, in which each map_op is transformed into
the necessary "insert" or "delete" mutators. Then the "mutate" operation is
added to the operations that will be send to the server.
Once the transaction is finished, all map operation lists are cleared and
deleted, so the next transaction starts with a clean board for map operations.
Using different structures and logic to handle map operations, instead of
trying to force the current structures (like 'old' and 'new' datums in the row)
to handle then, ensures that map operations won't mess up with the current
logic to generate JSON messages for other operations, avoids duplicating the
whole map for just a few changes, and is faster for insert and delete
operations, because there is no need to maintain the invariants in the 'new'
datum.
Signed-off-by: Edward Aymerich <edward.aymerich@hpe.com> Signed-off-by: Arnoldo Lutz <arnoldo.lutz.guevara@hpe.com> Co-authored-by: Arnoldo Lutz <arnoldo.lutz.guevara@hpe.com>
[blp@ovn.org made style changes and factored out error checking] Signed-off-by: Ben Pfaff <blp@ovn.org>
Ben Pfaff [Tue, 17 May 2016 23:29:39 +0000 (16:29 -0700)]
ofproto-dpif-xlate: Fix compilation with GCC 4.6.
Without this change, GCC 4.6 reports:
ofproto/ofproto-dpif-xlate.c: In function ‘xlate_actions’:
ofproto/ofproto-dpif-xlate.c:5117:27: error: missing initializer
ofproto/ofproto-dpif-xlate.c:5117:27: error: (near initialization for
‘(anonymous).masks.vlan_tci’)
Reported-by: Joe Stringer <joe@ovn.org>
Reported-at: https://travis-ci.org/openvswitch/ovs/builds/130256491 Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Joe Stringer <joe@ovn.org>
William Tu [Sat, 30 Apr 2016 05:13:46 +0000 (22:13 -0700)]
tests: Add support for helgrind thread error detector.
Helgrind is a Valgrind tool for detecting thread errors, reporting three
classes of errors: misuses of the POSIX pthreads API, potential deadlocks
arising from lock ordering problems, and data races -- accessing memory
without adequate locking. Similar to valgrind, users do "make check-helgrind"
and results will be saved at tests/testsuite.dir/<N>/helgrind.*.
Signed-off-by: William Tu <u9012063@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Joe Stringer [Fri, 13 May 2016 21:17:12 +0000 (14:17 -0700)]
ofproto-dpif-upcall: Fix UFID usage with flow_modify.
As per the delete_op_init{,__}() functions, the UFID should only be
passed down if ukey->ufid_present is set. Otherwise it is possible to
request a flow modification only using a UFID in a datapath that doesn't
support UFID, which will fail.
Fixes: 43b2f131a229 ("ofproto: Allow in-place modifications of datapath flows.") Signed-off-by: Joe Stringer <joe@ovn.org> Acked-by: Ben Pfaff <blp@ovn.org>
Justin Pettit [Thu, 12 May 2016 00:28:54 +0000 (17:28 -0700)]
dpctl: Sort port listing in "show" command.
The port listing did not consistently print in the same order. While it
is a better user experience to see the ports printed in order, more
importantly, this fixes a unit test ("dpctl - add-if set-if del-if")
that would occasionally fail due to expecting that the ports are printed
in order.
Signed-off-by: Justin Pettit <jpettit@ovn.org> Acked-by: Ben Pfaff <blp@ovn.org>
Use the new ccmap type instead of cmap for staged lookup indices to
fix the problem with slow removal of rules with large number of
duplicates. This was problematic especially when many rules shared
the same match in packet metadata (e.g., a port number, but nothing
else), causing a large number of duplicates to be inserted into the
staged lookup index. ccmap only keeps the count of inserted (hash)
values, so duplicates do not add any performance penalty.
cmap implements duplicates as linked lists, which causes removal of
rules to become (O^2) with large number of duplicates. This patch
fixes the problem by introducing a new 'counting' variant of the cmap
(ccmap), which can be efficiently used to keep counts of inserted hash
values provided by the caller. This does not require a node in the
user data structure, so this makes the user implementation a bit more
memory efficient, too.
Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Ben Pfaff <blp@ovn.org>
ovn: Fix localnet ports deletion and recreation sometimes after restart.
On graceful restart of ovn-controller, the chassis row is inserted in the
Chassis table. During this transaction, there is a window of time where an
idl row-read may not return the newly created row - even though the row
should exist, but the transaction is in an incomplete state. As a result,
get_chassis() in binding_run() returns a null chassis record binding_run
exits early, and does not create local_datapaths, and patch_run deletes
localnet patch ports. In a later run, the localnet patch ports are
recreated.
This is reproducable consistently but not on every restart. The fix is to
handle the case that the chassis record may be null in binding_run, and yet
create local_datapaths.
Restart logs follow with commentary:
2016-04-28T18:35:42.448Z|00001|vlog|INFO|opened log file /home/ovs/ovs/tests/testsuite.dir/2035/hv/ovn-controller.log
2016-04-28T18:35:42.449Z|00002|reconnect|INFO|unix:/home/ovs/ovs/tests/testsuite.dir/2035/hv/db.sock: connecting...
2016-04-28T18:35:42.449Z|00003|reconnect|INFO|unix:/home/ovs/ovs/tests/testsuite.dir/2035/hv/db.sock: connected
2016-04-28T18:35:42.452Z|00004|reconnect|INFO|unix:/home/ovs/ovs/tests/testsuite.dir/2035/ovn-sb/ovn-sb.sock: connecting...
2016-04-28T18:35:42.452Z|00005|reconnect|INFO|unix:/home/ovs/ovs/tests/testsuite.dir/2035/ovn-sb/ovn-sb.sock: connected
2016-04-28T18:35:42.454Z|00006|ovsdb_idl|INFO|ovsdb_idl_txn_insert:
Chassis row inserted into transaction above
2016-04-28T18:35:42.454Z|00007|binding|INFO|Claiming lport localvif2 for this chassis.
2016-04-28T18:35:42.454Z|00008|binding|INFO|Claiming lport localvif3 for this chassis.
2016-04-28T18:35:42.454Z|00009|binding|INFO|Claiming lport localcif4 for this chassis.
2016-04-28T18:35:42.454Z|00010|binding|INFO|Claiming lport localcif5 for this chassis.
2016-04-28T18:35:42.454Z|00011|binding|INFO|Claiming lport localcif1 for this chassis.
2016-04-28T18:35:42.454Z|00012|binding|INFO|Claiming lport localvif1 for this chassis.
2016-04-28T18:35:42.454Z|00013|binding|INFO|Claiming lport localvif201 for this chassis.
2016-04-28T18:35:42.454Z|00014|binding|INFO|Claiming lport localcif3 for this chassis.
2016-04-28T18:35:42.454Z|00015|binding|INFO|Claiming lport localcif2 for this chassis.
Binding run found the chassis record and has claimed the vifs
2016-04-28T18:35:42.455Z|00016|ofctrl|INFO|unix:/home/ovs/ovs/tests/testsuite.dir/2035/hv/br-int.mgmt: connecting to switch
2016-04-28T18:35:42.455Z|00017|rconn|INFO|unix:/home/ovs/ovs/tests/testsuite.dir/2035/hv/br-int.mgmt: connecting...
2016-04-28T18:35:42.455Z|00018|pinctrl|INFO|unix:/home/ovs/ovs/tests/testsuite.dir/2035/hv/br-int.mgmt: connecting to switch
2016-04-28T18:35:42.456Z|00019|rconn|INFO|unix:/home/ovs/ovs/tests/testsuite.dir/2035/hv/br-int.mgmt: connecting...
2016-04-28T18:35:42.457Z|00020|ovsdb_idl|INFO|ovsdb_idl_row_clear_new:
At this point read of Chassis table returns no rows, and
the transaction status is still incomplete.
2016-04-28T18:35:42.457Z|00021|binding|INFO|no chassis rec!
Binding run exits early because chassis_rec was null
2016-04-28T18:35:42.459Z|00022|patch|INFO|removing port patch-br-int-to-localnet201
2016-04-28T18:35:42.459Z|00023|patch|INFO|removing port patch-br-int-to-localnet1
2016-04-28T18:35:42.459Z|00024|patch|INFO|removing port patch-localnet1-to-br-int
2016-04-28T18:35:42.459Z|00025|patch|INFO|removing port patch-localnet201-to-br-int
Localnet ports are removed above, because local_datapaths dont exist
2016-04-28T18:35:42.459Z|00026|rconn|INFO|unix:/home/ovs/ovs/tests/testsuite.dir/2035/hv/br-int.mgmt: connected
2016-04-28T18:35:42.460Z|00027|rconn|INFO|unix:/home/ovs/ovs/tests/testsuite.dir/2035/hv/br-int.mgmt: connected
2016-04-28T18:35:42.460Z|00028|ovsdb_idl|INFO|ovsdb_idl_row_create:
Now, the transaction is complete
Signed-off-by: Ramu Ramamurthy <ramu.ramamurthy@us.ibm.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
In some use cases such as VM migration or when VMs reuse IP addresses, VMs
become unreachable externally because external switches/routers on localnet
have stale port-mac or ARP caches. The problem resolves after some time
when the caches ageout which could be minutes for port-mac bindings or
hours for ARP caches.
To fix this, send some gratuitous ARPs when a logical port on a localnet
datapath gets added. Such gratuitous ARPs help on a best-effort basis to
update the mac-port bindings and ARP caches of external switches and
routers on the localnet.
Move the function extract_lport_addresses to a file
in ovn/lib since that function can be used by ovn-controller also
to parse addresses stored in the mac column of the
port_binding table. Currently that function is used only
in ovn_northd.
Signed-off-by: Ramu Ramamurthy <ramu.ramamurthy@us.ibm.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
daemon-unix: Properly handle missing users or groups.
From the manpages of getgrnam_r (getpwnam_r is similar):
"If no matching group record was found, these functions return 0 and
store NULL in *result."
The code checked only against errors, but non existing users didn't set
e != 0 therefore the code could try to set arbitrary uid/gid values.
Fixes: e91b927d lib/daemon: support --user option for all OVS daemon Signed-off-by: Christian Ehrhardt <christian.ehrhardt@canonical.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Markos Chandras [Tue, 10 May 2016 08:21:00 +0000 (09:21 +0100)]
acinclude.m4: Fix skb_get_hash function detection
Commit e2f3178f0582 ("datapath: Add support for kernel 3.14.") added
support for 3.14 kernels and a new OVS_GREP_IFELSE check for the
"skg_get_hash" function in the process. "skb_get_hash" was introduced
in the Linux kernel commit 3958afa1b272 ("net: Change skb_get_rxhash to
skb_get_hash") which exists in >=3.14 but the OVS_GREP_IFELSE macro
also matches the "skb_get_hash_raw" function which exists in older
kernels. As a result of which, the check makes the build system
behave as if the "skb_get_hash" function is available in these older
kernels leading to build failures. We fix this by explicitly checking
for "skb_get_hash(" which matches the function definition.
William Tu [Fri, 13 May 2016 17:33:07 +0000 (10:33 -0700)]
ovsdb-server: Fix memory leak reported by Valgind.
Reported by test 1657: ovsdb-server/add-db and remove-db.
ds_put_format (dynamic-string.c:142)
query_db_remotes (ovsdb-server.c:798)
reconfigure_remotes (ovsdb-server.c:988)
main_loop (ovsdb-server.c:156)
Signed-off-by: William Tu <u9012063@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
William Tu [Fri, 13 May 2016 18:58:43 +0000 (11:58 -0700)]
ovn-controller: Fix errors reported by Valgrind.
Fix two errors reported by test 2026: ovn -- 3 HVs, 1 LS, 3 lports/HV.
1. Conditional jump or move depends on uninitialised value(s)
physical_run (physical.c:366)
main (ovn-controller.c:382)
2. Use of uninitialised value of size 8
bitmap_set1 (bitmap.h:97)
update_ct_zones (binding.c:115)
binding_run (binding.c:228)
main (ovn-controller.c:362)
Signed-off-by: William Tu <u9012063@gmail.com> Acked-by: Ryan Moats <rmoats@us.ibm.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Ben Pfaff [Sat, 23 Apr 2016 00:45:03 +0000 (17:45 -0700)]
ofproto-dpif-xlate: Always generate wildcards.
Until now, the flow translation code has tried to avoid constructing a
set of wildcards during translation in the cases where it can, because
wildcards are large and somewhat expensive. However, this has problems
that we hadn't previously realized. Specifically, the generated actions
can depend on the constructed wildcards, to decide which bits of a field
need to be set in a masked set_field action. This means that in practice
translation needs to always construct the wildcards.
(It might be possible to avoid masked set_field when we're not constructing
wildcards, but this would mean that we'd generate different actions
depending on whether wildcards were being constructed, which seems rather
confusing at best. Also, the cases in which we don't need wildcards anyway
are fairly obscure, meaning that the benefits of avoiding them in those
cases are minimal and that it's going to be hard to get test coverage. The
latter is probably why we didn't notice this until now.)
Reported-by: William Tu <u9012063@gmail.com>
Reported-at: http://openvswitch.org/pipermail/dev/2016-April/069219.html Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Ryan Moats <rmoats@us.ibm.com> Tested-by: William Tu <u9012063@gmail.com>
Joe Stringer [Tue, 10 May 2016 22:50:42 +0000 (15:50 -0700)]
netdev-dpdk: Fix locking during get_stats.
Clang complains:
lib/netdev-dpdk.c:1860:1: error: mutex 'dev->mutex' is not locked on every path
through here [-Werror,-Wthread-safety-analysis]
}
^
lib/netdev-dpdk.c:1815:5: note: mutex acquired here
ovs_mutex_lock(&dev->mutex);
^
./include/openvswitch/thread.h:60:9: note: expanded from macro 'ovs_mutex_lock'
ovs_mutex_lock_at(mutex, OVS_SOURCE_LOCATOR)
^
Fixes: d6e3feb57c44 ("Add support for extended netdev statistics based on RFC 2819.") Signed-off-by: Joe Stringer <joe@ovn.org> Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
Joe Stringer [Tue, 10 May 2016 22:42:01 +0000 (15:42 -0700)]
ofproto-dpif-upcall: Pass key to dpif_flow_get().
Windows datapath folks have reported instances where OVS userspace will
pass down a flow_get request to the datapath using a UFID even though the
datapath has no support for UFIDs. Since commit e672ff9b4d22
("ofproto-dpif: Restore metadata and registers on recirculation."), if a
flow dump provides a flow that userspace isn't aware of, and the flow
dump doesn't provide actions for that flow, then userspace will attempt
a flow_get using just the UFID. This is because the ofproto-dpif layer
doesn't pass the key down to the dpif layer even if it's available.
Prior to the above commit, the codepath was only hit if the key was not
available, which would have implied UFID support. This assumption is now
broken: An empty set of actions could also trigger flow_get, and
datapaths without UFID support are free to pass up empty actions lists.
Pass down the flow key if available, and don't pass down the UFID if
unavailable to be more consistent with the usage of other dpif APIs
within this file.
Fixes: e672ff9b4d22 ("ofproto-dpif: Restore metadata and registers on recirculation.") Reported-by: Sairam Venugopal <vsairam@vmware.com> Signed-off-by: Joe Stringer <joe@ovn.org> Acked-by: Jarno Rajahalme <jarno@ovn.org>
Darrell Ball [Sat, 7 May 2016 16:21:21 +0000 (09:21 -0700)]
vtep: Add source node replication support.
This patch updates the vtep schema, vtep-ctl commands and vtep simulator
to support source node replication in addition to service node
replication per logical switch. The default replication mode is service
node as that was the only mode previously supported. Source node
replication mode is optionally configurable and clearing the replication
mode implicitly sets the replication mode back to a default of service
node.
László Sürü [Wed, 11 May 2016 08:46:33 +0000 (08:46 +0000)]
ofproto-dpif-xlate: fix for group liveness propagation
According to OpenFlow v1.3.5 specification a group is considered live,
if it has at least one live bucket in it. (6.5 Group Table
Modification Messages: "A group is considered live if a least one of
its buckets is live.")
However, OVS implementation incorrectly returns group as live when no
live bucket is found in group_is_alive() function of
ofproto-dpif-xlate.c.
Instead it should return true only if a live bucket is found (that is
!= NULL).
Signed-off-by: László Sűrű <laszlo.suru@ericsson.com> Co-authored-by: Jan Scheurich <jan.scheurich@ericsson.com> Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com> Acked-by: Jarno Rajahalme <jarno@ovn.org>
Justin Pettit [Wed, 4 May 2016 01:20:51 +0000 (18:20 -0700)]
util: Pass 128-bit arguments directly instead of using pointers.
Commit f2d105b5 (ofproto-dpif-xlate: xlate ct_{mark, label} correctly.)
introduced the ovs_u128_and() function. It directly takes ovs_u128
values as arguments instead of pointers to them. As this is a bit more
direct way to deal with 128-bit values, modify the other utility
functions to do the same.
Signed-off-by: Justin Pettit <jpettit@ovn.org> Acked-by: Joe Stringer <joe@ovn.org>
Joe Stringer [Thu, 5 May 2016 01:01:06 +0000 (18:01 -0700)]
system-traffic: Wait for availability of ftpd.
Some FTP tests had intermittent failures because the FTP daemons
might not load before the testsuite script iterated to running the
client. Add checks after launching FTP daemons to make these tests more
resilient.
Signed-off-by: Joe Stringer <joe@ovn.org> Acked-by: Jarno Rajahalme <jarno@ovn.org>