Refactor INSTALL.DPDK into two documents named INSTALL.DPDK and
INSTALL.DPDK-ADVANCED. While the INSTALL.DPDK document helps the
novice user set up OVS DPDK and run it out of the box, the
ADVANCED document is targeted at expert users looking for optimum
performance when running the DPDK datapath.
Paul Boca [Wed, 6 Jul 2016 12:38:32 +0000 (12:38 +0000)]
vlog test: Disable default syslog logger
Disable the syslog logger when '/dev/log' doesn't exist, as is the case on Windows.
It seems that on Python34 a default handler is added to the logger and it prints
even if no handler is set by us.
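The "default handler" behaviour described here can be reproduced with the standard library alone. The sketch below assumes the handler in question is Python's last-resort handler (`logging.lastResort`, available since Python 3.2), which prints WARNING and above when a logger has no handlers. The helper name `warning_output` is hypothetical, and installing a `NullHandler` is only an illustration of how such output can be silenced, not necessarily what the commit does.

```python
import io
import logging


def warning_output(logger_name, install_null_handler):
    """Return what the last-resort handler prints for one warning."""
    logger = logging.getLogger(logger_name)
    logger.setLevel(logging.WARNING)
    logger.propagate = False  # keep the root logger's handlers out of the picture
    if install_null_handler:
        # With at least one handler attached, the last-resort handler is unused.
        logger.addHandler(logging.NullHandler())

    captured = io.StringIO()
    saved = logging.lastResort
    # Swap in a last-resort handler that writes to a buffer instead of stderr.
    replacement = logging.StreamHandler(captured)
    replacement.setLevel(logging.WARNING)
    logging.lastResort = replacement
    try:
        logger.warning("hello")
    finally:
        logging.lastResort = saved
    return captured.getvalue()
```

Calling `warning_output(name, False)` shows the message being printed even though no handler was configured; with `True` nothing is printed.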
Signed-off-by: Paul-Daniel Boca <pboca@cloudbasesolutions.com>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
ofproto_port_open_type should be used for netdev_open, but not for other tests.
For example, STP/RSTP check for interfaces of internal type, but that check will
fail when the netdev datapath is used.
The same thing goes for setting MAC address of internal Interfaces. That fails
for the netdev datapath because the interface type is set to "tap", but they are
still interfaces of type "internal", just their netdev implementation is
different.
Use a netdev_type for the type that needs to be used for netdev_open and
ofproto_port, while we still keep the type as the normalized configured type in
the database.
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com>
Signed-off-by: Jesse Gross <jesse@kernel.org>
Andy Zhou [Fri, 17 Jun 2016 22:41:26 +0000 (15:41 -0700)]
lib: Remove extra API dependency for ovs_thread_create()
When calling ovs_thread_create() without calling fatal_signal_init()
first, ovs_thread_create() sometimes asserts. This dependency is
subtle and not very obvious.
The root cause seems to be that, within ovs_thread_create(), the
multi-threaded state is declared before all initializations are done.
Signed-off-by: Andy Zhou <azhou@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
Ben Pfaff [Tue, 5 Jul 2016 15:33:05 +0000 (08:33 -0700)]
netlink-notifier: Avoid valgrind possible leak warning.
This ensures that pointers to nln_notifiers are to the beginning of the
structs instead of to the middle, meaning that valgrind does not consider
them "possible" leaks.
Reported-by: William Tu <u9012063@gmail.com>
Tested-by: William Tu <u9012063@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Ben Pfaff [Sun, 3 Jul 2016 04:16:55 +0000 (21:16 -0700)]
bridge: Add assertion to document an invariant in find_local_hw_addr().
Avoids a possible null pointer dereference report from Clang.
Reported-at: http://openvswitch.org/pipermail/dev/2016-June/073967.html
Signed-off-by: Ben Pfaff <blp@ovn.org>
Tested-by: William Tu <u9012063@gmail.com>
This commit adds schema changes to the OVN_Northbound database to support
Load balancers.
In ovn-northd, it adds two logical tables to program logical flows.
It adds a 'pre_lb' table that sits before 'pre_stateful' table.
For packets that need to be load balanced, this table sets reg0[0]
to act as a hint for the pre-stateful table to send the packet to
the conntrack table for defragmentation.
It also adds a 'lb' table that sits before 'stateful' table.
For packets from established connections, this table sets reg0[2] to
indicate to the 'stateful' table that the packet needs to be sent to
connection tracking table to just do NAT.
In the stateful table, a packet for a new connection that needs to be load
balanced is given a ct_lb($IP_LIST) action.
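The way the two new tables hand hints to the existing ones can be sketched as a small model. This is only an illustration of the reg0 bit usage described in this commit message: the function names and constants are hypothetical, and the real logic lives in the logical flows generated by ovn-northd.

```python
REG0_CT_DEFRAG = 1 << 0  # reg0[0]: hint to send the packet to conntrack for defragmentation
REG0_CT_NAT = 1 << 2     # reg0[2]: hint to send the packet to conntrack just to do NAT


def pre_lb(needs_load_balancing, reg0=0):
    """'pre_lb' table: mark packets that need to be load balanced."""
    return reg0 | REG0_CT_DEFRAG if needs_load_balancing else reg0


def pre_stateful(reg0):
    """'pre_stateful' table: act on the hint left by earlier tables."""
    return "ct_next;" if reg0 & REG0_CT_DEFRAG else "next;"


def lb(established, reg0=0):
    """'lb' table: mark packets from established connections."""
    return reg0 | REG0_CT_NAT if established else reg0


def stateful(reg0, new_connection_backends=None):
    """'stateful' table: choose the action for the packet."""
    if new_connection_backends:
        # New connection that needs to be load balanced.
        return "ct_lb(%s);" % ", ".join(new_connection_backends)
    if reg0 & REG0_CT_NAT:
        # Established connection: conntrack only needs to do NAT.
        return "ct_lb;"
    return "next;"
```

For example, `pre_stateful(pre_lb(True))` yields the conntrack action, while `stateful(lb(True))` yields the plain `ct_lb;` form.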
Signed-off-by: Gurucharan Shetty <guru@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
ovn-controller now supports 2 new logical actions.
1. ct_lb;
   Sends the packet through the conntrack zone to NAT
   packets. Packets that are part of an established connection
   will automatically get NATed based on the NAT arguments
   supplied to conntrack when the first packet was committed.
2. ct_lb(192.168.1.2, 192.168.1.3);
   ct_lb(192.168.1.2:80, 192.168.1.3:80);
   Creates an OpenFlow group with multiple buckets and equal weights
   that changes the destination IP address (and port number) of the packet
   statefully to one of the options provided inside the parentheses.
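As a rough illustration of "multiple buckets with equal weights" and of the stateful choice, the following sketch models the group as a list of equally weighted buckets and remembers the bucket chosen for each connection. Everything here is an assumption made for illustration: the class and function names, the weight value of 100, and the CRC-based selection are hypothetical and are not how OVS selects buckets.

```python
import zlib


def parse_backends(arguments):
    """Split the text inside ct_lb(...) into a list of backends."""
    return [item.strip() for item in arguments.split(",") if item.strip()]


class LoadBalancerModel:
    """Toy model: one bucket per backend, all with the same weight."""

    def __init__(self, backends, weight=100):
        self.buckets = [(backend, weight) for backend in backends]
        self.connections = {}  # stands in for conntrack state

    def pick(self, connection):
        # Only the first packet of a connection selects a bucket; later
        # packets reuse the remembered choice (the "stateful" part).
        if connection not in self.connections:
            index = zlib.crc32(repr(connection).encode()) % len(self.buckets)
            self.connections[connection] = self.buckets[index][0]
        return self.connections[connection]
```

A connection is represented by any hashable value, for example a tuple of source address, source port, destination address and destination port.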
Signed-off-by: Gurucharan Shetty <guru@ovn.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Ryan Moats [Sun, 3 Jul 2016 15:35:28 +0000 (10:35 -0500)]
Change tracking structures to use struct uuids
In encaps.c, binding.c, and lport.c incremental processing
is aided by tracking entries by their ovsdb row uuids.
The original patch sets used pointers, which might lead
to errors if the ovsdb row uuid memory is released. So,
use actual structures to hold the values instead.
Signed-off-by: Ryan Moats <rmoats@us.ibm.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Currently, the only use of stateful services in conntrack is
OVN ACLs. In table ACL, we commit the packet to conntrack
via ct_commit action.
As we introduce more stateful services, the ACL feature will
have to share the conntrack module with others. As
preparation for more stateful features like load balancing,
this commit introduces a new stateful table
that is responsible for committing packets to conntrack via the
ct_commit action. If the ACL table needs to commit a packet,
it sets 'reg0[1]' to 1. The stateful table in turn will commit
the packet if 'reg0[1]' is 1.
Signed-off-by: Gurucharan Shetty <guru@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
Currently, the only use of stateful services in conntrack is
OVN ACLs. In table pre-ACL, we send the packet to conntrack
to track it (to get its status) and to defrag via the ct_next
action.
As we introduce more stateful services, the ACL feature will
have to share the conntrack module with others. As
preparation for more stateful features like load balancing,
this commit introduces a new pre-stateful table that is
responsible for sending packets through conntrack via the
ct_next action. If the pre-ACL table needs to send a packet
through conntrack, it just sets 'reg0[0]' to 1.
The pre-stateful table in turn will send the packet to conntrack
if 'reg0[0]' is 1.
Signed-off-by: Gurucharan Shetty <guru@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
This example match only has 3 addresses, but it could easily have
hundreds of addresses. In some cases, the same large set of addresses
needs to be used in several ACLs.
This patch adds a new Address_Set table to OVN_Northbound so that a set
of addresses can be specified once and then referred to by name in ACLs.
To recreate the above example, you would first create an address set:
ovn-controller: process lport bindings only when transaction is possible
As currently implemented, binding_run() normally updates the set of
locally owned logical ports on each call. When changes to the
membership of this set are detected (i.e. when locally bound
logical ports are added or deleted), additional processing to
update the sb database with lport binding is performed.
However, the sb database can only be updated when a transaction to
the sb database is possible (that is, when ctx->ovnsb_idl_txn is
non-NULL). If a new logical port is detected while ctx->ovnsb_idl_txn
happens to be NULL, its binding information will not be updated in
the sb database until another change to the set of locally-owned
logical ports occurs. If no such change ever occurs, the sb database
is never updated with the appropriate binding information.
Eliminate this issue by only updating the set of locally owned logical
ports when an sb database transaction is possible. This addresses
a cause of occasional failures in the "3 HVs, 3 LS, 3 lports/LS, 1 LR"
test case.
The failing scenario goes like this:
1) Test case logical network setup is complete.
2) The last physical network port is added via
as hv3 ovs-vsctl --add-port ... --set Interface vif333 external-ids:iface-id=lp333
3) hv3 ovn-controller receives update from hv3 ovsdb-server with above mapping,
binding_run() is called, and ctx->ovnsb_idl_txn happens to be NULL.
4) binding_run() calls get_local_iface_ids(), which recognizes the new
local port as matching a logical port, so lp333 is added to the
global ssets "lports" and "all_lports". This means lp333 will not be treated
as a new logical port on subsequent calls. Because get_local_iface_ids()
has discovered a new lport, it returns changed = true.
5) Because get_local_iface_ids() returned true, binding_run() sets process_full_binding
to true.
6) Because process_full_binding is true, binding_run() calls consider_local_datapath()
for each logical port in shash_lports (which now includes lp333).
7) consider_local_datapath() processing returns without calling
sbrec_port_binding_set_chassis() because ctx->ovnsb_idl_txn is NULL.
8) There are subsequent calls to binding_run() with non-NULL ctx->ovnsb_idl_txn,
but because lp333 is already in the "lports" sset, get_local_iface_ids()
returns changed=false, so process_full_binding is false, which means
consider_local_datapath() is not called for lp333.
9) Because consider_local_datapath() is not called for lp333, the sb database
is not updated with the lport/chassis binding.
Hopefully the above is intelligible. Another way of looking at it would be
to say the condition for calling consider_local_datapath() is an "edge trigger";
this change suppresses the trigger until the necessary actions can be performed.
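The failing scenario and the fix can be modelled in a few lines. This is a simplified sketch, not the ovn-controller code: the function names, the dictionary standing in for the sb database transaction, and the chassis name "hv3" are used only for illustration.

```python
def binding_run_before(lports, local_ifaces, txn, chassis="hv3"):
    """Old behaviour: the lport set is updated even when txn is None."""
    changed = False
    for iface in local_ifaces:
        if iface not in lports:
            lports.add(iface)   # the "edge" is consumed here
            changed = True
    if changed:
        for lport in lports:
            if txn is not None:  # binding silently skipped without a txn
                txn[lport] = chassis


def binding_run_after(lports, local_ifaces, txn, chassis="hv3"):
    """New behaviour: do nothing until a transaction is possible."""
    if txn is None:
        return
    changed = False
    for iface in local_ifaces:
        if iface not in lports:
            lports.add(iface)
            changed = True
    if changed:
        for lport in lports:
            txn[lport] = chassis
```

Running each variant twice, first with `txn=None` and then with a dictionary, shows that only the second variant records the binding for lp333.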
Signed-off-by: Lance Richardson <lrichard@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
If there are multiple logical switches or routers with a duplicate name,
the configuration is slightly different. You should configure the logical
switches or routers using the UUID instead of the name.
Signed-off-by: nickcooper-zhangtonghao <nickcooper-zhangtonghao@opencloud.tech>
Signed-off-by: Ben Pfaff <blp@ovn.org>
When new tables are introduced, it gets a little harder to
track all the different table numbers used in the documentation.
This commit changes some table numbers to names to make it a little
easier to update documentation when new tables are introduced in the
upcoming commits.
Signed-off-by: Gurucharan Shetty <guru@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
Future patches introduce more tables between
pre-ACL and ACL processing. As such, it looks
easier to separate these out into separate
functions to enhance code readability.
Signed-off-by: Gurucharan Shetty <guru@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
William Tu [Wed, 29 Jun 2016 21:38:02 +0000 (14:38 -0700)]
ofproto-dpif-mirror: Add mirror snaplen support.
This patch adds a 'snaplen' config for mirroring table. A mirrored packet
with size larger than snaplen bytes will be truncated in datapath before
sending to the mirror output port.
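The truncation rule itself is simple enough to state as code. In this sketch the function name is hypothetical, and treating a snaplen of 0 or None as "no truncation" is an assumption, not something stated in this commit message.

```python
def truncate_for_mirror(packet, snaplen=None):
    """Return the bytes that would be sent to the mirror output port."""
    if snaplen and len(packet) > snaplen:
        return packet[:snaplen]  # keep only the first snaplen bytes
    return packet
```

A 1500-byte packet mirrored with a snaplen of 64 would thus be delivered as its first 64 bytes, while a 60-byte packet would be delivered unchanged.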
Tested-at: https://travis-ci.org/williamtu/ovs-travis/builds/141186839
Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
William Tu [Wed, 29 Jun 2016 17:35:00 +0000 (10:35 -0700)]
vagrant: Add FreeBSD 10.2 box support.
Add FreeBSD 10.2 vagrant file "Vagrantfile-FreeBSD". Users can run
'VAGRANT_VAGRANTFILE=Vagrantfile-FreeBSD vagrant up' to test basic
OVS configure, build, and check.
Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
William Tu [Wed, 29 Jun 2016 05:02:26 +0000 (22:02 -0700)]
ovn-nbctl: Fix double free in nbctl_lr_route_list().
The intent here was to free the error reported by ipv6_parse_cidr(),
but in fact the error reported by that function was discarded and
the previous error from ip_parse_cidr() was freed again.
Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Jan Scheurich [Tue, 28 Jun 2016 22:29:25 +0000 (00:29 +0200)]
ofproto: Add relaxed group_mod command ADD_OR_MOD
This patch adds support for a new Group Mod command OFPGC_ADD_OR_MOD to
OVS for all OpenFlow versions that support groups (OF11 and higher).
The new ADD_OR_MOD creates a group that does not yet exist (like ADD)
and modifies an existing group (like MODIFY).
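The difference between the three commands can be shown with a toy group table. This is a sketch of the semantics only: the class and exception names are hypothetical and no OpenFlow messages are involved.

```python
class GroupModError(Exception):
    """Stands in for the OFP Error message a switch would send."""


class GroupTable:
    def __init__(self):
        self.groups = {}

    def add(self, group_id, buckets):
        # Strict ADD: fails if the group already exists.
        if group_id in self.groups:
            raise GroupModError("group %d already exists" % group_id)
        self.groups[group_id] = buckets

    def modify(self, group_id, buckets):
        # Strict MODIFY: fails if the group does not exist.
        if group_id not in self.groups:
            raise GroupModError("group %d does not exist" % group_id)
        self.groups[group_id] = buckets

    def add_or_mod(self, group_id, buckets):
        # Relaxed ADD_OR_MOD: succeeds in both cases.
        self.groups[group_id] = buckets
```

With `add_or_mod()` a controller can program a group without first knowing whether the switch already has it.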
Rationale: In OpenFlow 1.x the Group Mod commands OFPGC_ADD and
OFPGC_MODIFY have strict semantics: ADD fails if the group exists,
while MODIFY fails if the group does not exist. This requires a
controller to know the exact state of the switch when programming a
group in order not to run the risk of getting an OFP Error message in
response. This is hard to achieve and maintain at all times in view of
possible switch and controller restarts or other connection losses
between switch and controller.
Due to the un-acknowledged nature of the Group Mod message, programming
groups safely and efficiently at the same time is virtually impossible
as the controller has to either query the existence of the group prior
to each Group Mod message or to insert a Barrier Request/Reply after
every group to be sure that no Error can be received at a later stage
and require a complicated roll-back of any dependent actions taken
between the failed Group Mod and the Error.
In the ovs-ofctl command line the ADD_OR_MOD command is made available
through the new option --may-create in the mod-group command:
Zong Kai LI [Mon, 27 Jun 2016 06:54:52 +0000 (14:54 +0800)]
ovn: Add 'na' action and lflow for ND
This patch tries to support ND versus ARP for OVN.
It adds a new OVN action 'na' on the ovn-controller side, and modifies lflows
for the 'na' action and relevant packets in ovn-northd.
First, ovn-northd will generate lflows for each lport with its
IPv6 addresses and MAC address, with the 'na' action, such as:
match=(icmp6 && icmp6.type == 135 &&
(nd.target == fd81:ce49:a948:0:f816:3eff:fe46:8a42 ||
nd.target == fd81:ce49:b123:0:f816:3eff:fe46:8a42)),
action=(na { eth.src = fa:16:3e:46:8a:42; nd.tll = fa:16:3e:46:8a:42;
outport = inport;
inport = ""; /* Allow sending out inport. */ output; };)
and new lflows will be set in table ls_in_arp_nd_rsp, which is renamed
from the previous ls_in_arp_rsp.
Later, when ovn-controller receives an ND packet, it frames a
template NA packet for the reply. The NA packet is initialized based on
the ND packet; the NA packet will use:
- ND packet eth.src as eth.dst,
- ND packet eth.dst as eth.src,
- ND packet ip6.src as ip6.dst,
- ND packet nd.target as ip6.src,
- ND packet eth.dst as nd.tll.
Finally, nested actions in the 'na' action will update the necessary fields
of the NA packet, such as:
- eth.src, nd.tll
- inport, outport
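The field mapping listed above can be sketched with plain dictionaries. This is only an illustration of the mapping, not the ovn-controller implementation: the function names are hypothetical, packets are modelled as dictionaries keyed by OVN field names, and the inport/outport handling is left out.

```python
def build_na_template(nd):
    """Initialize the NA reply from the received ND packet."""
    return {
        "eth.dst": nd["eth.src"],
        "eth.src": nd["eth.dst"],
        "ip6.dst": nd["ip6.src"],
        "ip6.src": nd["nd.target"],
        "nd.tll": nd["eth.dst"],
    }


def apply_nested_actions(na, lport_mac):
    """Apply the nested 'eth.src = ...; nd.tll = ...;' updates."""
    updated = dict(na)
    updated["eth.src"] = lport_mac
    updated["nd.tll"] = lport_mac
    return updated
```

After the nested actions run, the reply carries the logical port's MAC address in both eth.src and nd.tll, replacing the values copied from the ND packet's eth.dst.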
Since patch port for IPv6 router interface is not ready yet, this
patch will only try to deal with ND from VM. This patch will set
RSO flags to 011 for NA packets.
This patch also modified current ACL lflows for ND, not to do conntrack
on ND and NA packets in following tables:
- S_SWITCH_IN_PRE_ACL
- S_SWITCH_OUT_PRE_ACL
- S_SWITCH_IN_ACL
- S_SWITCH_OUT_ACL
Signed-off-by: Zong Kai LI <zealokii@gmail.com>
[blp@ovn.org made several minor simplifications and improvements]
Signed-off-by: Ben Pfaff <blp@ovn.org>
Some options (such as -c X), when passed to tcpdump will cause it to
halt. When this occurs, ovs-tcpdump will not recognize that such
an event has happened, and will spew newlines across the screen
running forever. To fix this, ovs-tcpdump can poll and then raise a
KeyboardInterrupt event.
Now, when the underlying dump-cmd (such as tcpdump, tshark, etc.)
actually signals exit, ovs-tcpdump follows the SIGINT path, telling the
database to clean up. Exit is signalled by either returning, 'killing',
or closing the output descriptor.
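The "poll and then raise KeyboardInterrupt" idea can be sketched with the standard subprocess module. This is a minimal sketch under assumptions, not the ovs-tcpdump source: the function name, the `cleanup` callback (standing in for the database clean-up) and the polling interval are hypothetical.

```python
import subprocess
import sys
import time


def run_dump_command(argv, cleanup, poll_interval=0.05):
    """Run the dump command and follow the SIGINT path when it exits."""
    process = subprocess.Popen(argv)
    try:
        while True:
            if process.poll() is not None:
                # The child has exited on its own: take the same path
                # as a Ctrl-C from the user.
                raise KeyboardInterrupt
            time.sleep(poll_interval)
    except KeyboardInterrupt:
        if process.poll() is None:
            process.terminate()
        process.wait()
        cleanup()
    return process.returncode


if __name__ == "__main__":
    # A child that exits immediately stands in for 'tcpdump -c X'.
    run_dump_command([sys.executable, "-c", "pass"],
                     lambda: print("cleaning up"))
```

Because the exit of the child and a real Ctrl-C end up in the same `except` branch, the clean-up code only has to exist once.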
Signed-off-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
The original implementation of ovs-tcpdump conflated interfaces and
ports needlessly. This commit changes ovs-tcpdump to only consider the
port name when looking up the corresponding bridge.
Windows: Add conntrack dump and flush support in userspace
Modify dpif-netlink.c and netlink-conntrack.c to send down dump and flush command
to Windows datapath. Include netlink-conntrack.c and netlink-conntrack.h
in automake.mk for Windows binaries.
Windows currently supports only NETLINK_GENERIC port. In order to support
the NETLINK_NETFILTER messages, the port id is being overwritten to
NETLINK_GENERIC on Windows and datapath has been updated to support the
new message format.
datapath-windows: Add support for dump-conntrack in datapath
Create the methods used for dumping conntrack entries from the hyper-v
datapath to userspace by means of netfilter netlink messages. Some of the
attributes are not supported by the datapath and have been defaulted to 0.
datapath-windows: Add support for Conntrack IPCTNL_MSG_CT_DELETE cmd in Datapath.c
Create new NETLINK_CMD and NETLINK_FAMILY to assist in flushing conntrack
entries. Modify Datapath.c to now support netfilter-netlink messages apart
from the existing netfilter-generic messages. Also hook up the command
handler to execute the OvsCtFlush in Conntrack.c.
datapath-windows: Add support for flushing conntrack entries
Flush out all conntrack entries or those that match a given zone. Since
the conntrack module is internal to OVS in Windows, this functionality
needs to be added in.
datapath-windows: Add support for Netfilter netlink message
Introduce NF_GEN_MSG_HDR similar to GENL_MSG_HDR that will be used for
communicating via netfilter-netlink channel. This will be used by
userspace to retrieve and modify Conntrack data in Windows.
Windows: Add conntrack netfilter netlink definitions to kernel and userspace
Include netfilter-conntrack header definitions. This will be used by
Windows userspace for adding debugging support in Conntrack. A few of these
files are intentionally left blank to avoid removing #includes in
userspace. A new file, OvsDpInterfaceCtExt.h, has been defined similar to
OvsDpInterfaceExt.h to be reused by userspace and kernel.
Nirapada Ghosh [Thu, 16 Jun 2016 00:05:25 +0000 (17:05 -0700)]
ovn-controller: reload configured SB probe timer
The probe timer between ovn-controller and OVN Southbound
can be configured using the ovs-vsctl command, but the change does
not take effect on the fly. In other words, ovn-controller has
to be restarted to use the new probe_timer value. This patch
takes care of that.
Signed-off-by: Nirapada Ghosh <nghosh@us.ibm.com>
[blp@ovn.org made various adjustments]
Signed-off-by: Ben Pfaff <blp@ovn.org>
David Marchand [Thu, 9 Jun 2016 09:52:49 +0000 (11:52 +0200)]
lib: Use a more accurate value for CPU count (sched_getaffinity).
Relying on /proc/cpuinfo to count the number of available cores is not
the best option:
- The code is x86-specific.
- If the process is started with a different CPU affinity, then it will
wrongly try to start too many threads (for example, imagine an OVS
daemon restricted to 4 CPU threads on a 128-thread system).
This commit removes /proc/cpuinfo parsing. For Linux systems, it
introduces instead a call to sched_getaffinity(), which is
architecture-independent, in order to retrieve the list of CPU threads
available to the current process and to count them. Other UNIX-like
systems only use _SC_NPROCESSORS_ONLN.
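The same distinction exists in Python's standard library, which makes it easy to see the effect without building OVS. This sketch is an analogue, not the C code from the commit: `os.sched_getaffinity()` is only available on some platforms (Linux among them), and the fallback to `os.cpu_count()` and the function name are assumptions made for illustration.

```python
import os


def count_available_cpus():
    """Count the CPU threads the current process is allowed to run on."""
    try:
        # Honours the affinity mask, e.g. one set by taskset or cgroups.
        return len(os.sched_getaffinity(0))
    except AttributeError:
        # Platforms without sched_getaffinity(): count online processors.
        return os.cpu_count() or 1
```

Started under a restricted affinity mask, the first branch reports only the permitted CPU threads rather than every thread in the system.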
Signed-off-by: David Marchand <david.marchand@6wind.com>
Co-authored-by: Liu Xiaofeng <xiaofeng.liu@6wind.com>
Signed-off-by: Liu Xiaofeng <xiaofeng.liu@6wind.com>
Co-authored-by: Quentin Monnet <quentin.monnet@6wind.com>
Signed-off-by: Quentin Monnet <quentin.monnet@6wind.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Sairam Venugopal [Sat, 25 Jun 2016 01:14:34 +0000 (18:14 -0700)]
datapath-windows: Cleanup conntrack-tcp.c
Update the code to use tcp->flags. This keeps the kernel conntrack-tcp.c file in sync with userspace version.
This patch also addresses a warning - 'Comparison of a boolean expression with an integer other than 0 or 1' - (tcp_flags & (TCP_ACK|TCP_RST)) == (TCP_ACK|TCP_RST))
Russell Bryant [Mon, 28 Mar 2016 23:05:40 +0000 (19:05 -0400)]
ovn: Add software l2 gateway.
This patch implements one approach to using ovn-controller to implement
a software l2 gateway between logical and physical networks.
A new logical port type called "l2gateway" is introduced here. It is very
close to how localnet ports work, with the following exceptions:
- A localnet port makes OVN use the physical network as the
transport between hypervisors instead of tunnels. An l2gateway port still
uses tunnels between all hypervisors, and packets only go to/from the
specified physical network as needed via the chassis the l2gateway port
is bound to.
- An l2gateway port also gets bound to a chassis while a localnet port does
not. This binding is not done by ovn-controller. It is left as an
administrative function. In the case of OpenStack, the Neutron plugin
will do this.
Signed-off-by: Russell Bryant <russell@ovn.org>
Acked-by: Ryan Moats <rmoats@us.ibm.com>
Acked-by: Ben Pfaff <blp@ovn.org>
Acked-by: Justin Pettit <jpettit@ovn.org>
openvswitch: Only set mark and labels with a commit flag.
Only set conntrack mark or labels when the commit flag is specified.
This makes sure we can not set them before the connection has been
persisted, as in that case the mark and labels would be lost in the
event of a userspace upcall.
OVS userspace already requires the commit flag to accept setting
ct_mark and/or ct_labels. Validate for this in the kernel API.
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Joe Stringer <joe@ovn.org>
openvswitch: Set mark and labels before confirming.
Set conntrack mark and labels right before committing so that
the initial conntrack NEW event has the mark and labels.
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Joe Stringer <joe@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Joe Stringer <joe@ovn.org>
Russell Bryant [Fri, 24 Jun 2016 19:44:17 +0000 (15:44 -0400)]
ovn: Test that port state goes up and down.
Some previous commits broke ovn-controller binding handling such that
ovn-controller never cleared out the chassis column of the Port_Binding
table. This broke OpenStack CI for OVN. This patch adds an OVN test
case that would have caught this issue.
Signed-off-by: Russell Bryant <russell@ovn.org>
Acked-by: Ryan Moats <rmoats@us.ibm.com>
Update the "ct_commit;" logical flow action to optionally take
one or two parameters, setting the value of "ct_mark" or "ct_label".
Supported ct_commit syntax now includes:
Jesse Gross [Wed, 29 Jun 2016 01:14:53 +0000 (18:14 -0700)]
bfd: Allow setting OAM bit when encapsulated in tunnel.
Some tunnel protocols, such as Geneve, have a bit in the tunnel
header to indicate that it is an OAM packet. This means that the
packet should be processed as a tunnel control frame and not be
passed onto connected links.
When BFD is used inside of a tunnel it is often used in this control
capacity, so this adds an option to enable marking the outer header
when the output port is a tunnel that supports the OAM concept. It is
also possible to use tunnels as point-to-point links that are simply
carrying BFD as payload, so this is not always turned on.
Conceptually, this may also apply to other types of packets locally
generated by the switch, most obviously CFM. However, BFD seems to
be most commonly used for this type of tunnel monitoring application
so this only adds the option to BFD for the time being to avoid
unnecessarily adding configuration knobs that might never get used.
Signed-off-by: Jesse Gross <jesse@kernel.org>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Joe Stringer [Tue, 28 Jun 2016 09:17:56 +0000 (11:17 +0200)]
datapath: Drop debug code in handle_fragments().
This piece of debug code was introduced during the backport of
conntrack, but is unnecessary and not upstream. Drop it to bring the
code more in line with upstream.
Paul Boca [Sun, 26 Jun 2016 12:12:23 +0000 (12:12 +0000)]
tests: Fixed PMD tests on Windows
CHECK_CPU_DISCOVERED now checks the log file, not stderr.
On Windows the ovs-vswitchd output is logged only to the log file, not to stderr.
Tested on both Windows and Linux.
Ansis Atteka [Mon, 20 Jun 2016 21:19:40 +0000 (14:19 -0700)]
bridge: allow OVS to interact with controller through sockets outside run dir
Currently Open vSwitch is unable to create or connect to Unix Domain
Sockets outside the designated 'run' directory, because of fear of potential
remote exploits where a hacked remote OVSDB manager would tell Open vSwitch
to connect to a unix domain socket owned by another daemon on the same
hypervisor.
This patch allows disabling this behavior by changing the
/etc/default/openvswitch (Ubuntu) or /etc/sysconfig/openvswitch (RHEL)
file to:
...
OVS_CTL_OPTS=--no-self-confinement
...
Note that it is better to stick with the default behavior, unless:
1. You have Open vSwitch running under SELinux or AppArmor
that would prevent OVS from messing with sockets owned by other
daemons; OR
2. You are sure that relying on the OpenFlow handshake is enough to
prevent OVS from adversely interacting with those other daemons
running on the same hypervisor; OR
3. You are not much worried about remote exploits in the first
place, because perhaps the OVSDB manager is running on the same host
as OVS.
The initial use-case for this patch is to allow connecting to an OpenFlow
controller that has its socket outside the OVS run directory. However,
in the future it could be generalized to allow disabling self-confinement
for other things like DPDK vhost-user sockets or anything else
that is specifiable in OVSDB with a full path.
Paul Boca [Fri, 24 Jun 2016 16:51:49 +0000 (16:51 +0000)]
tests: Fixed ovsdb-monitor tests.
Redirect ovsdb-client stderr to /dev/null.
This fixes the series of tests that use OVSDB_CHECK_MONITOR macro.
The theory behind the fix was explained by Ben Pfaff as follows:
"I suspect I understand what's happening here.
To execute the following command, Autotest internally redirects stdout
and stderr to files named "stdout" and "stderr":
> ./ovsdb-monitor.at:47: ovsdb-client -vjsonrpc \
--pidfile="`pwd`"/client-pid -d json monitor --format=csv \
unix:socket ordinals ordinals > output &
> stderr:
> stdout:
Ordinarily, after the command exits it would close the file, but & means
that it holds the file open. While the next few ovsdb-client commands
run, it queues up some output in stdio buffers but doesn't bother to
actually flush it[*].
[*] There's either a hole in my theory here or Windows is not fully
ANSI C conformant since ANSI C says that "As initially opened,
the standard error stream is not fully buffered; ..." which
means that it'd probably be line buffered, so that each line of
the log is flushed separately.
On Unix-like OSes, the following Autotest commands don't really care
about this open file, since the OS will happily delete and replace the
"stderr" file and allow the previous file with that name to remain open.
On Windows, the OS won't permit that, so I guess the shell is actually
just opening the existing file.
Later, "ovs-appctl --target=`pwd`/unixctl exit" causes ovsdb-server to
exit. It flushes its accumulated stderr buffer to the OS, and therefore
it shows up in the "stderr" output as part of ovs-appctl's output since
ovs-appctl and ovsdb-server both had their output sent to the same file.
Probably, adding 2>/dev/null to the ovsdb-server command would solve the
problem. To get better output for debugging failures, also add
--log-file and AT_CAPTURE_FILE([ovsdb-server.log])."
Joe Stringer [Thu, 23 Jun 2016 01:00:44 +0000 (18:00 -0700)]
system-traffic: Remove basic connectivity tests.
For many of the tests, we would first execute a "basic connectivity
check" to validate the sanity of the setup before running the test
traffic which probes the actual OVS behaviour. However, by running
traffic through the rules prior to running the test, it is more likely
that the traffic hits datapath flows and doesn't test the "execute" path
(from userspace to kernel). This can hide some classes of bugs.
The first few tests in system-traffic already check the basic sanity of
the environment, so these redundant pieces are unnecessary. Remove them.
Signed-off-by: Joe Stringer <joe@ovn.org>
Acked-by: Jesse Gross <jesse@kernel.org>
Joe Stringer [Thu, 23 Jun 2016 01:00:43 +0000 (18:00 -0700)]
compat: Backport ip_do_fragment().
Prior to upstream Linux commit d6b915e29f4a ("ip_fragment: don't forward
defragmented DF packet"), the fragmentation behaviour was incorrect when
dealing with linear skbs, as it would not respect the "max_frag_size"
that ip_defrag() provides, but instead attempt to use the output
device's MTU.
If OVS reassembles an IP message and passes it up to userspace, it
also provides a PACKET_ATTR_MRU to indicate the maximum received unit
size for this message. When userspace executes actions to output this
packet, it passes the MRU back down and this is the desired refragment
size. When the packet data is placed back into the skb in the execute
path, a frags list is not created so fragmentation code will treat it
as one big linear skb. Due to the above bug it would use the device's
MTU to refragment instead of the provided MRU. In the case of regular
ports, this is not too dangerous as the MTU would be a reasonable value.
However, in the case of a tunnel port the typical MTU is a very large
value. As such, rather than refragmenting the message on output, it
would simply output the (too-large) frame to the tunnel.
Depending on the tunnel type and other factors, this large frame could
be dropped along the path, or it could end up at the remote tunnel
endpoint and end up being delivered towards a remote host stack or VM.
If OVS is also controlling that endpoint, it will likely drop the packet
when sending to the final destination, because the packet exceeds the
port MTU.
Different OpenFlow rule configurations could end up preventing IP
messages from being refragmented correctly for as many as the first four
attempts in each connection.
Fix this issue by backporting ip_do_fragment() so that it will respect
the MRU value that is provided in the execute path.
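A small calculation shows why the choice between MTU and MRU matters. The sketch below is not the kernel's ip_do_fragment(): it only computes IPv4 fragment payload sizes for a given maximum fragment size, assuming a 20-byte IPv4 header without options. The function name and the example sizes (a 3000-byte payload, an MRU of 1500 and a tunnel MTU of 65000) are hypothetical.

```python
IPV4_HEADER_LEN = 20  # assumes no IPv4 options


def fragment_payload_sizes(payload_len, max_frag_size):
    """Return the payload size of each fragment for the given limit."""
    # Every fragment except the last carries a multiple of 8 payload bytes.
    per_fragment = (max_frag_size - IPV4_HEADER_LEN) // 8 * 8
    if per_fragment <= 0:
        raise ValueError("max_frag_size too small")
    if payload_len + IPV4_HEADER_LEN <= max_frag_size:
        return [payload_len]  # fits as is, no fragmentation
    sizes = []
    remaining = payload_len
    while remaining > per_fragment:
        sizes.append(per_fragment)
        remaining -= per_fragment
    sizes.append(remaining)
    return sizes
```

With the MRU as the limit the reassembled message is split again into fragments that fit the original path, whereas with a very large tunnel MTU it is emitted as a single oversized frame.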
VMWare-BZ: #1651589
Fixes: 213e1f54b4b3 ("compat: Wrap IPv4 fragmentation.")
Reported-by: Salman Malik <salmanm@vmware.com>
Signed-off-by: Joe Stringer <joe@ovn.org>
Acked-by: Jesse Gross <jesse@kernel.org>
Upstream commit:
openvswitch: Pass net into ovs_fragment
In preparation for the ipv4 and ipv6 fragmentation code taking a net
parameter pass a struct net into ovs_fragment where the v4 and v6
fragmentation code is called.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Upstream: c559cd3ad32b ("openvswitch: Pass net into ovs_fragment")
Signed-off-by: Joe Stringer <joe@ovn.org>
Acked-by: Jesse Gross <jesse@kernel.org>
Ben Pfaff [Sun, 26 Jun 2016 21:54:04 +0000 (14:54 -0700)]
ofp-util: Zero out padding bytes in ofputil_ipfix_stats_to_reply().
Otherwise IPFIX statistics leak memory from ovs-vswitchd.
Reported-by: William Tu <u9012063@gmail.com>
Reported-at: http://openvswitch.org/pipermail/dev/2016-June/073769.html
Acked-by: William Tu <u9012063@gmail.com>
Tested-by: Daniel Ye <daniely@vmware.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Wenyu Zhang [Sat, 25 Jun 2016 00:10:07 +0000 (17:10 -0700)]
ipfix: Export user specified virtual observation ID
In a virtual network, users want more info about the virtual point at which the traffic is observed.
It should be a string that provides clear info, not a simple integer ID.
Introduce "other-config: virtual_obs_id" in IPFIX, which is a string configured by the user.
Introduce an enterprise IPFIX entity "virtualObsID"(898) to export the value. The entity is a
variable-length string.
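For readers unfamiliar with how a variable-length string appears in an IPFIX data record, the sketch below follows the variable-length encoding defined by the IPFIX protocol specification (RFC 7011): a one-byte length for values shorter than 255 bytes, otherwise the byte 255 followed by a two-byte length. It is an illustration only, not the OVS exporter code, and the function name is hypothetical.

```python
import struct


def encode_ipfix_varlen(value):
    """Encode a string as an IPFIX variable-length field."""
    data = value.encode("utf-8")
    if len(data) < 255:
        return bytes([len(data)]) + data
    if len(data) > 0xFFFF:
        raise ValueError("value too long for a variable-length field")
    # Long form: marker byte 255, then the length in network byte order.
    return b"\xff" + struct.pack("!H", len(data)) + data
```

A short identifier therefore costs one extra byte on the wire in addition to the string itself.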
Signed-off-by: Wenyu Zhang <wenyuz@vmware.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Mario Cabrera [Sat, 25 Jun 2016 00:13:06 +0000 (17:13 -0700)]
ovsdb: Introduce OVSDB replication feature
Replication is enabled by using the following option when starting the
database server:
--sync-from=server
Where 'server' can take any form described in the ovsdb-client(1)
manpage as an active connection. If this option is specified, the
replication process is immediately started.
Signed-off-by: Mario Cabrera <mario.cabrera@hpe.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Mario Cabrera [Tue, 29 Mar 2016 15:28:05 +0000 (09:28 -0600)]
docs: OVSDB replication design document
The database replication functionality is designed to provide "fail
over" characteristics. There are two participating databases, one of
which is the "active" database and the other is the "stand by" database.
Replication happens exclusively from the active to the stand by
database.
This document explains how the replication functionality is implemented.
Signed-off-by: Mario Cabrera <mario.cabrera@hpe.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
The commit f2a4d086ed4c ("openvswitch: Add packet truncation support.")
introduces packet truncation before sending to userspace upcall receiver.
This patch passes up the skb->len before truncation so that the upcall
receiver knows the original packet size. Potentially this will be used
by sFlow, where OVS translates the sFlow config header=N to a sample action
that truncates the packet to N bytes in the kernel datapath. Thus, only N
bytes instead of the full packet are copied from kernel to userspace, saving
kernel-to-userspace bandwidth.
Signed-off-by: William Tu <u9012063@gmail.com> Cc: Pravin Shelar <pshelar@nicira.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Tested-at: https://travis-ci.org/williamtu/ovs-travis/builds/140135299 Signed-off-by: William Tu <u9012063@gmail.com> Acked-by: Pravin B Shelar <pshelar@ovn.org>
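The idea of recording the original length before truncating can be sketched in plain C. The structure and function below are hypothetical and do not mirror the kernel datapath or its netlink attributes; they only show that the pre-truncation length is saved alongside the shortened copy.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical upcall record; the field names are illustrative only. */
struct example_upcall {
    size_t orig_len;             /* Packet length before truncation. */
    size_t copied_len;           /* Bytes actually copied into 'data'. */
    unsigned char data[128];
};

/* Copies at most 'max_len' bytes of 'pkt' into 'up', remembering the
 * original length so that the receiver can still report it. */
static void
example_upcall_fill(struct example_upcall *up, const unsigned char *pkt,
                    size_t pkt_len, size_t max_len)
{
    size_t n;

    assert(up != NULL && pkt != NULL);
    n = pkt_len < max_len ? pkt_len : max_len;
    if (n > sizeof up->data) {
        n = sizeof up->data;     /* Never overflow the example buffer. */
    }
    up->orig_len = pkt_len;
    up->copied_len = n;
    memcpy(up->data, pkt, n);
}
```

A consumer such as a sampler can then report `orig_len` as the packet size while only paying for `copied_len` bytes of copying.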
Ilya Maximets [Fri, 24 Jun 2016 13:28:32 +0000 (16:28 +0300)]
netdev-dpdk: Fix using uninitialized link_status.
'rte_eth_link_get_nowait()' works only with physical ports.
For a vhost-user port, 'link' stays uninitialized, so random link status
messages appear in the log.
Ex.:
|dpdk(dpdk_watchdog2)|DBG|Port -1 Link Up - speed 10000 Mbps - full-duplex
Fix that by calling 'check_link_status()' only for physical ports.
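A minimal sketch of the guard described above, using hypothetical types and function names rather than the netdev-dpdk ones: the link query runs only for physical ports, so a vhost-user port never has link state read from a structure that nothing filled in.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum example_port_type { EXAMPLE_PORT_PHYSICAL, EXAMPLE_PORT_VHOST_USER };

struct example_link {
    bool up;
    unsigned int speed_mbps;
};

struct example_port {
    enum example_port_type type;
    struct example_link link;
    int n_link_checks;           /* Counts how often the link was queried. */
};

/* Stand-in for a driver query that is only meaningful for physical ports. */
static void
example_query_link(struct example_link *link)
{
    link->up = true;
    link->speed_mbps = 10000;
}

/* Periodic check: skips ports for which no link query is available. */
static void
example_watchdog_poll(struct example_port *port)
{
    assert(port != NULL);
    if (port->type != EXAMPLE_PORT_PHYSICAL) {
        return;
    }
    example_query_link(&port->link);
    port->n_link_checks++;
}
```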
Lutz, Arnoldo [Mon, 13 Jun 2016 16:06:48 +0000 (16:06 +0000)]
ovsdb-idl: Fix issues detected in Partial Map Update feature
We found some issues affecting the Partial Map Update feature included in
the master branch. This patch fixes a memory leak caused by not freeing the
datum allocated in the process of requesting a change to a map. It also fixes
an assertion failure that occurs, when the NDEBUG flag is not set, while
preparing the map to be changed.
Fix a memory leak caused by not freeing datums.
Change the use of the ovsdb_idl_read function when preparing changes to maps.
Signed-off-by: arnoldo.lutz.guevara@hpe.com <arnoldo.lutz.guevara@hpe.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
netdev-dummy: Allow configuring the numa_id for testing purposes.
This commit introduces an (undocumented) option for dummy Interfaces to
specify a dummy numa_id, to which the device belongs. It will be used
to test the pmd threads in dpif-netdev.
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Acked-by: Ben Pfaff <blp@ovn.org>
Ryan Moats [Fri, 24 Jun 2016 20:39:28 +0000 (15:39 -0500)]
ovn-controller: Fix port binding update on OVS port delete events.
Patch "Convert binding_run to incremental processing." introduced
a bug where the port binding table is not correctly updated when
an OVS port is deleted. Fix this by:
- persisting the lport shash used to record OVS ports
- changing get_local_iface_ids to return a bool indicating whether
  the persisted lport shash has changed
- changing port binding table processing from incremental to full
  if the persisted lport shash has changed
Signed-off-by: Ryan Moats <rmoats@us.ibm.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
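The three steps above can be sketched roughly as follows. Everything here is hypothetical: a fixed-size array stands in for the lport shash, and `example_update_iface_ids()` stands in for get_local_iface_ids. The sketch assumes the input list contains no duplicate IDs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define EXAMPLE_MAX_IFACES 8
#define EXAMPLE_ID_LEN 32

/* Hypothetical persisted set of interface IDs (stand-in for the shash). */
struct example_iface_set {
    char ids[EXAMPLE_MAX_IFACES][EXAMPLE_ID_LEN];
    size_t n;
};

enum example_mode { EXAMPLE_INCREMENTAL, EXAMPLE_FULL };

static bool
example_set_contains(const struct example_iface_set *set, const char *id)
{
    for (size_t i = 0; i < set->n; i++) {
        if (!strcmp(set->ids[i], id)) {
            return true;
        }
    }
    return false;
}

/* Replaces 'persisted' with 'ids' and returns true if membership changed.
 * Assumes 'ids' holds no duplicates, so equal size plus full containment
 * means the two sets are identical. */
static bool
example_update_iface_ids(struct example_iface_set *persisted,
                         const char *const ids[], size_t n)
{
    bool changed;

    assert(persisted != NULL && n <= EXAMPLE_MAX_IFACES);
    changed = n != persisted->n;
    for (size_t i = 0; i < n && !changed; i++) {
        changed = !example_set_contains(persisted, ids[i]);
    }
    if (changed) {
        persisted->n = n;
        for (size_t i = 0; i < n; i++) {
            snprintf(persisted->ids[i], EXAMPLE_ID_LEN, "%s", ids[i]);
        }
    }
    return changed;
}

/* Falls back to full processing whenever the persisted set changed. */
static enum example_mode
example_choose_mode(bool ifaces_changed)
{
    return ifaces_changed ? EXAMPLE_FULL : EXAMPLE_INCREMENTAL;
}
```

A deleted port shows up as a size mismatch between the new list and the persisted set, which is what triggers the switch to full processing in this sketch.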
The numa library is needed for NUMA-aware vHost User functionality.
If the numa package is missing, the OVS DPDK configuration fails with
"error: Could not find DPDK libraries in <DPDK_LOC>/TARGET/lib" even though
the DPDK library is installed.
This patch fixes this misleading error by checking for the presence of the
numa library and printing an appropriate error message, "error: unable to
find libnuma, install the dependency package", if the package is missing.
Ben Pfaff [Fri, 24 Jun 2016 20:35:23 +0000 (13:35 -0700)]
Revert "ipfix: Export user specified virtual observation ID".
This reverts commit 337bebe91c94d9d201e28811c469869d32e978ff, which caused a
crash in test 1048 "ofproto-dpif - Flow IPFIX sanity check" (now test 1051)
with the following backtrace:
#0 hmap_first_with_hash (hmap=<optimized out>, hmap=<optimized out>,
hash=<optimized out>) at ../lib/hmap.h:328
#1 smap_find__ (smap=0x94, key=key@entry=0x817f7ab "virtual_obs_id",
key_len=14, hash=2537071222) at ../lib/smap.c:366
#2 0x0812b9d7 in smap_get_node (smap=0x9738a276,
key=0x817f7ab "virtual_obs_id") at ../lib/smap.c:198
#3 0x0812ba30 in smap_get (smap=0x94, key=0x817f7ab "virtual_obs_id")
at ../lib/smap.c:189
#4 0x08055a60 in bridge_configure_ipfix (br=<optimized out>)
at ../vswitchd/bridge.c:1237
#5 bridge_reconfigure (ovs_cfg=0x94) at ../vswitchd/bridge.c:666
#6 0x080568d3 in bridge_run () at ../vswitchd/bridge.c:2972
#7 0x0804c9dd in main (argc=10, argv=0xffd8b934)
at ../vswitchd/ovs-vswitchd.c:112