git.proxmox.com Git - ovs.git/log

ovn-controller-vtep.at: Fix intermittent test failure.

When testing the recreation of 'chassis' table entry by 'ovn-controller-
vtep'.  The removal of 'chassis' table entry by the 'ovn-sbctl' could
cause 'Broken pipe' warning in ovsdb-server.log.  This is due to the
race between 'ovn-sbctl' exiting and 'ovn-controller-vtep' adding
the chassis back.  So, if the 'ovn-sbctl' exits right when the
ovsdb-server tries to send update of readd of the deleted 'chassis',
the sending will fail with 'Broken pipe' error.

Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Russell Bryant <rbryant@redhat.com>

ovn-sbctl: Make 'chassis-del' delete all encaps.

Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Russell Bryant <rbryant@redhat.com>

ovn-controller-vtep.at: Fix intermittent test failure.

The test waits until grep no vlan '200' from the VTEP 'vlan_binding'
column. However, string '200' could also appear in other 'vlan_binding'
entry's uuid value. Instead, we should grep for '200='.

Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Russell Bryant <rbryant@redhat.com>

rhel: add installed but not packaged OVN tools

This patch adds the following to OVN %files:
   /usr/bin/ovn-controller-vtep
   /usr/bin/ovn-sbctl
   /usr/share/man/man8/ovn-controller-vtep.8.gz
   /usr/share/man/man8/ovn-sbctl.8.gz

Signed-off-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: Justin Pettit <jpettit@nicira.com>
Acked-by: Russell Bryant <rbryant@redhat.com>

flow: Ignore invalid ICMPv6 fields when parsing packets

There is a miss-match between the handling of invalid ICMPv6 fields in the
implementations of parse_icmpv6() in user-space and in the kernel datapath.

This patch addresses that by modifying the user-space implementation to
match that of the kernel datapath; processing is terminated without
rather than with an error and partial information is cleared.

With these changes the user-space implementation of parse_icmpv6()
never returns an error. Accordingly the return type and caller have been
updated.

The original motivation for this is to allow matching the ICMPv6 type and
code of packets with invalid neighbour discovery options although only the
change around the '(!opt_len || opt_len > *sizep)' conditional is necessary
to achieve that goal.

Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>

ofproto: Allow in-place modifications of datapath flows.

There are certain use cases (such as bond rebalancing) where a
datapath flow's actions may change, while it's wildcard pattern
remains the same.  Before this patch, revalidators would note the
change, delete the flow, and wait for the handlers to install an
updated version.  This is inefficient, as many packets could get
punted to userspace before the new flow is finally installed.

To improve the situation, this patch implements in place modification
of datapath flows.  If the revalidators detect the only change to a
given ukey is its actions, instead of deleting it, it does a put with
the MODIFY flag set.

Signed-off-by: Ethan J. Jackson <ethan@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>

ofproto-dpif-upcall: Make ukey actions modifiable with RCU.

Future patches will need to modify ukey actions in some instances.
This patch makes this possible by protecting them with RCU. It also
adds thread safety checks to enforce the new protection mechanism.

Signed-off-by: Ethan J. Jackson <ethan@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>

db-ctl-base: Allow print rows that weak reference to table in
'cmd_show_table'.

Sometimes, it is desirable to print the table with weak reference to
the table specified in 'struct cmd_show_table'. For example the
Port_Binding table rows in OVN_Southbound database that refer to the
same Chassis table row can be printed under the same chassis entry
in 'ovn-sbctl show' output.

To achieve it, this commit adds a new struct in 'struct cmd_show_table'
that allows users to print a table with weak reference to 'table'
specified in 'struct cmd_show_table'. The 'ovn-sbctl' which now prints
the Port_Binding entries with Chassis table, is the first user of this
new feature.

Requested-by: Justin Pettit <jpettit@nicira.com>
Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Justin Pettit <jpettit@nicira.com>

ovn-sbctl: Print stage name in addition to table number.

Signed-off-by: Justin Pettit <jpettit@nicira.com>
Acked-by: Alex Wang <alexw@nicira.com>

ovn-northd: Store name of the logical flow stage in external-ids.

This will be useful in a future commit.

It also introduces #define's for logical stages instead of in-place
constants.

Signed-off-by: Justin Pettit <jpettit@nicira.com>
Acked-by: Alex Wang <alexw@nicira.com>

classifier: Do not use mf_value.

mf_value has grown bigger than needed for storing the biggest
supported prefix (IPv6 address length). Define a new type to be used
instead of mf_value.

This makes classifier lookups a bit faster.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Joe Stringer <joestringer@nicira.com>

ovn: Add lflow-list to ovn-sbctl.

I frequently view the contents of the Logical_Flow table while working
on OVN. Add a command that can output the contents of this table in a
sorted way that makes it easier to read through. It's sorted by
logical datapath, pipeline, table id, priority, and match.

Signed-off-by: Russell Bryant <rbryant@redhat.com>
Acked-by: Alex Wang <alexw@nicira.com>

classifier: Simplify minimask_hash().

minimask_hash() can be simplified as each value is known to be non-zero.

Move miniflow_hash() into test-classifier.c as miniflow_hash__() as it
is no longer needed elsewhere.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Joe Stringer <joestringer@nicira.com>

classifier: Remove unused hash functions.

Remove unused cls_rule_hash() and minimatch_hash() functions.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Joe Stringer <joestringer@nicira.com>

flow: Avoid compile errors.

GCC (4.7) sees too wide shifts when there are none, refactor to
circumvent the false error.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Joe Stringer <joestringer@nicira.com>

classifier: Fix comment.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Joe Stringer <joestringer@nicira.com>

system-kmod-macros: Fix VSWITCHD_STOP.

This was renamed. Surprisingly, the tests still pass without this,
however the extra checks that this command performs were not executed.
Fix the macro definition.

Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>

ofproto-dpif-upcall: Add VLOG_WARN_RL logs for upcall_cb() error.

Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>

ovn-controller-vtep: Add gateway module.

This commit adds the gateway module to ovn-controller-vtep. The
module will register the physical switches to ovnsb as chassis and
constantly update the "vtep_logical_switches" column in Chassis table.

Limitation (Recorded in TODO file):

- Do not support reading multiple tunnel ips of physical switch.

Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Russell Bryant <rbryant@redhat.com>

ovsdb-idl: Move get_initial_snapshot() to ovsdb-idl.

The same function is defined in both ovn-controller.c and
ovn-controller-vtep.c, so worth librarizing.

Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Russell Bryant <rbryant@redhat.com>

ovn: Add controller for VTEP gateway.

This commit lays down the foundation for a new controller in OVN, the
ovn-controller-vtep, for controlling the vtep enabled gateways.

Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Russell Bryant <rbryant@redhat.com>

ovn-architecture: Document the registers used for logical ports.

When reviewing the OpenFlow flows generated by ovn-controller, it's nice
to have this information.

Signed-off-by: Justin Pettit <jpettit@nicira.com>
Acked-by: Russell Bryant <rbryant@redhat.com>

idl-loop: Move idl-loop into ovsdb-idl library.

idl-loop is needed in implementing other controller (i.e., vtep controller).
So, this commit moves the logic into ovsdb-idl library module.

Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Russell Bryant <rbryant@redhat.com>

ovn-northd: Pass logical port type and options to ovn-sb database.

Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Russell Bryant <rbryant@redhat.com>

ovn-sbctl: Add ovn-sbctl.

This commit adds ovn-sbctl to ovn family by using the db-ctl-base
library.

Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Acked-by: Russell Bryant <rbryant@redhat.com>

ovn-nbctl: Move ovn-nbctl to utilities directory.

Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Russell Bryant <rbryant@redhat.com>

ovn-sb: Remove the "Gateway" table from the ovn-sb schema.

In a gateway like the VTEP L2 gateway, physical vlans belonging to
the same logical network form a "logical switch".  Each logical switch
has a dedicated tunnel key and will keep records of all MACs learned
from the owned vlans.  So user can just send packet to a "logical
switch" and the gateway will figure out the output port and vlan tag
automatically.

Therefore, it is really not necessary to keep record of the vlan map
for each gateway physical port in the OVN_Southbound database using
"gateway_ports" and to map each vlan to a unique ovn logical port.
Instead, we should simply map each logical switch to a ovn logical
port.

Thusly, this commit removes the "Gateway" table from the OVN_Southbound
database.  In the "Chassis" table, the "gateway_ports" column is replaced
by "vtep_logical_switches" column which stores all vtep logical switch
names.  The use of this column will be documented in later commit.

Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Russell Bryant <rbryant@redhat.com>

ovn-controller: Fix flows between two local ports.

A flow was missing from the remote output table that causes local
packets to be resubmitted to the local ouptut table.

Reported-by: Russell Bryant <rbryant@redhat.com>
Signed-off-by: Justin Pettit <jpettit@nicira.com>
Acked-by: Russell Bryant <rbryant@redhat.com>

Vagrantfile: Add test_ovs_system_userspace provision.

Add 'test_ovs_system_userspace' provision. Command:
# vagrant provision --provision-with=test_ovs_system_userspace

will run "make check-system-userspace" in the vagrant launched VM.

It may be more convenient to run this tests inside a vm rather than in
the host, because they interact with system networking.

Suggested-by: Joe Stringer <joestringer@nicira.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Joe Stringer <joestringer@nicira.com>

tests: Add system-userspace-testsuite.

The new system-userspace-testsuite, which can be launched via
`make check-system-userspace`, reuses the kmod tests on the userspace
datapath.

The userspace datapath is already tested by the main testsuite (and
that's not going to change), but having also the
system-userspace-testsuite has the following advantages:

* More complicated tests are possible: real client and server
  applications can be used.
* The same tests run on both kernel and userspace datapath: this gives
  us an easy way to make sure that the behaviour is consistent (e.g.
  with the upcoming connection tracker integration)

The userspace datapath is able to use system network interfaces via an
AF_PACKET socket.

Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Joe Stringer <joestringer@nicira.com>

tests: Introduce NS_EXEC and NS_CHECK_EXEC for system tests.

Instead of repeating every time "ip netns exec ..." it is better to
introduce some macros.

Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Joe Stringer <joestringer@nicira.com>

tests: Rename kmod-testsuite to system-kmod-testsuite.

The name makes more sense, especially with the addition of a userspace
system testsuite. No functional change in this commit.

Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Joe Stringer <joestringer@nicira.com>

kmod-macros: Don't unload kmod in VSWITCHD_STOP.

We already queue the removal of the kernel module in OVS_VSWITCHD_START,
via an ON_EXIT() call. That command is executed in both the success and
failure cases, so it is unnecessary to unload the kernel module in
OVS_VSWITCHD_STOP.

Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>

kmod-macros: Move some code to traffic-common-macros.

These macros will also be used by userspace datapath testing in
following commits. No functional change in this commit.

Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Joe Stringer <joestringer@nicira.com>

tests: Rename kmod-traffic.at to traffic.at.

The file will be part of two different testsuites: one for the kernel
datapath and another for the userspace datapath. No functional change
in this commit.

Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Joe Stringer <joestringer@nicira.com>

INSTALL.DPDK.md: Add details of XL710 restrictions for DPDK

Currently there are restrictions regarding the use of the XL710 network
interface with OVS and DPDK. This patch details those restrictions in
INSTALL.DPDK.md.

Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Signed-off-by: Ethan Jackson <ethan@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>

netdev-dpdk: Retry tx/rx queue setup until we don't get any failure.

It has been observed that some DPDK device (e.g intel xl710) report an
high number of queues but make some of them available only for special
functions (SRIOV). Therefore the queues will be counted in
rte_eth_dev_info_get(), but rte_eth_tx_queue_setup() will fail.

This commit works around the issue by retrying the device initialization
with a smaller number of queues, if a queue fails to setup.

Reported-by: Ian Stokes <ian.stokes@intel.com>
Tested-by: Ian Stokes <ian.stokes@intel.com>
Acked-by: Kevin Traynor <kevin.traynor@intel.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Signed-off-by: Ethan Jackson <ethan@nicira.com>

dpif-netdev: Translate Geneve options per-flow, not per-packet.

The kernel implementation of Geneve options stores the TLV option
data in the flow exactly as received, without any further parsing.
This is then translated to known options for the purposes of matching
on flow setup (which will then install a datapath flow in the form
the kernel is expecting).

The userspace implementation behaves a little bit differently - it
looks up known options as each packet is received. The reason for this
is there is a much tighter coupling between datapath and flow translation
and the representation is generally expected to be the same. This works
but it incurs work on a per-packet basis that could be done per-flow
instead.

This introduces a small translation step for Geneve packets between
datapath and flow lookup for the userspace datapath in order to
allow the same kind of processing that the kernel does. A side effect
of this is that unknown options are now shown when flows dumped via
ovs-appctl dpif/dump-flows, similar to the kernel.

There is a second benefit to this as well: for some operations it is
preferable to keep the options exactly as they were received on the wire,
which this enables. One example is that for packets that are executed from
ofproto-dpif-upcall to the datapath, this avoids the translation of
Geneve metadata. Since this conversion is potentially lossy (for unknown
options), keeping everything in the same format removes the possibility
of dropping options if the packet comes back up to userspace and the
Geneve option translation table has changed. To help with these types of
operations, most functions can understand both formats of data and seamlessly
do the right thing.

Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>

dpif-netdev: Don't use metaflow to operate on userspace datapath fields.

If ofproto-dpif installs a flow into the userspace datapath that doesn't
include a mask, we need to synthesize an exact match one. This is currently
done using the metaflow infrastructure, iterating over each field and
setting it to all ones.

There is a conceptual mismatch here because metaflow is operating on
OpenFlow fields, not datapath ones. Even though they are generally very
similar, there are subtle differences, which is why it is necessary to
fix up the input port mask.

With Geneve options, the mapping is much more complicated and so the
situation is worse. The first issue is that the metaflow to flow
mapping can change over time, so we would need to do more revalidation
to track this. In addition, an upcoming patch will completely disconnect
the option format between ofproto-dpif and dpif-netdev, so the values
written by metaflow don't make sense at all.

When megaflows are turned off, ofproto-dpif internally generates masks
using flow_wildcards_init_for_packet(). Since that's the same as what
we want to do here, we can just use that instead of metaflow.

Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>

datapath: Revert "datapath: Constify netlink structs."

This reverts commit 2023bdcfc44c149a8e3b38dcde8f04f2ec3f8501.
This commit is causing segfaults when genl compat code is in use.

Compat code update genl_multicast_group and genl_family type objects.
Therefore these can not be const.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Joe Stringer <joestringer@nicira.com>

dpif-netdev: fix race for queues between pmd threads.

Currently pmd threads select queues in pmd_load_queues() according to
get_n_pmd_threads_on_numa(). This behavior leads to race between pmds,
beacause dp_netdev_set_pmds_on_numa() starts them one by one and
current number of threads changes incrementally.

As a result we may have the following situation with 2 pmd threads:

* dp_netdev_set_pmds_on_numa()
* pmd12 thread started. Currently only 1 pmd thread exists.
dpif_netdev(pmd12)|INFO|Core 1 processing port 'port_1'
dpif_netdev(pmd12)|INFO|Core 1 processing port 'port_2'
* pmd14 thread started. 2 pmd threads exists.
dpif_netdev|INFO|Created 2 pmd threads on numa node 0
dpif_netdev(pmd14)|INFO|Core 2 processing port 'port_2'

We have:
core 1 --> port 1, port 2
core 2 --> port 2

Fix this by starting pmd threads only after all of them have
been configured.

Cc: Daniele Di Proietto <diproiettod at vmware.com>
Cc: Dyasly Sergey <s.dyasly at samsung.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
Signed-off-by: Ilya Maximets <i.maximets at samsung.com>
Signed-off-by: Ethan Jackson <ethan@nicira.com>

kmod-traffic: Expand sanity tests.

The initial sanity test only checked IPv4 without IP fragments. This patch
adds additional tests using IPv6 and VLANs with IP fragments and expands
the existing test to be more strict.

Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>

treewide: Fix doubled "the".

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Russell Bryant <rbryant@redhat.com>

ovs-ofctl: Refine documentation of Geneve option mapping.

The text didn't say how to actually match on them. I took the liberty of
massaging the text a little further, too.

Suggested-by: Alex Wang <alexw@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Alex Wang <alexw@nicira.com>

ofproto-dpif: Use a regular ref instead of try_ref for rule translation.

Until now, flow translation has had to use try_ref to take a reference on
a rule, because a competing thread might have released the last reference
and done an RCU-postponed deletion. Since classifier versioning was
introduced, however, the release of the last reference is itself
RCU-postponed, which means that it is always safe to take the reference
directly.

Changing try_ref to ref means that taking a reference can't fail, which
allows the caller to take a reference in cases where the need to take a
reference was previously passed along a call chain, which simplifies some
code.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>

ovn: Change strategy for tunnel keys.

Until now, OVN has used "flat" tunnel keys, in which the STT tunnel key or
Geneve VNI contains a logical port number.  Logical port numbers are unique
within an OVN deployment.

Flat tunnel keys have the advantage of simplicity.  However, for packets
that are destined to logical ports on multiple hypervisors, they require
sending one packet per destination logical port rather than one packet per
hypervisor.  They also make it hard to integrate with VXLAN-based hardware
switches, which use VNIs to designate logical networks instead of logical
ports.

This commit switches OVN to a different scheme.  In this scheme, in Geneve
the VNI designates a logical network and a Geneve option specifies the
logical input and output ports, which are now scoped within the logical
network rather than globally unique.  In STT, all three identifiers are
encoded in the tunnel key.

To allow for the reduced amount of traffic for packets destined to logical
ports on multiple hypervisors, this commit also introduces the concept
of a logical multicast group.  The membership of these groups can be set
using a new Multicast_Group table in the southbound database (and
ovn-northd does use it starting in this commit).

With multicast groups alone, it would be difficult to implement ACLs,
because an ACL might disallow only some of the packets being sent to
a remote hypervisor.  Thus, this commit also splits the OVN logical
pipeline into two pipelines: the "ingress" pipeline, which makes the
decision about the logical destination of a packet as a set of logical
ports or multicast groups, and the "egress" pipeline, which runs on the
destination hypervisor with the multicast group destination exploded into
individual ports and makes a final decision on whether to deliver the
packet.  The "egress" pipeline can efficiently apply ACLs.

Until now, the OVN logical and physical pipeline implementation was not
adequately documented.  This commit adds extensive documentation to
the OVN manpages to cover these issues.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Justin Pettit <jpettit@nicira.com>

ofctrl: Negotiate OVN Geneve option.

This won't really get used until the next commit.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Justin Pettit <jpettit@nicira.com>

rule: Introduce MFF_LOG_DATAPATH macro for consistency.

The other logical fields have their own macros, so the logical datapath
field might as well have one.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Justin Pettit <jpettit@nicira.com>

actions: Allow caller to specify output table.

When an upcoming commit divides the pipeline up into ingress and egress
pipeline, it will become necessary to resubmit to different tables from
each of those pipelines to implement output. This commit makes that
possible.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Justin Pettit <jpettit@nicira.com>

ovn: Rename Pipeline table to Logical_Flow table.

The OVN pipeline is being split into two phases, which are most naturally
called "pipelines". I kept getting very confused trying to call them
anything else, and in the end it seems to make more sense to just rename
the Pipeline table.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Justin Pettit <jpettit@nicira.com>

ovn: Rename Binding table to Port_Binding.

An upcoming patch will add a Datapath_Binding table, so clarifying the
name seems useful.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Justin Pettit <jpettit@nicira.com>

nroff: Add support for 'diagram' XML element for protocol headers.

This will be used in documentation for an upcoming change, to document
how Geneve OVN options are encoded.

The code in this change is from a series (not yet submitted) that makes
much more extensive use of it for documenting protocol headers.

Signed-off-by: Ben Pfaff <blp@nicira.com>

smap: New function smap_get_uuid().

To be used in an upcoming commit.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Russell Bryant <rbryant@redhat.com>
Acked-by: Justin Pettit <jpettit@nicira.com>

ovn-controller: Use controller_ctx just to pass around data.

Until now, controller_ctx has been a store of common state (although
the amount of data stored in it has declined to just database state).
I think it's clearer if we just use it as a way to pass data to
functions. This commit makes that change.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Justin Pettit <jpettit@nicira.com>

ovn-controller: Slightly adjust pipeline init and destroy for consistency.

This drops an unused parameter and groups the calls to these functions
with ofctrl_destroy() in each case.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Justin Pettit <jpettit@nicira.com>

bridge: Reconfigure when system interfaces change.

Whenever system interfaces are removed, added or change state, reconfigure
bridge. This allows late interfaces to be added to the datapath when they are
added to the system after ovs-vswitchd is started.

Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>

rtbsd: support RTM_IFANNOUNCE messages

When devices are created, they are announced using RTM_IFANNOUNCE messages using
PF_ROUTE.

Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>

datapath: Fix STT protocol field for sampling packet.

Fixes typo in STT sampling code.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>

ovn: Get/set lport type and options in ovn-nbctl.

A recent patch added "type" and "options" columns to the Logical_Port
table in OVN_Northbound. This patch allows you to get and set those
columns with ovn-nbctl.

ovn-nbctl should eventually get converted to use the common db-ctl
code that was recently added. When that happens, these commands can
just be removed.

Signed-off-by: Russell Bryant <rbryant@redhat.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>

ovn: Add type and options to logical port.

We have started discussing the use of the logical port abstraction in
OVN to represent special types of connections into an OVN logical
switch.  This patch proposes some schema updates to reflect these
special types of logical ports.  A logical port can have a "type" and
a set of options specific to that type.

Some examples of logical port types would be "vtep" for connectivity
to a VTEP gateway or "localnet" for a connection to a locally
accessible network via an ovs bridge.  Actualy support for these (or
other) types will come in later patches.

Signed-off-by: Russell Bryant <rbryant@redhat.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>

smap: Add smap_equal().

Add a method to determine of two smaps are equal (have the exact same
set of key-value pairs).

Suggested-by: Ben Pfaff <blp@nicira.com>
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>

ofproto-dpif-xlate: Fix mirroring interaction with recirculation.

Before this commit, mirroring state was not preserved across recirculation,
which could result in a packet being mirrored to the same destination both
before and after recirculation. This commit fixes the problem and adds a
test to avoid regression.

Found by inspection.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>

ofproto-dpif-xlate: Add recirculation information to "ofproto/trace".

This makes it possible to understand what happens recirculation-wise in
translation.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>

ofp-actions: Add action "debug_recirc" for testing recirculation.

It isn't otherwise useful and in fact hurts performance so it's disabled
without --enable-dummy.

An upcoming commit will make use of this.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>

ofproto-dpif-rid: Factor recirculation state out as new structure.

This greatly reduces the number of arguments to many of the functions
involved in recirculation, which to my eye makes the code clearer. It
will also make it easier to add new recirculation state in an upcoming
commit.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>

ofproto-dpif-xlate: Rewrite mirroring to better fit flow translation.

Until now, mirroring has been implemented by accumulating, across the whole
translation process, a set of mirrors that should receive a mirrored
packet.  After translation was complete, mirroring restored the original
version of the packet and sent that version to the mirrors.

That implementation was ugly for multiple reasons.  First, it means that
we have to keep a copy of the original packet (or its headers, actually),
which is expensive.  Second, it doesn't really make sense to mirror a
version of a packet that is different from the one originally output.
Third, it interacted with recirculation; mirroring needed to happen only
after recirculation was complete, but this was never properly implemented,
so that (I think) mirroring never happened for packets that were
recirculated.

This commit changes how mirroring works.  Now, a packet is mirrored at the
point in translation when it becomes eligible for it: for mirrors based on
ingress port, this is at ingress; for mirrors based on egress port, this
is at egress.  (Duplicates are dropped.)  Mirroring happens on the version
of the packet as it exists when it becomes eligible.  Finally, since
mirroring happens immediately, it interacts better with recirculation
(it still isn't perfect, since duplicate mirroring will occur if a packet
is eligible for mirroring both before and after recirculation; this is
not difficult to fix and an upcoming commit later in this series will do so).

Finally, this commit removes more code from xlate_actions() than it adds,
which in my opinion makes it easier to understand.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>

ofproto-dpif-xlate: Drop packets received from mirror output ports earlier.

Packets should never be received on mirror output ports.  We drop them
when we do receive them.  But by putting them through the processing that
we did until now, we made it possible for MAC learning, etc. to happen
based on these packets.  This commit drops them earlier to prevent that.

Found by inspection.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>

ofproto-dpif-xlate: Move initialization of 'in_port' closer to first use.

This seems to be a little clearer to me.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>

ofproto-dpif-xlate: Move 'nf_output_iface' from xlate_out to xlate_ctx.

This member is used internally during translation but none of the callers
used as an output of translation.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>

ofproto-dpif-xlate: Remove multiple members from struct xlate_out.

Nothing used them.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>

ofproto-dpif-xlate: Move 'mirrors' from xlate_out to xlate_ctx.

Nothing outside of ofproto-dpif-xlate.c referenced this member.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>

ofproto-dpif-xlate: Set up 'base_flow' when we initialize 'ctx'.

The initialization of 'base_flow' was previously split into a few pieces,
and I think it's easier to understand if it's all in one place.

This also moves and rewrites the comment describing 'base_flow'. I think
that the perspective of the new comment is a little more useful.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>

ofproto-dpif-xlate: Clean up sFlow and IPFIX sampling code.

This code was a twisty maze of tiny functions, but what it actually needed
to do was simple. This makes it look that simple.

Among more stylistic changes, this removes 'user_cookie_offset' from
xlate_ctx. This member was used to communicate between two sections of
code that are both in xlate_actions() and close together, so it's better to
simply use a local variable than to put it into a shared context structure.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>

ofproto-dpif-xlate: Factor wildcard processing out of xlate_actions().

I think that this makes xlate_actions() easier to read.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>

tunnel: Break tnl_xlate_init() into two separate functions.

It seems to me that tnl_xlate_init() has two almost-separate tasks.  First,
it marks most of the 'wc' bits for tunnels.  Second, it checks and updates
ECN bits.  This commit breaks tnl_xlate_init() into two separate functions,
one for each of those tasks.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>

ofproto-dpif-xlate: Simplify 'sample_actions_len' calculation.

It's always the size of 'odp_actions' following adding the sample actions.

This is a stylistic change that should not change behavior.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>

ofproto-dpif-xlate: Move declaration of 'orig_flow' near its first use.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>

ofproto-dpif-xlate: Eliminate 'is_icmp' from xlate_actions().

This is only used in one place and action processing can't change the
result, so only calculate it where it's needed.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>

ofproto-dpif-xlate: Simplify invocation of process_special().

This takes advantage of common properties of the invocation of this
function in both callers (both supply the same 'flow' and 'packet',
although they write it differently) and avoids the need for a local
variable in each place.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>

ofproto-dpif-xlate: Eliminate 'rule' local variable.

This variable was only used as a temporary within a small scope, so it
worked just as well to just use ctx.rule there instead.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>

ofp-util: Fix group desc request encoding.

Signed-off-by: Minoru TAKAHASHI <takahashi.minoru7@gmail.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>

openflow: Add OpenFlow1.5 group desc request.

Signed-off-by: Minoru TAKAHASHI <takahashi.minoru7@gmail.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>

ofp-util: Fix port desc request encoding.

Signed-off-by: Minoru TAKAHASHI <takahashi.minoru7@gmail.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>

openflow: Add OpenFlow1.5 port desc request.

Signed-off-by: Minoru TAKAHASHI <takahashi.minoru7@gmail.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>

ofproto-dpif-xlate: Calculate 'ofpacts' in more restricted scope.

This moves the calculation of 'ofpacts' closer to its actual use, which
in my opinion makes the code easier to read.

This commit also expands the circumstances in which OVS omits sending
NetFlow records from those where there is exactly one OpenFlow action that
sends to controller, to those where any OpenFlow action sends to
controller. I doubt that this is a big deal.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>

ofproto-dpif-xlate: Make xlate_actions() caller supply action buffer.

Until now, struct xlate_out has embedded an ofpbuf for actions and a large
stub for it, which xlate_actions() filled in during the flow translation
process.  This commit removes the embedded ofpbuf and stub, instead putting a
pointer to an ofpbuf into struct xlate_in, for a caller to fill in with a
pointer to its own structure if desired.  (If none is supplied,
xlate_actions() uses an internal scratch buffer and destroys it before
returning.)

This commit eliminates the last large data structure from
struct xlate_out, making the initialization of an entire xlate_out at
the beginning of xlate_actions() now reasonable.  More members will be
eliminated in upcoming commits, but this is no longer essential.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>

ofproto-dpif-xlate: Make xlate_actions() caller supply flow_wildcards.

Until now, struct xlate_out has embedded a struct flow_wildcards, which
xlate_actions() filled in during the flow translation process (unless this
was disabled with xin->skip_wildcards, which in classifier microbenchmarks
saves significant time).  This commit removes the embedded flow_wildcards
and 'skip_wildcards', instead putting a pointer to a flow_wildcards into
struct xlate_in, for a caller to fill in with a pointer to its own
structure if desired.

One reason for this change is performance.  Until now, the userspace slow
path has done a full copy of a struct flow_wildcards for each upcall in
upcall_cb().  This commit eliminates that copy.  I don't know whether this
has a measurable performance impact; it may, because struct flow copies
had a noticeable cost in slow-path stress tests even when struct flow was
half its current size.

This commit also eliminates a large data structure from struct xlate_out,
reducing the cost of the initialization of that structure at the beginning
of xlate_actions().  However, there is more size reduction to come in
later commits.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>

ofproto-dpif: Fix inaccurate wildcard output in ofproto/trace.

Until now, the ofproto/trace command has tried to accumulate wildcards
independently of flow translation. This was unnecessary, and in a few
cases where flow translation drops wildcards, it meant that ofproto/trace
printed inaccurate wildcards (because it keep the wildcards that flow
translation dropped).

This updates a test case whose output is now more accurate.

Signed-off-by: Ben Pfaff <blp@nicira.com>

ofproto-dpif-xlate: Initialize 'ctx' all in one place.

As I see it, this has two benefits. First, by using an initializer
rather than a series of assignment statements, the reader can be
assured that everything in the structure is actually initialized.
Second, previously the initialization of 'ctx' was scattered in
a few places in this function, which made it a little harder to be
sure that any given member was not just initialized but actually
initialized before the statement that one was looking at.

It's also nice to get rid of the stub members in xlate_ctx, since
nothing outside of xlate_actions() itself needs direct access to
them. (This is pretty much necessary if we're going to use an
initializer for struct xlate_ctx, because otherwise the compiler
would initialize the whole stub, which is too expensive.)

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>

ofpbuf: New macro OFPBUF_STUB_INITIALIZER.

To be used in an upcoming commit.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>

list: New macro OVS_LIST_POISON for initializing a poisoned list.

To be used in an upcoming commit.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>

ofproto-dpif-xlate: Initialize '*xout' all together at beginning.

To my mind, this is a good way to ensure that '*xout' gets initialized
properly in every execution. By using an initializer rather than a
series of assignment statements, we can be assured that every member
gets initialized.

This commit makes xlate_actions() more expensive because struct
xlate_out is large and this assignment will initialize all of it due to
C rules. Later commits will fix this up by removing all of the large
members, reducing xlate_out to only a few bytes total.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>

type-props: Suppress warnings in newer Clang and GCC.

Until now, Clang 3.7+ and sufficiently new versions of GCC complained about
TYPE_MAXIMUM(int), etc., because it shifts a negative value. This commit
fixes the problem.

This commit also gives these macros sensible definitions for _Bool, and
documents all of them.

Reported-by: Joe Stringer <joestringer@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Joe Stringer <joestringer@nicira.com>

AUTHORS: Add Alexander Duyck.

Signed-off-by: Joe Stringer <joestringer@nicira.com>

datapath: Use eth_proto_is_802_3.

Replace "ntohs(proto) >= ETH_P_802_3_MIN" w/ eth_proto_is_802_3(proto).

Backport of upstream commit 6713fc9b8fa33444aa000f0f31076f6a859ccb34:
"openvswitch: Use eth_proto_is_802_3"

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>

datapath: Backport eth_proto_is_802_3().

Backport of upstream commit 2c7a88c252bf3381958cf716f31b6b2e0f2f3fa7:
"etherdev: Fix sparse error, make test usable by other functions"

Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>

datapath: Use skb_postpull_rcsum().

Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>

datapath: Constify netlink structs.

Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>

datapath: Whitespace fixes.

Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>

rhel: Add buildrequires for libtool and automake.

Those two packages are needed to build but they might not
be present in the system.

Signed-off-by: Flavio Leitner <fbl@redhat.com>
Acked-by: Russell Bryant <rbryant@redhat.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>