Chris Mi [Mon, 12 Nov 2018 02:08:38 +0000 (11:08 +0900)]
netdev-tc-offloads: Delete ufid tc mapping in the right place
Currently, the ufid tc mapping is deleted in add_ufid_tc_mapping().
But if tc_replace_flower() fails, the old ufid tc mapping will not
be deleted. If another thread adds the same tc mapping successfully,
then there will be multiple mappings for the same ifindex, handle
and prio.
Fixes: 9116730db ("netdev-tc-offloads: Add ufid to tc/netdev map") Signed-off-by: Chris Mi <chrism@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Simon Horman <simon.horman@netronome.com>
Timothy Redaelli [Sun, 11 Nov 2018 10:04:17 +0000 (11:04 +0100)]
ipsec: Install ovs-monitor-ipsec in script directory
In commit d5cc46e3d185 ("ipsec: Use @PYTHON@ directly instead of
"/usr/bin/env python"") ovs-monitor-ipsec is installed in bin directory,
but it's supposed to be installed in script directory.
This commit also removes the manual copy of "ovs-monitor-ipsec" in the spec
file, since it's installed directly by "make install".
Fixes: d5cc46e3d185 ("ipsec: Use @PYTHON@ directly instead of "/usr/bin/env python"") Signed-off-by: Timothy Redaelli <tredaelli@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Timothy Redaelli [Sun, 11 Nov 2018 10:04:33 +0000 (11:04 +0100)]
gitignore: Ignore ovs-monitor-ipsec
Commit d5cc46e3d185 ("ipsec: Use @PYTHON@ directly instead of "/usr/bin/env
python"") introduced ovs-monitor-ipsec.in that generates
ovs-monitor-ipsec.
This commit adds ovs-monitor-ipsec to ipsec/.gitignore.
Fixes: d5cc46e3d185 ("ipsec: Use @PYTHON@ directly instead of "/usr/bin/env python"") Signed-off-by: Timothy Redaelli <tredaelli@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Ilya Maximets [Mon, 12 Nov 2018 12:20:39 +0000 (15:20 +0300)]
pinctrl: Fix dp_packet structure leak.
Buffered packets are always packets created by 'dp_packet_clone_data()'
i.e. they are malloced. It's not enough to free the packet data,
dp_packet structure must be freed too. 'dp_packet_delete()' will take
care of that.
Acked-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com> Fixes: d7abfe39cfd2 ("OVN: add buffering support for ip packets") Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Ilya Maximets [Mon, 12 Nov 2018 12:19:57 +0000 (15:19 +0300)]
pinctrl: Fix crash on buffered packets hmap double remove.
'destroy_buffered_packets()' removes the hmap node which was
already removed by 'HMAP_FOR_EACH_POP()', producing the following
crash log:
Invalid read of size 8
at 0x134EDB: hmap_remove (hmap.h:287)
by 0x134EDB: destroy_buffered_packets (pinctrl.c:237)
by 0x13AB3B: destroy_buffered_packets_map (pinctrl.c:246)
by 0x13AB3B: pinctrl_destroy (pinctrl.c:1804)
by 0x12C0CF: main (ovn-controller.c:916)
Address 0x8 is not stack'd, malloc'd or (recently) free'd
Could be captured by check-valgrind on the following test:
'2720. ovn -- IP packet buffering'
Acked-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com> Fixes: d7abfe39cfd2 ("OVN: add buffering support for ip packets") Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Eelco Chaudron [Mon, 12 Nov 2018 09:26:22 +0000 (04:26 -0500)]
netdev-dpdk: Bring link down when NETDEV_UP is not set
When the netdev link flags are changed to !NETDEV_UP, the DPDK ports do not
actually go down. This causes problems for people trying to bring
down a bond member. The bond link is no longer being used to receive or
transmit traffic, however, the other end keeps sending data as the link
remains up.
With OVS 2.6 the link was brought down, and this was changed with commit 3b1fb0779. In this commit, it's explicitly mentioned that the link down/up
DPDK APIs are not called as not all PMD devices support it.
However, this patch does call the appropriate DPDK APIs and ignores
errors due to the PMD not supporting them. PMDs not supporting this should
be fixed in DPDK upstream.
I verified this patch is working correctly using the
ovs-appctl netdev-dpdk/set-admin-state <port> {up|down} and
ovs-ofctl mod-port <bridge> <port> {up|down} commands on a XL710
and 82599ES.
Timothy Redaelli [Sat, 10 Nov 2018 15:52:01 +0000 (16:52 +0100)]
rtnetlink: Remove executable bit from rtnetlink.h
In commit 135ee7ef362f ("rtnetlink: extend parser to include kind of master and
slave") the file mode of rtnetlink.h accidentally changed from 0644 to 0755.
This commit restores the previous file mode (0644) on rtnetlink.h.
CC: John Hurley <john.hurley@netronome.com> Fixes: 135ee7ef362f ("rtnetlink: extend parser to include kind of master and slave") Acked-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Timothy Redaelli <tredaelli@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Timothy Redaelli [Sat, 10 Nov 2018 15:52:00 +0000 (16:52 +0100)]
bond: Remove executable bit from bond.c
In commit 90061ea7d1dd ("bond: Fix LACP fallback to active-backup when recirc
is enabled.") the file mode of bond.c accidentally changed from 0644 to 0755.
This commit restores the previous file mode (0644) on bond.c.
CC: Ben Pfaff <blp@ovn.org> Fixes: 90061ea7d1dd ("bond: Fix LACP fallback to active-backup when recirc is enabled.") Acked-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Timothy Redaelli <tredaelli@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Timothy Redaelli [Sat, 10 Nov 2018 15:29:07 +0000 (16:29 +0100)]
ipsec: Use @PYTHON@ directly instead of "/usr/bin/env python"
Using "/usr/bin/env" is against Fedora Packaging Guidelines [1].
Moreover, in this specific case, it also prevents "make rpm-fedora" from
completing successfully on "Fedora Rawhide", since "#!/usr/bin/env python"
must not be used anymore [2].
This patch adds IPsec support for OVN tunnel. Basically, OVN offers a
binary option to its user for encryption configuration. If the IPsec
option is turned on, all tunnels will be encrypted. Otherwise, no tunnel
will be encrypted.
The changes are summarized as below:
1) Added an ipsec column on the NB_Global table and SB_Global table. The
value of ipsec column is propagated by ovn-northd from NB_Global to
SB_Global.
2) ovn-controller monitors the ipsec column in SB_Global. If the ipsec
value is true, ovn-controller sets options of the tunnel interface by
specifying "options:remote_name=<remote_chassis_name>". If the ipsec
value is false, ovn-controller removes these options.
3) ovs-monitor-ipsec daemon
(https://mail.openvswitch.org/pipermail/ovs-dev/2018-June/348701.html)
monitors the tunnel interface options and configures IKE daemon
accordingly for IPsec encryption.
Signed-off-by: Qiuyu Xiao <qiuyu.xiao.qyx@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
This patch reintroduces ovs-monitor-ipsec daemon that
was previously removed by commit 2b02d770 ("openvswitch:
Allow external IPsec tunnel management.")
After this patch, there are no IPsec flavored tunnels anymore.
IPsec is enabled by setting up the right values in:
1. OVSDB:Interface:options column;
2. OVSDB:Open_vSwitch:other_config column;
3. OpenFlow pipeline.
GRE, VXLAN, GENEVE, and STT IPsec tunnels are supported. LibreSwan and
StrongSwan IKE daemons are supported. Users can choose pre-shared key,
self-signed peer certificate, or CA-signed certificate as authentication
methods.
Han Zhou [Thu, 8 Nov 2018 06:29:44 +0000 (22:29 -0800)]
ofproto.c: Handle the situation when ofp_port number exhausted.
When ofp_port number is exhausted, OFPP_NONE (65535) will be
returned by alloc_ofp_port(). In this case we should error out
instead of continuing to use 65535 as the port number.
Using the invalid number causes unpredictable consequences:
2018-11-06T01:29:10.042Z|142103|dpif(ovs-vswitchd)|WARN|system@ovs-system: failed to add ovn-aded97-0 as port: Device or resource busy
2018-11-06T01:29:10.045Z|142104|bridge(ovs-vswitchd)|INFO|bridge br-int: added interface ovn-aded97-0 on port 65535
2018-11-06T01:29:11.479Z|142108|ofproto(ovs-vswitchd)|WARN|br-int: cannot configure bfd on nonexistent port 65535
2018-11-06T01:29:11.479Z|142109|ofproto(ovs-vswitchd)|WARN|br-int: cannot configure LLDP on nonexistent port 65535
2018-11-06T01:29:11.479Z|142110|ofproto(ovs-vswitchd)|WARN|br-int: cannot configure datapath on nonexistent port 65535
...
2018-11-06T01:29:18.783Z|142117|bfd(ovs-vswitchd)|INFO|ovn-aded97-0: BFD state change: admin_down->down "No Diagnostic"->"No Diagnostic".
2018-11-06T01:29:18.785Z|00061|bfd(monitor82)|INFO|Interface ovn-aded97-0 remote mult value 0 changed to 3
2018-11-06T01:29:18.785Z|00062|bfd(monitor82)|INFO|ovn-aded97-0: New remote min_rx.
...
2018-11-06T01:29:18.773Z|142111|bridge(ovs-vswitchd)|INFO|bridge br-int: deleted interface ovn-aded97-0 on port 65535
...
2018-11-06T01:29:18.779Z|142115|dpif(ovs-vswitchd)|WARN|system@ovs-system: failed to add ovn-aded97-0 as port: Device or resource busy
2018-11-06T01:29:18.782Z|142116|bridge(ovs-vswitchd)|INFO|bridge br-int: added interface ovn-aded97-0 on port 65535
...
2018-11-06T01:29:18.785Z|00064|bfd(monitor82)|WARN|ovn-aded97-0: Incorrect your_disc.
...
Signed-off-by: Han Zhou <hzhou8@ebay.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Han Zhou [Thu, 8 Nov 2018 06:29:43 +0000 (22:29 -0800)]
ofproto.c: Fix port number leaking.
When there is an error in ofport_install(), the ofp port number is
not deallocated, which leads to port number leak. For example,
when there is a redundant tunnel port added in an OVS bridge,
ovs-vswitchd will try to add the port to ofproto whenever OVSDB
changes, which triggers the port number leak, and over time
there won't be any port number available for valid requests.
Signed-off-by: Han Zhou <hzhou8@ebay.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Yifeng Sun [Wed, 7 Nov 2018 21:44:34 +0000 (13:44 -0800)]
dns-resolve: Improve on handling of system DNS nameserver
This patch enables OVS on Windows to read system nameserver configuration.
In addition, a new environment variable OVS_RESOLV_CONF is introduced.
If set, it can be used as DNS server configuration file. This variable
is supposed to be used for sandboxing other things. It is documented
accordingly.
Suggested-by: Ben Pfaff <blp@ovn.org> Suggested-by: Mark Michelson <mmichels@redhat.com> Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Yifeng Sun [Wed, 7 Nov 2018 21:44:33 +0000 (13:44 -0800)]
dns-resolve: Stop dns resolving if no DNS server configured
DNS resolution should fail if no DNS servers are available. This
patch fixes it.
Suggested-by: Ben Pfaff <blp@ovn.org> Suggested-by: Mark Michelson <mmichels@redhat.com> Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Yifeng Sun [Wed, 7 Nov 2018 20:42:16 +0000 (12:42 -0800)]
ofctl_parse_target: Avoid passing invalid ofputil_protocol to ofputil_protocol_to_ofp_version
In this test, the involved ovs functions expect valid ofputil_protocol
values. Therefore, if usable_protocols is invalid, we should return.
Otherwise, ovs will abort.
Reported-at: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=11165 Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Yifeng Sun [Wed, 7 Nov 2018 20:42:17 +0000 (12:42 -0800)]
odp-util: Set a limit for nested parse_odp_key_mask_attr call
This patch puts a limit on the nesting depth in a flow key string to avoid
stack overflow. An example that shows this issue is a key string that contains
thousands of nested encaps. In addition, a new test is added for this fix.
Reported-at: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=11149 Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Yifeng Sun [Wed, 7 Nov 2018 20:42:15 +0000 (12:42 -0800)]
actions: Enforce a maximum limit for nested action depth
If actions are nested too deeply, then the stack will overflow
and ovs-vswitch crashes. This patch prevents this by adding a depth limit
to nested actions.
Reported-at: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=11237 Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
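Both nesting-limit commits above follow the same pattern: carry a depth
counter through the recursive parser and fail once it passes a fixed bound.
A minimal sketch of that pattern follows, using a toy grammar of nested
parentheses. The names parse_group() and parse_nested() and the limit of 32
are assumptions made for illustration, not the OVS functions or values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_NESTING_DEPTH 32   /* Hypothetical limit, not the OVS value. */

/* Parses one parenthesized group starting at '**sp'.  On success, returns
 * true and advances '*sp' past the matching ')'.  Returns false on
 * malformed input or when nesting exceeds MAX_NESTING_DEPTH. */
static bool
parse_group(const char **sp, int depth)
{
    assert(sp != NULL && *sp != NULL);

    if (depth > MAX_NESTING_DEPTH) {
        return false;               /* Refuse to recurse any deeper. */
    }
    if (**sp != '(') {
        return false;
    }
    ++*sp;                          /* Consume '('. */
    while (**sp != ')') {
        if (**sp == '\0') {
            return false;           /* Unterminated group. */
        } else if (**sp == '(') {
            if (!parse_group(sp, depth + 1)) {
                return false;
            }
        } else {
            ++*sp;                  /* Skip ordinary characters. */
        }
    }
    ++*sp;                          /* Consume ')'. */
    return true;
}

/* Returns true if 's' is exactly one well-formed group within the limit. */
static bool
parse_nested(const char *s)
{
    return parse_group(&s, 1) && *s == '\0';
}
```

Because the check happens before each recursive step, stack usage is bounded
by the limit rather than by the length of the input.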
Lorenzo Bianconi [Fri, 26 Oct 2018 16:25:59 +0000 (18:25 +0200)]
OVN: configure L2 address according to the used IP address
Configure L2 dynamic address according to used IPv4 address.
This patch makes it possible to define a deterministic relationship between
L2 and L3 addresses when dynamic IPAM is used.
It also fixes a possible L2/L3 address mismatch that can
occur when pods are created and destroyed at a high rate [1]: if
there is no relation between MAC and IP addresses, the ARP cache can be
poisoned with a wrong correspondence.
Lorenzo Bianconi [Fri, 26 Oct 2018 16:25:58 +0000 (18:25 +0200)]
OVN: assign new addresses at the end of build_ipam routine
Visit all ovn datapaths before adding new dynamic addresses to the
system in order to avoid possible L2 address duplication when
the same MAC address is configured on different ovn logical switches.
The current implementation can miss the duplicated address since macam
is cleared at each ovn run and there is no guarantee on the visit order
of the ovn datapath hash table.
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Lorenzo Bianconi [Fri, 26 Oct 2018 16:20:44 +0000 (18:20 +0200)]
OVN: introduce mac_prefix support to IPAM
Add the possibility to specify a given MAC address prefix for
dynamically generated MAC addresses. The MAC address prefix can be
specified in the nbdb NB_Global table, options:mac_prefix=<mac_prefix>.
This patch fixes a possible issue of L2 address duplication if
multiple OVN deployments share a single broadcast domain.
Acked-by: Mark Michelson <mmichels@redhat.com> Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Ben Pfaff [Fri, 2 Nov 2018 18:25:45 +0000 (11:25 -0700)]
ofproto-dpif-upcall: Don't purge ukeys while in a quiescent state.
revalidator_purge() iterates and modifies umap->cmap. This should
not happen in a quiescent state, because the cmap implementation is based
on RCU-protected variables. Let's narrow the quiescent period
to avoid possible wrong memory accesses.
CC: Joe Stringer <joe@ovn.org> Fixes: 9fce0584a643 ("revalidator: Use 'cmap' for storing ukeys.") Reported-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
When processing icmp unreachable message for erspan tunnel, tunnel id
should be erspan_net_id instead of ipgre_net_id.
Fixes: 84e54fe0a5ea ("gre: introduce native tunnel support for ERSPAN") Cc: William Tu<u9012063@gmail.com> Signed-off-by: Haishuang Yan<yanhaishuang@cmss.chinamobile.com> Acked-by: William Tu<u9012063@gmail.com> Signed-off-by: David S. Miller<davem@davemloft.net> Fixes: 8e53509c ("gre: introduce native tunnel support for ERSPAN") Signed-off-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Yifeng Sun [Thu, 1 Nov 2018 17:33:03 +0000 (10:33 -0700)]
odp-util: Validate close-brace in scan_geneve and fix return values of scan_xxx functions
This patch adds validation of close-braces in scan_geneve. A simple
example is "set(encap(tunnel(geneve({{))))". When scan_geneve returns,
(struct geneve_scan *key)->len equals 2*sizeof(struct geneve_opt).
That does not seem correct.
Found this issue while inspecting oss-fuzz
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=11153.
In addition, SCAN_TYPE expects scan_XXX functions to return 0
on errors. This patch inspects all related scan_XXX functions
and fixes their return values.
Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Numan Siddique [Fri, 2 Nov 2018 11:41:01 +0000 (17:11 +0530)]
ovn-nbctl: Fix the ovn-nbctl test "LBs - daemon" which fails during rpm build
When 'make check' is called by the mock rpm build (which disables networking),
the test "ovn-nbctl: LBs - daemon" fails when it runs the command
"ovn-nbctl lb-add lb0 30.0.0.1a 192.168.10.10:80,192.168.10.20:80". ovn-nbctl
extracts the vip by calling the socket util function 'inet_parse_active()',
and this function blocks when libunbound function ub_resolve() is called
further down. ub_resolve() is a blocking function without timeout and all the
ovs/ovn utilities use this function.
As reported by Timothy Redaelli, the issue can also be reproduced by running
the commands below:
$ sudo unshare -mn -- sh -c 'ip addr add dev lo 127.0.0.1 && \
mount --bind /dev/null /etc/resolv.conf && runuser $SUDO_USER'
$ make sandbox SANDBOXFLAGS="--ovn"
$ ovn-nbctl -vsocket_util:off lb-add lb0 30.0.0.1a \
192.168.10.10:80,192.168.10.20:80
To address this issue, this patch adds a new bool argument 'resolve_host' to
the function inet_parse_active() to resolve the host only if it is 'true'.
ovn-nbctl/ovn-northd will pass 'false' when it calls this function to parse
the load balancer values.
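The effect of the new 'resolve_host' argument can be sketched as follows:
numeric literals are always accepted, and the potentially blocking resolver is
consulted only when the caller opts in. This is a simplified IPv4-only
illustration, not the inet_parse_active() implementation; parse_ipv4_literal(),
parse_host() and the resolver callback type are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical resolver callback; a real one may block on DNS. */
typedef bool resolver_fn(const char *host, uint32_t *ip);

/* Parses a dotted-quad literal such as "192.168.10.10" into host byte
 * order.  Rejects trailing garbage such as "30.0.0.1a". */
static bool
parse_ipv4_literal(const char *s, uint32_t *ip)
{
    unsigned int a, b, c, d;
    char extra;

    /* Exactly four conversions must succeed; a fifth means trailing junk. */
    if (sscanf(s, "%u.%u.%u.%u%c", &a, &b, &c, &d, &extra) != 4
        || a > 255 || b > 255 || c > 255 || d > 255) {
        return false;
    }
    *ip = ((uint32_t) a << 24) | ((uint32_t) b << 16)
          | ((uint32_t) c << 8) | (uint32_t) d;
    return true;
}

/* Tries a numeric parse first.  Falls back to 'resolver' only when
 * 'resolve_host' is true, so callers that pass false never block. */
static bool
parse_host(const char *host, bool resolve_host, resolver_fn *resolver,
           uint32_t *ip)
{
    assert(host != NULL && ip != NULL);

    if (parse_ipv4_literal(host, ip)) {
        return true;
    }
    if (!resolve_host || !resolver) {
        return false;
    }
    return resolver(host, ip);
}
```

A caller that only ever expects literals, like the load balancer parsing
described above, passes false and gets an immediate failure for "30.0.0.1a".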
Zak Whittington [Fri, 2 Nov 2018 22:25:29 +0000 (15:25 -0700)]
documentation: man vswitchd.conf.db(5) updated flow-restore-wait
Commit 7ed73428a changed the behavior of flow-restore-wait to
also prevent the switch from connecting to controllers in the
controller table, but failed to update the man page documentation
generated by vswitchd/vswitch.xml to reflect this.
This commit adds that documentation.
Signed-off-by: Zak Whittington <zwhitt.vmware@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
When there are both pop and push ethernet header actions among the
actions to be applied to a packet, an unexpected EINVAL (Invalid
argument) error is obtained. This is due to mac_proto not being reset
correctly when those actions are validated.
Reported-at:
https://mail.openvswitch.org/pipermail/ovs-discuss/2018-October/047554.html Fixes: 91820da6ae85 ("openvswitch: add Ethernet push and pop actions") Signed-off-by: Jaime Caamaño Ruiz <jcaamano@suse.com> Tested-by: Greg Rose <gvrose8192@gmail.com> Reviewed-by: Greg Rose <gvrose8192@gmail.com>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2018-October/047554.html Fixes: 6fcecb85ab ("datapath: add Ethernet push and pop actions") Signed-off-by: Jaime Caamaño Ruiz <jcaamano@suse.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Ben Pfaff [Thu, 1 Nov 2018 15:06:32 +0000 (08:06 -0700)]
checkpatch: Speed up checking when spell checking not enabled.
On my machine it takes almost a second for enchant to read its dictionary.
This time is wasted when spell checking is not enabled. This commit makes
checkpatch read the dictionary only when it will be used.
Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Yifeng Sun [Thu, 1 Nov 2018 22:05:31 +0000 (15:05 -0700)]
ofp-actions: Let parse_UNROLL_XLATE return error message instead of aborting program
Currently, if unroll_xlate is passed to ovs-ofctl as one of the actions,
for example 'ovs-ofctl add-flow br0 in_port=1,actions=unroll_xlate',
ovs-ofctl will crash. This patch fixes it by returning an error
message.
Reported-at: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=11184 Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Yifeng Sun [Thu, 1 Nov 2018 18:39:59 +0000 (11:39 -0700)]
oss-fuzz: Free error string in ofctl_parse_flow
This patch frees the leaked error string to stop oss-fuzz from
complaining.
Reported-at: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=11161 Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Yifeng Sun [Thu, 1 Nov 2018 18:51:21 +0000 (11:51 -0700)]
oss-fuzz: Use unsigned for left shift in ofctl_parse_flows__
Left-shifting an int (1 here) can result in a negative value. This is undefined
behavior according to ISO C99 (6.5.7).
The error message reported by oss-fuzz is:
runtime error: left shift of 1 by 31 places cannot be represented in type 'int'
This patch fixes it by changing signed int to unsigned int.
Reported-at: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=11166 Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
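A minimal sketch of the signed-to-unsigned change described above. The helper
name bit_mask() is hypothetical and not taken from the OVS source; it only
shows that shifting an unsigned 1 stays well defined for bit 31, where
'1 << 31' on a 32-bit int does not.

```c
#include <assert.h>
#include <stdint.h>

/* Returns a mask with only bit 'n' set, for 0 <= n < 32.  The operand is
 * unsigned, so the shift is well defined for every valid 'n', including
 * n == 31. */
static inline uint32_t
bit_mask(unsigned int n)
{
    assert(n < 32);
    return UINT32_C(1) << n;
}
```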
Tiago Lam [Fri, 2 Nov 2018 09:06:34 +0000 (09:06 +0000)]
dp-packet: Fix allocated size on DPDK init.
When enabled with DPDK, OvS deals with two types of packets, the ones
coming from the mempool and the ones locally created by OvS - which are
copied to mempool mbufs before output. In the latter, the space is
allocated from the system, while in the former the mbufs are allocated
from a mempool, which takes care of initialising them appropriately.
In the current implementation, during mempool's initialisation of mbufs,
dp_packet_set_allocated() is called from dp_packet_init_dpdk() without
considering that the allocated space, in the case of multi-segment
mbufs, might be greater than a single mbuf. Furthermore, given that
dp_packet_init_dpdk() is on the code path that's called upon mempool's
initialisation, a call to dp_packet_set_allocated() is redundant, since
mempool takes care of initialising it.
To fix this, dp_packet_set_allocated() is no longer called after
initialisation of a mempool, only in dp_packet_init__(), which is still
called by OvS when initialising locally created packets.
Mark Kavanagh [Fri, 2 Nov 2018 09:06:33 +0000 (09:06 +0000)]
dp-packet: Init specific mbuf fields.
dp_packets are created using xmalloc(); in the case of OvS-DPDK, it's
possible that the resultant mbuf portion of the dp_packet contains
random data. For some mbuf fields, specifically those related to
multi-segment mbufs and/or offload features, random values may cause
unexpected behaviour, should the dp_packet's contents be later copied
to a DPDK mbuf. It is critical therefore, that these fields should be
initialized to 0.
This patch ensures that the following mbuf fields are initialized to
appropriate values on creation of a new dp_packet:
- ol_flags=0
- nb_segs=1
- tx_offload=0
- packet_type=0
- next=NULL
Adapted from an idea by Michael Qiu <qiudayu@chinac.com>:
https://patchwork.ozlabs.org/patch/777570/
Mark Kavanagh [Fri, 2 Nov 2018 09:06:32 +0000 (09:06 +0000)]
netdev-dpdk: fix mbuf sizing
There are numerous factors that must be considered when calculating
the size of an mbuf:
- the data portion of the mbuf must be sized in accordance with Rx
buffer alignment (typically 1024B). So, for example, in order to
successfully receive and capture a 1500B packet, mbufs with a
data portion of size 2048B must be used.
- in OvS, the elements that comprise an mbuf are:
* the dp packet, which includes a struct rte mbuf (704B)
* RTE_PKTMBUF_HEADROOM (128B)
* packet data (aligned to 1k, as previously described)
* RTE_PKTMBUF_TAILROOM (typically 0)
Some PMDs require that the total mbuf size (i.e. the total sum of all
of the above-listed components' lengths) is cache-aligned. To satisfy
this requirement, it may be necessary to round up the total mbuf size
with respect to cacheline size. In doing so, it's possible that the
dp_packet's data portion is inadvertently increased in size, such that
it no longer adheres to Rx buffer alignment. Consequently, the
following property of the mbuf no longer holds true:
mbuf.data_len == mbuf.buf_len - mbuf.data_off
This creates a problem in the case of multi-segment mbufs, where that
property is assumed to hold for all but the final segment in an
mbuf chain. Resolve this issue by adjusting the size of the mbuf's
private data portion, as opposed to the packet data portion when
aligning mbuf size to cachelines.
Co-authored-by: Tiago Lam <tiago.lam@intel.com> Fixes: 4be4d22 ("netdev-dpdk: clean up mbuf initialization") Fixes: 31b88c9 ("netdev-dpdk: round up mbuf_size to cache_line_size") CC: Santosh Shukla <santosh.shukla@caviumnetworks.com> Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com> Signed-off-by: Tiago Lam <tiago.lam@intel.com> Acked-by: Santosh Shukla <santosh.shukla@caviumnetworks.com> Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>
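The sizing rule above can be worked through numerically. The sketch below is
only arithmetic under stated assumptions: the struct and function names are
hypothetical, the 704B, 128B and 1024B figures are the ones quoted in the
commit message, and the 128B cacheline is an assumed value chosen so that
padding is actually needed. It is not the netdev-dpdk code.

```c
#include <assert.h>
#include <stddef.h>

#define ROUND_UP(X, Y) (((X) + (Y) - 1) / (Y) * (Y))

struct mbuf_layout {
    size_t priv_size;   /* dp_packet portion, plus any cacheline padding. */
    size_t headroom;    /* RTE_PKTMBUF_HEADROOM in the text above. */
    size_t data_size;   /* Packet data, aligned to the Rx buffer size. */
    size_t total;       /* Sum of the above, cacheline aligned. */
};

static struct mbuf_layout
mbuf_layout_compute(size_t frame_len, size_t priv_size, size_t headroom,
                    size_t rx_align, size_t cacheline)
{
    assert(rx_align > 0 && cacheline > 0);

    struct mbuf_layout l;
    l.headroom = headroom;
    l.data_size = ROUND_UP(frame_len, rx_align);

    size_t raw = priv_size + headroom + l.data_size;
    l.total = ROUND_UP(raw, cacheline);

    /* Absorb the cacheline padding in the private area, so that the data
     * portion keeps its Rx buffer alignment. */
    l.priv_size = priv_size + (l.total - raw);
    return l;
}
```

For a 1500B frame this gives a 2048B data portion and a raw size of
704 + 128 + 2048 = 2880B. With a 128B cacheline the total rounds up to 2944B,
and the 64B difference lands in the private area (768B) rather than in the
packet data.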
Ian Stokes [Thu, 25 Oct 2018 11:03:57 +0000 (12:03 +0100)]
netdev-dpdk: Add link speed to get_status().
Report the link speed of the device in netdev_dpdk_get_status()
function.
Link speed is already reported as part of the netdev_get_features()
function. However only link speeds defined in the OpenFlow specs are
supported so speeds such as 25 Gbps etc. are not shown. The link
speed for the device is available in Mbps in rte_eth_link.
This commit converts the link speed for a given dpdk device to an
easy to read string and reports it in get_status().
Ian Stokes [Wed, 24 Oct 2018 10:35:17 +0000 (11:35 +0100)]
netdev-dpdk: Fix netdev_dpdk_get_features().
This commit fixes netdev_dpdk_get_features() by initializing a bitmap
that represents current features to zero and accounting for link speed
values that are not defined in the OpenFlow spec.
The current approach for retrieving netdev dpdk features uses a
pointer allocated on the stack without being initialized. As such there
is no guarantee that the bitmap will be accurate. Fix this by declaring
and initializing local variable 'feature' to be used when building the
bitmap, with its value then assigned to the pointer. Also account for
link speeds not defined in the OpenFlow spec by defaulting to
NETDEV_F_OTHER for undefined link speeds.
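The pattern described above (build the bitmap in a zero-initialized local,
assign it through the pointer once, and fall back to an "other" bit for
unknown speeds) might look roughly like this. The enum values and the
get_features() function are hypothetical stand-ins, not the netdev-dpdk
definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical feature bits, standing in for the NETDEV_F_* flags. */
enum {
    F_10GB_FD = 1 << 0,
    F_40GB_FD = 1 << 1,
    F_OTHER   = 1 << 2,
};

/* Stores the feature bitmap for 'link_speed_mbps' in '*current'.
 * Always returns 0 in this sketch. */
static int
get_features(uint32_t link_speed_mbps, uint32_t *current)
{
    assert(current != NULL);

    uint32_t feature = 0;       /* Start from a known-zero bitmap. */

    switch (link_speed_mbps) {
    case 10000:
        feature |= F_10GB_FD;
        break;
    case 40000:
        feature |= F_40GB_FD;
        break;
    default:
        feature |= F_OTHER;     /* Speed without a dedicated bit. */
        break;
    }

    *current = feature;         /* Overwrite whatever the caller passed. */
    return 0;
}
```

Because the result is assigned rather than OR-ed into the caller's storage,
stale bits in an uninitialized variable cannot leak into the output.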
Ilya Maximets [Wed, 31 Oct 2018 15:44:09 +0000 (18:44 +0300)]
dpif-netdev: End the quiescent state for flow offloading thread.
Flow offloading thread uses concurrent hash maps which are
based on rcu protected variables. It must use them while in
active state. Working in a quiescent state could cause
segmentation faults because of possible cmap internal
structure changes.
Fixes: 02bb2824e51d ("dpif-netdev: do hw flow offload in a thread") Acked-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Ian Stokes [Thu, 25 Oct 2018 16:50:44 +0000 (17:50 +0100)]
Docs: Remove HWOL DPDK limitation.
Partial offload support was added to OVS DPDK in OVS 2.10. As such
remove the limitation that OVS DPDK does not support HWOL from the
DPDK install documentation.
Acked-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Ian Stokes [Fri, 19 Oct 2018 13:30:15 +0000 (14:30 +0100)]
Docs: Remove zero-copy QEMU limitation.
Remove note regarding zero-copy compatibility with QEMU >= 2.7.
When zero-copy was introduced to OVS it was incompatible with QEMU >=
2.7. This issue has since been fixed in DPDK with commit 803aeecef123 ("vhost: fix dequeue zero copy with virtio1") and
backported to DPDK LTS branches. Remove the reference to this
issue in the zero-copy documentation.
Ilya Maximets [Fri, 19 Oct 2018 13:51:15 +0000 (16:51 +0300)]
netdev-dpdk: Dump flow patterns only if debug enabled.
There is no need to waste time checking fields when DBG is disabled.
Additionally, the sequence of prints is replaced with a single print
to avoid the output being interrupted by other log messages.
Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Ilya Maximets [Thu, 18 Oct 2018 13:29:21 +0000 (16:29 +0300)]
netdev-dpdk: Secure flow offload API.
The rte API is not thread safe. We have to take the netdev mutex
before using it and also before using fields of the netdev structure.
This is important because the offload API is used from a separate
thread and could be used at the same time as other netdev
functions called from the main thread.
CC: Finn Christensen <fc@napatech.com> Fixes: e8a2b5bf92bb ("netdev-dpdk: implement flow offload with rte flow") Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Ilya Maximets [Thu, 18 Oct 2018 13:29:20 +0000 (16:29 +0300)]
netdev-dpdk: Drop offload API for vhost ports.
vhost ports are not DPDK eth ports and have no rte_flow API.
Stop calling this API with DPDK_ETH_PORT_ID_INVALID to
avoid wasting time and logging errors.
Additionally, the DPDK_FLOW_OFFLOAD_API definition is moved to the .c
file, because there is no need to expose it in the header.
CC: Finn Christensen <fc@napatech.com> Fixes: e8a2b5bf92bb ("netdev-dpdk: implement flow offload with rte flow") Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Ben Pfaff [Wed, 24 Oct 2018 21:23:38 +0000 (14:23 -0700)]
connmgr: Improve interface for setting controllers.
Using an shash instead of an array simplifies the code for both the caller
and the callee. Putting the set of allowed OpenFlow versions into the
ofproto_controller data structure also simplifies the overall function
interface slightly.
Tested-by: Yifeng Sun <pkusunyifeng@gmail.com> Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Ben Pfaff [Thu, 25 Oct 2018 17:34:41 +0000 (10:34 -0700)]
connmgr: Modernize coding style.
This moves declarations closer to first use and merges them with
initialization when possible, moves "for" loop variable declarations into
the "for" statements where possible, and otherwise makes this code look
like it was written a little more recently than it was.
Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
dpif: Restore a few lines with form feed characters
A few lines with form feed characters (ASCII: ^L) were accidentally
deleted by a recent commit to support rebalancing of offloaded flows.
This patch reverts those lines.
Ben Pfaff [Tue, 30 Oct 2018 22:03:17 +0000 (15:03 -0700)]
ovn-northd: Improve hashing for chassis queues.
The key for a "struct ovn_chassis_qdisc_queues" is a Chassis UUID and a
queue_id, but only the UUID was being hashed, so if there was more than one
per chassis then they'd all end up in the same hash bucket, which is
needlessly inefficient. (And if there's only one per chassis then why do
we bother allocating them at all?)
Found by inspection.
Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Numan Siddique <nusiddiq@redhat.com>
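The hashing point above can be illustrated with a small sketch in which both
parts of the key feed the hash. FNV-1a is used here only as a simple stand-in;
it is not the hash function OVS uses, and the struct and function names are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical key: a 128-bit chassis UUID plus a queue ID. */
struct chassis_queue_key {
    uint8_t chassis_uuid[16];
    uint32_t queue_id;
};

/* FNV-1a over 'n' bytes at 'data', continuing from the state 'hash'. */
static uint32_t
fnv1a(const void *data, size_t n, uint32_t hash)
{
    const uint8_t *p = data;

    for (size_t i = 0; i < n; i++) {
        hash ^= p[i];
        hash *= UINT32_C(16777619);
    }
    return hash;
}

/* Hashes the UUID first, then mixes in the queue ID, so that several
 * queues on one chassis spread across buckets. */
static uint32_t
chassis_queue_hash(const struct chassis_queue_key *key)
{
    assert(key != NULL);

    uint32_t hash = fnv1a(key->chassis_uuid, sizeof key->chassis_uuid,
                          UINT32_C(2166136261));
    return fnv1a(&key->queue_id, sizeof key->queue_id, hash);
}
```

Hashing only the UUID would give every queue of a chassis the same value;
chaining the second field through the same hash state avoids that.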
Yi-Hung Wei [Tue, 30 Oct 2018 20:47:25 +0000 (13:47 -0700)]
ovs-lib.in: Remove unnecessary conntrack flush
We introduced flush-conntrack in force-reload-kmod script by commit 8bea39b186ca ("datapath: Prevent panic") to prevent kernel panic.
It turns out that the kernel panic is actually triggered by the
IPv4 secret timer, and it is fixed by commit 121905984724 ("compat: Initialize IPv4 reassembly secret timer").
This commit removes the unnecessary conntrack flush in the script.
Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com> CC: Greg Rose <gvrose8192@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org> Reviewed-by: Greg Rose <gvrose8192@gmail.com>
Jianbo Liu [Mon, 29 Oct 2018 08:29:41 +0000 (08:29 +0000)]
dpif-netlink: Don't destroy and recreate port if it exists
In commit 7521e0cf9e ('ofproto-dpif: Let the dpif report when a port is
a duplicate'), the checking of port existence before adding was removed,
and it's up to the dpif to check if port exists and add only if needed.
But the port can't be added to the datapath if it already exists. It is then
destroyed and created again. This causes problems because configuration
may be lost. For example, when creating two vxlan ports on the same port, its
ingress qdisc will be lost after the port is recreated.
Fixes: 7521e0cf9e88 ("ofproto-dpif: Let the dpif report when a port is a duplicate.") Signed-off-by: Jianbo Liu <jianbol@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
lib/ofp-table.c:1454:42: error: \
format specifies type 'unsigned char' but the argument has type 'int'
ds_put_format(s, "\n table %"PRIu8, table);
~~ ^~~~~
CC: Ben Pfaff <blp@ovn.org> Fixes: b47e7e2bac7f ("ofp-table: Always format the table number in table features.") Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Ilya Maximets [Mon, 29 Oct 2018 14:46:33 +0000 (17:46 +0300)]
manpages: Include ovs.tmac in most man roots.
This avoids redefining common macros in every single
file and allows using things like .EX without worrying about
compatibility.
manpages.mk updated automatically.
Files that are already complete pages (i.e. have no *.in sources)
weren't touched, because this would require additional file
manipulations and changes in makefiles/specs without serious
profit.
Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Ben Pfaff [Thu, 30 Aug 2018 18:03:12 +0000 (11:03 -0700)]
ofp-table: Always format the table number in table features.
Table features should indicate the table number as well as the table
name. Before this, the first line for each table looked like this:
table myname ("myname"):
but it's more useful if it's:
table 123 ("myname"):
Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Justin Pettit <jpettit@ovn.org>
Ben Pfaff [Mon, 27 Aug 2018 22:40:35 +0000 (15:40 -0700)]
ofp-table: Ignore bits that have to change according to OpenFlow.
OpenFlow table feature replies contain a per-table bitmap that indicates
which tables a flow can point to in goto_table actions. OpenFlow requires
that a table only be able to go to higher-numbered tables. This means that
a switch that is as general as possible will always have different features
for every table, since each one will have a different bitmap. This makes
the output of "ovs-ofctl dump-table-features" pretty long and ugly because
it has about 250 entries like this:
table %d:
metadata: match=0xffffffffffffffff write=0xffffffffffffffff
max_entries=%d
instructions (table miss and others):
next tables: %d-253
(same instructions)
(same actions)
(same matching)
This commit changes the logic that prints table features messages so that
it considers two sequentially numbered tables to be the same if only the
bit that necessarily must be turned off changes. This reduces the hundreds
of entries above to just:
tables 1...253: ditto
which is so much more readable.
Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Justin Pettit <jpettit@ovn.org>
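The comparison described above can be sketched as follows. This is only an illustration under assumptions: the "next tables" bitmap is modelled as a plain 256-bit byte array, and every name is hypothetical rather than taken from ofp-table.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define EX_N_TABLES 256
#define EX_BITMAP_BYTES (EX_N_TABLES / 8)

static void
ex_bitmap_set(uint8_t *bm, int bit, bool value)
{
    if (value) {
        bm[bit / 8] |= (uint8_t) (1u << (bit % 8));
    } else {
        bm[bit / 8] &= (uint8_t) ~(1u << (bit % 8));
    }
}

/* Fills 'bm' as the most general switch would for table 'id': every
 * higher-numbered table up to 253 is a valid goto_table target. */
static void
ex_fill_next_tables(uint8_t *bm, int id)
{
    memset(bm, 0, EX_BITMAP_BYTES);
    for (int bit = id + 1; bit <= 253; bit++) {
        ex_bitmap_set(bm, bit, true);
    }
}

/* Returns true if the bitmaps of table 'cur_id - 1' ('prev') and table
 * 'cur_id' ('cur') differ at most in bit 'cur_id', the one bit that
 * OpenFlow forces to change between the two tables. */
static bool
ex_next_tables_equivalent(const uint8_t *prev, const uint8_t *cur, int cur_id)
{
    uint8_t a[EX_BITMAP_BYTES], b[EX_BITMAP_BYTES];

    assert(cur_id > 0 && cur_id < EX_N_TABLES);
    memcpy(a, prev, sizeof a);
    memcpy(b, cur, sizeof b);
    ex_bitmap_set(a, cur_id, false);    /* Ignore the mandatory difference. */
    ex_bitmap_set(b, cur_id, false);
    return !memcmp(a, b, sizeof a);
}
```

With this rule, tables 1 and 2 of a fully general switch compare as equivalent, while any other difference in the bitmap still makes them distinct.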
Zak Whittington [Fri, 26 Oct 2018 22:06:28 +0000 (15:06 -0700)]
ofp-msgs: Added ONF_ and NXT_REQUESTFORWARD for OF1.0-1.3
Backported OFPT14_REQUESTFORWARD to OF1.0-1.3.
OF 1.0-1.2 use an NXT Nicira extension while OF 1.3
uses an ONF extension (ONF version is specified in a
previously published ONF spec sheet).
Includes ofp-print tests for multiple inner message
types, and multiple OF versions including the NXT and ONF.
Also includes more end-to-end ofproto tests for both
NXT OF1.0 and also ONF OF1.3.
VMware-BZ: 2136594 Signed-off-by: Zak Whittington <zwhitt.vmware@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
Yifeng Sun [Thu, 25 Oct 2018 21:41:50 +0000 (14:41 -0700)]
NSH: Fix NSH-related length macros that cause stack overflow
In the field ver_flags_ttl_len of struct nshhdr, only 6 bits are
used to indicate the header's total length in 4-byte words.
Therefore, the max value for the total length is 252 (63x4), instead
of the 256 used in the present code base. This patch fixes it.
Reported-at: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=10855 Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
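The arithmetic behind the 252-byte limit can be sketched as below. The macro and function names are hypothetical, and the sketch assumes the length sits in the low 6 bits of the field in host byte order; it does not reproduce the actual OVS macros.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names.  Assumption: the length is the low 6 bits. */
#define EX_NSH_LEN_MASK 0x003fu
#define EX_NSH_LEN_UNIT 4u      /* The length is counted in 4-byte words. */
#define EX_NSH_MAX_LEN  (EX_NSH_LEN_MASK * EX_NSH_LEN_UNIT)    /* 63 * 4 */

/* Returns the header length in bytes encoded in 'ver_flags_ttl_len'. */
static unsigned int
ex_nsh_hdr_len(uint16_t ver_flags_ttl_len)
{
    unsigned int len = (ver_flags_ttl_len & EX_NSH_LEN_MASK) * EX_NSH_LEN_UNIT;

    /* A 6-bit word count can never describe more than 252 bytes, so a
     * buffer sized from EX_NSH_MAX_LEN is always large enough. */
    assert(len <= EX_NSH_MAX_LEN);
    return len;
}
```

Deriving the maximum from the mask, rather than hard-coding 256, keeps buffer sizes and the encodable length in agreement.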
Yifeng Sun [Thu, 25 Oct 2018 21:49:14 +0000 (14:49 -0700)]
odp-util: Properly handle the return values of scan_XXX functions
Functions like scan_u8 return 0 when they fail to scan the expected
values. Function scan_geneve failed to check for this, which leads
to use of an uninitialized value of opt_len_mask. This patch fixes it
and further inspects and fixes all the problematic places where
the return values of scan_XXX functions are not properly handled.
Reported-at: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=10800 Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
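The pattern of checking a scan function's return value before using its output can be sketched as follows. ex_scan_u8() is a hypothetical stand-in written for this example; only the convention it follows (return 0 on failure) comes from the description above.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a scan_XXX helper: parses a decimal u8 at 's'
 * and returns the number of characters consumed, or 0 on failure.  On
 * failure '*value' is left unwritten. */
static int
ex_scan_u8(const char *s, uint8_t *value)
{
    char *end;
    unsigned long v;

    assert(s != NULL && value != NULL);
    if (*s < '0' || *s > '9') {
        return 0;
    }
    errno = 0;
    v = strtoul(s, &end, 10);
    if (errno || v > UINT8_MAX) {
        return 0;
    }
    *value = (uint8_t) v;
    return (int) (end - s);
}

/* Parses an option length.  Returns the number of characters consumed, or
 * -EINVAL if nothing could be scanned, in which case '*opt_len' must not
 * be read by the caller. */
static int
ex_parse_opt_len(const char *s, uint8_t *opt_len)
{
    int n = ex_scan_u8(s, opt_len);

    if (!n) {
        return -EINVAL;
    }
    return n;
}
```

The caller treats a zero return as a hard error instead of continuing with whatever happened to be in the output variable.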
Yifeng Sun [Thu, 25 Oct 2018 23:17:23 +0000 (16:17 -0700)]
ofctl_parse_target: Only parse complete ofputil_flow_mod data.
When parse_ofp_flow_mod_str returns an error, `fm` is incomplete and pointers
in it may be null, e.g. fm.match.flow. In this case, passing it to
ofctl_parse_flows__ may cause pointer errors because ofctl_parse_flows__
expects a valid input of type struct ofputil_flow_mod.
Reported-at: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=11110 Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
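A rough sketch of the guard described above: the parsed structure is handed on only when parsing succeeded. All types and functions here are hypothetical miniatures, not the real ofputil or fuzz-target code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical miniature of a parsed flow mod. */
struct ex_flow_mod {
    const char *match;          /* NULL until parsing fully succeeds. */
};

/* Returns NULL on success, otherwise an error message.  On error, 'fm' is
 * incomplete and its pointers may be null. */
static const char *
ex_parse_flow_mod(struct ex_flow_mod *fm, const char *s)
{
    fm->match = NULL;
    if (!s || !*s) {
        return "empty flow";
    }
    fm->match = s;
    return NULL;
}

/* Expects a complete 'fm'; dereferences its pointers unconditionally. */
static size_t
ex_consume_flow_mod(const struct ex_flow_mod *fm)
{
    assert(fm->match != NULL);
    return strlen(fm->match);
}

/* Fuzz-style entry point: passes 'fm' on only if parsing succeeded. */
static int
ex_fuzz_one(const char *input, size_t *consumed)
{
    struct ex_flow_mod fm;
    const char *error = ex_parse_flow_mod(&fm, input);

    if (error) {
        return -1;              /* Incomplete 'fm' never reaches the consumer. */
    }
    *consumed = ex_consume_flow_mod(&fm);
    return 0;
}
```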
Zak Whittington [Thu, 25 Oct 2018 18:09:09 +0000 (11:09 -0700)]
bridge.c: prevent controller connects while flow-restore-wait
When force-reload-kmod is used, it shows an error when reinstalling
tlvs during "Restoring saved flows" step:
OFPT_ERROR (xid=0x4): NXTTMFC_ALREADY_MAPPED
This is caused by a race condition between the restore script,
which calls ofctl, and the connected controllers both adding back
the same TLVs.
The restore script already sets flow-restore-wait to true while
doing flow restoration, and sets it back to false after it is
done, and this patch utilizes that fact to prevent the TLV race.
It does this by preventing vswitchd from connecting to
controllers in the controller table while it is in a
flow-restore-wait state.
With this patch, when bridge_configure_remotes() calls
bridge_get_controllers(), it first checks if flow-restore-wait
has been set, and if so, it ignores any controllers in the
controller database and sets n_controllers to 0.
This solution does preserve the management service controller
which is added via bridge_ofproto_controller_for_mgmt() after
checking whether we should call bridge_get_controllers()
(and thus n_controllers is properly set to 1, etc)
VMware-BZ: 2195377 Signed-off-by: Zak Whittington <zwhitt.vmware@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
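The control flow described above can be sketched roughly as follows. The struct and function names are hypothetical and the management controller is reduced to a simple count; this is not the bridge.c implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical reduction of the relevant bridge configuration. */
struct ex_bridge_cfg {
    bool flow_restore_wait;     /* Set by the restore script while restoring. */
    size_t n_db_controllers;    /* Controllers listed in the database. */
};

/* Number of database controllers to connect to.  While flows are being
 * restored, database controllers are ignored entirely. */
static size_t
ex_get_controllers(const struct ex_bridge_cfg *cfg)
{
    assert(cfg != NULL);
    return cfg->flow_restore_wait ? 0 : cfg->n_db_controllers;
}

/* Total remotes to configure.  The management service controller is added
 * regardless of flow-restore-wait, so it is always counted. */
static size_t
ex_configure_remotes(const struct ex_bridge_cfg *cfg)
{
    size_t n_controllers = ex_get_controllers(cfg);

    n_controllers += 1;         /* Management service controller. */
    return n_controllers;
}
```

During restoration the result is 1 (management only); once flow-restore-wait is cleared, the database controllers are counted again.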
Bhargava Shastry [Mon, 15 Oct 2018 09:23:33 +0000 (11:23 +0200)]
ossfuzz: Add ofctl parse target
This patch adds a new target called ofctl_parse_target to
ossfuzz. The main idea is to begin to fuzz APIs from the ofctl utility
program. More of these may be added at a later point. For the moment, this
patch only fuzzes APIs that parse flow mod commands.
This target is demonstrably capable of finding memory corruption defects
in the parsing path. To aid the fuzzing process, a dictionary file
containing tokens specific to this parsing path has been added.
Signed-off-by: Bhargava Shastry <bshastry@sect.tu-berlin.de> Signed-off-by: Ben Pfaff <blp@ovn.org>
Numan Siddique [Thu, 18 Oct 2018 11:17:05 +0000 (16:47 +0530)]
connmgr: Fix vswitchd abort when a port is added and the controller is down
We see the below trace when a port is added to a bridge and the configured
controller is down
0x00007fb002f8b207 in raise () from /lib64/libc.so.6
0x00007fb002f8c8f8 in abort () from /lib64/libc.so.6
0x00007fb004953026 in ofputil_protocol_to_ofp_version () from /lib64/libopenvswitch-2.10.so.0
0x00007fb00494e38e in ofputil_encode_port_status () from /lib64/libopenvswitch-2.10.so.0
0x00007fb004ef1c5b in connmgr_send_port_status () from /lib64/libofproto-2.10.so.0
0x00007fb004efa9f4 in ofport_install () from /lib64/libofproto-2.10.so.0
0x00007fb004efbfb2 in update_port () from /lib64/libofproto-2.10.so.0
0x00007fb004efc7f9 in ofproto_port_add () from /lib64/libofproto-2.10.so.0
0x0000556d540a3f95 in bridge_add_ports__ ()
0x0000556d540a5a47 in bridge_reconfigure ()
0x0000556d540a9199 in bridge_run ()
0x0000556d540a02a5 in main ()
The abort occurs because ofputil_protocol_to_ofp_version() is called with an
invalid protocol - OFPUTIL_P_NONE. Please see [1] for more details. Similar
aborts are seen as reported in [2].
The commit [3] changed the behavior of the function rconn_get_version().
Before the commit [3], the function ofconn_receives_async_msg() would always
return false if the connection to the controller was down, since
rconn_get_version() used to return -1. This patch now checks the rconn
connection status in ofconn_receives_async_msg() and returns false if not
connected. This would avoid the aborts seen in the above stack trace.
The issue can be reproduced by running the test added in this patch
without the fix.
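The check described above can be sketched as follows. The types and names are hypothetical simplifications; the sketch only shows why testing the connection status first keeps the invalid protocol away from the version lookup.

```c
#include <assert.h>
#include <stdbool.h>

enum ex_protocol {
    EX_P_NONE = 0,              /* No protocol negotiated (not connected). */
    EX_P_OF10 = 1
};

/* Hypothetical reduction of an OpenFlow connection. */
struct ex_conn {
    bool connected;
    enum ex_protocol protocol;
    bool async_enabled;
};

/* Returns false for a connection that is down, whatever its settings. */
static bool
ex_receives_async_msg(const struct ex_conn *conn)
{
    if (!conn->connected) {
        return false;
    }
    return conn->async_enabled;
}

/* Aborts on EX_P_NONE, mirroring the failure in the trace above. */
static int
ex_protocol_to_version(enum ex_protocol protocol)
{
    assert(protocol != EX_P_NONE);
    return 0x01;
}

/* Returns the version used to encode the port status message, or 0 if
 * nothing was sent because the connection does not receive async messages. */
static int
ex_send_port_status(const struct ex_conn *conn)
{
    if (!ex_receives_async_msg(conn)) {
        return 0;
    }
    return ex_protocol_to_version(conn->protocol);
}
```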