Ben Pfaff [Thu, 15 May 2014 22:52:17 +0000 (15:52 -0700)]
ofproto-dpif-upcall: Avoid use-after-free in revalidate() corner cases.
The loop in revalidate() needs to ensure that any data obtained from
dpif_flow_dump_next() is used before it is destroyed, as indicated by
dpif_flow_dump_next_may_destroy_keys(). In the common case, where
processing reaches the end of the main "while" loop, it does this, but
in two corner cases the code in the loop execute "continue;", which skipped
the check. This commit fixes the problem.
Bug #1249988. Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Joe Stringer <joestringer@nicira.com>
Simon Horman [Thu, 15 May 2014 00:05:03 +0000 (09:05 +0900)]
datapath: sample action without side effects
The sample action is rather generic, allowing arbitrary actions to be
executed based on a probability. However its use, within the Open vSwitch
code-base is limited: only a single user-space action is ever nested.
A consequence of the current implementation of sample actions is that
depending on weather the sample action executed (due to its probability)
any side-effects of nested actions may or may not be present before
executing subsequent actions. This has the potential to complicate
verification of valid actions by the (kernel) datapath. And indeed adding
support for push and pop MPLS actions inside sample actions is one case
where such case.
In order to allow all supported actions to be continue to be nested inside
sample actions without the potential need for complex verification code
this patch changes the implementation of the sample action in the kernel
datapath so that sample actions are more like a function call and any side
effects of nested actions are not present when executing subsequent
actions.
With the above in mind the motivation for this change is twofold:
* To contain side-effects the sample action in the hope of making it
easier to deal with in the future and;
* To avoid some rather complex verification code introduced in the MPLS
datapath patch.
Some notes about the implementation:
* This patch silently changes the behaviour of sample actions whose nested
actions have side-effects. There are no known users of such sample
actions.
* sample() does not clone the skb for the only known use-case of the sample
action: a single nested userspace action. In such a case a clone is not
needed as the userspace action has no side effects.
Given that there are no known users of other nested actions and in order
to avoid the complexity of predicting if other sequences of actions have
side-effects in such cases the skb is cloned.
* As sample() provides a cloned skb in the unlikely case where there are
nested actions other than a single userspace action it is no longer
necessary to clone the skb in do_execute_actions() when executing a
recirculation action just because the keep_skb parameter is set: this
parameter was only set when processing the nested actions of a sample
action. Moreover it is possible to remove the keep_skb parameter of
do_execute_actions entirely.
* As sample() provides either a cloned skb or one that has had a
reference taken (using keep_skb) to do_execute_actions()
the original skb passed to sample() is never consumed. Thus the
caller of sample() (also do_execute_actions()) can use its generic
error handling to free the skb on error.
Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Jesse Gross <jesse@nicira.com>
tests: Change to parse dynamically allocated ports on windows.
In Windows, we use kernel assigned TCP port for ssl/tcp and
unixctl. In tests, we parse the log files of ovsdb-server.log,
test-sflow.log and test-netflow.log to get this port. In all
the above cases, tcp port is allocated first and then the unixctl port.
So a 'head -1' on the result should be safe.
Signed-off-by: Gurucharan Shetty <gshetty@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>
Jarno Rajahalme [Thu, 15 May 2014 02:53:51 +0000 (19:53 -0700)]
lib/classifier: Simpilify array ordering.
The terminology we used for subtable ordering ('splice', 'right
before') was inherited from an earlier use of a linked list, and
turned out to be confusing when applied to an array. Also, we only
ever move one subtable earlier or later within the array, so we can
simplify the code as well.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>
Some ovsdb-tool related unit tests fail with bad checksum errors
while reading transactions from database. It is most likely because
of the CR at the end of line. Using binary mode solves the problem.
Signed-off-by: Gurucharan Shetty <gshetty@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>
Ben Pfaff [Sat, 10 May 2014 01:16:38 +0000 (18:16 -0700)]
nx-match: Refactor nxm_put_ip() to handle all IPv4 and IPv6 fields.
Until now, some fields have been handled in the caller, and the caller has
been responsible for distinguishing ICMPv4 from ICMPv6. This
implementation seems to make the code a little easier to understand.
Ben Pfaff [Thu, 8 May 2014 06:18:46 +0000 (23:18 -0700)]
Implement OpenFlow 1.5 port desc stats request.
OpenFlow 1.4 and earlier always send the description of every port in
response to an OFPMP_PORT_DESC request. OpenFlow 1.5 proposes allowing
the controller to request a description of a single port. This commit
implements a prototype.
Ben Pfaff [Thu, 8 May 2014 06:49:00 +0000 (23:49 -0700)]
Implement OpenFlow 1.5 group desc stats request.
OpenFlow 1.4 and earlier always send the description of every group in
response to an OFPMP_GROUP_DESC request. OpenFlow 1.5 proposes allowing
the controller to request a description of a single group. This commit
implements a prototype.
Ben Pfaff [Fri, 9 May 2014 21:12:06 +0000 (14:12 -0700)]
ofp-util: Remove ofputil_get_phy_port_size().
The size is not fixed for OpenFLow 1.4 and later, so it's a little
deceptive to return any particular value. This function was only used in
one place, so move it inline there.
Ben Pfaff [Thu, 8 May 2014 06:35:35 +0000 (23:35 -0700)]
ofp-util: Reduce duplicate code.
ofputil_put_phy_port() and ofputil_append_port_desc_stats_reply() had a
lot of code duplication. This reduces it: it deletes some specialized
code from ofputil_put_phy_port(), moving it into its caller
ofputil_put_switch_features_port() that actually needed it. That change
then allows ofputil_append_port_desc_stats_reply() to become a simple
wrapper around ofputil_put_phy_port().
Ben Pfaff [Sat, 10 May 2014 02:29:56 +0000 (19:29 -0700)]
ofp-util: Generalize functions for parsing OF1.3+ properties.
The main effect is to move these functions a little earlier in the file.
Also, OpenFlow 1.4 changed the table-features specific error codes to new
values that apply to all property sets, so this commit updates the error
code names and adds the appropriate OpenFlow 1.4+ codes.
Ben Pfaff [Thu, 8 May 2014 04:39:00 +0000 (21:39 -0700)]
ofp-util: Remove ofputil_count_phy_ports().
It's harder to calculate the number of ports in a given amount of space in
OpenFlow 1.4 and later, because the ofp_port structure becomes variable
length in those versions. This commit removes the one caller, replacing
it by a version that doesn't need to know the number of ports in advance.
Ben Pfaff [Fri, 9 May 2014 04:20:22 +0000 (21:20 -0700)]
ovs-ofctl: Fix port lookup and "ovs-ofctl" behavior for OpenFlow 1.3+.
ovs-ofctl supports using port names in commands that operate on ports. It
does this by connecting to the switch, listing the ports, and picking out
the one with the specified name. However, this didn't work properly for
OpenFlow 1.3+, because it always used an OFPT_FEATURES_REQUEST to list the
ports, and in OpenFlow 1.3+ the reply to this request does not include a
list of ports. This commit fixes the problem (using code that previously
was just a fallback when there were too many ports to fit in an
OFPT_FEATURES_REPLY).
For similar reasons, "ovs-ofctl show" wasn't listing the switch's ports
when it connected to a switch over OpenFlow 1.3 or later. This commit
fixes that bug also.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Conflicts:
utilities/ovs-ofctl.c
Anoob Soman [Wed, 14 May 2014 13:32:16 +0000 (14:32 +0100)]
ofproto-dpif-xlate: Fix null pointer dereference
actions (in xlate_actions__) would be NULL when xlate_actions()
is called from packet_out()->ofproto_dpif_execute_actions().
This causes a NULL pointer to be dereferenced when
ctx.xbridge->netflow is set.
Signed-off-by: Anoob Soman <anoob.soman@citrix.com> Signed-off-by: Ben Pfaff <blp@nicira.com>
Simon Horman [Tue, 13 May 2014 05:46:18 +0000 (14:46 +0900)]
datapath: Free skb(s) on recirculation error
This patch attempts to ensure that skb(s) are always freed (once)
if if an error occurs in execute_recirc(). It does so in two ways:
1. Ensure that execute_recirc() always consimes skb passed to it.
Specifically, free the skb if the call to ovs_flow_extract() fails.
2. Return from the recirc case in execute_recirc() whenever
the skb has not been cloned immediately above.
This is only the case if the action is both the last action and the
keep_skb parameter of execute_recirc is not true. As it is the last
action and the skb is consumed one way or another by execute_recirc() it
is correct to return here. In particular this avoids the possibility of
the skb, which has been consumed by execute_recirc() from being freed.
Conversely if this is not the case then the skb has been cloned
and the clone has been consumed by execute_recirc().
This leads to three sub-cases:
* If execute_recirc() returned an error code then the original skb
will be freed by the error handling code below the case statement in
do_execute_actions().
* If this is not the last action then action processing will continue,
using the original skb.
* If this is the last action then it must also be the case that keep_skb
is true (otherwise the skb would not have been cloned). Thus
do_execute_actions() will return without freeing the original skb.
Signed-off-by: Simon Horman <horms@verge.net.au>
[jesse: use kfree_skb instead of consume_skb on error path] Signed-off-by: Jesse Gross <jesse@nicira.com>
Jarno Rajahalme [Mon, 12 May 2014 06:38:44 +0000 (23:38 -0700)]
lib/classifier: Fix array splicing.
Array splicing was broken when multiple elements were being moved,
resulting in the priority order being mixed. This came up when the
highest priority rule from a subtable was removed and the subtable
needed to be moved down the priority list by more than one position.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>
Thomas Graf [Thu, 8 May 2014 18:45:25 +0000 (20:45 +0200)]
ovs-ctl: Don't decrease max open fds if already set higher
A user may set LimitNOFILE through systemd or other means to set
the maximum number of open file descriptors. Only modify the ulimit
if not already set to a higher value by the user.
Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Andy Gospodarek <gospo@redhat.com>
Andy Zhou [Wed, 7 May 2014 05:31:00 +0000 (22:31 -0700)]
bond: raise minimal bond distribution per interface
Raise the minimal per interface packet distribution from 7 to 24.
With 256 packet distributing to 3 interfaces, the expected packets per
interface should be 256/3 = 85.3
Tested with 200 runs, the average number of packet sent to a single
interface is 85.9. close to the expected number, standard deviation
within the 200 run is 24.4. Tested with 2x standard deviation with
10K test runs, got around 0.1% failure rate. 2.5x standard deviation
passes 100K test runs without failure.
Using 2.5x for the unit test, 83.5 - 2.5 * 24.4, Round down to the
whole number of 24.
Signed-off-by: Andy Zhou <azhou@nicira.com> Reviewed-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
Joe Stringer [Sun, 4 May 2014 22:14:18 +0000 (10:14 +1200)]
tunnel: Fix bug where misconfiguration persists.
Previously, misconfiguring a tunnel port to use the exact same settings
would cause the corresponding netdev to never be destroyed. When
attempting to re-use the port as a different type, this would fail and
result in a discrepancy between reported port type and actual netdev in
use.
An example configuration that would previously give unexpected behaviour:
ovs-vsctl add-port br0 p0 -- set int p0 type=gre options:remote_ip=1.2.3.4
ovs-vsctl add-port br0 p1 -- set int p1 type=internal
ovs-vsctl set int p1 type=gre options:remote_ip=1.2.3.4
ovs-vsctl set int p1 type=internal
The final command would report in the ovs-vswitchd logs that it is
attempting to configure the port with the same gre settings as p0,
despite the command specifying the type as internal. Even after
deleting and re-adding the port, the message would reappear.
This patch fixes the bug by dereferencing the netdev in the failure
case of tnl_port_add__(), and ensures that the tnl_port structure is
freed in that case as well.
Ben Pfaff [Wed, 7 May 2014 20:14:45 +0000 (13:14 -0700)]
lacp: Really fix mutex initialization.
Commit 2a3fb0aa3c (lacp: Don't lock potentially uninitialized mutex in
lacp_status().) fixed one bug related to acquiring the file scope 'mutex'
without initializing it. However, there was at least one other, in
lacp_unixctl_show(). One could just fix that one problem, but that leaves
the possibility that I might have missed one or two more. This commit
fixes the problem for good, by adding a helper that initializes the mutex
and then acquires it.
It's not entirely clear why 'mutex' is a recursive mutex. I think that it
might be just because of the callback in lacp_run(). An alternate fix,
therefore, would be to eliminate the callback and therefore the need for
runtime initialization of the mutex.
Bug #1245659. Reported-by: Jeffrey Merrick <jmerrick@vmware.com> Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Andy Zhou <azhou@nicira.com>
Added a Tunnel table to the VTEP schema that allows
per-tunnel BFD configuration and status to be specified.
Removed the BFD configuration/status from the
Physical_Locator table.
Signed-off-by: Ashwin Swaminathan <ashwinds@arista.com> Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Bruce Davie <bdavie@vmware.com>
Alex Wang [Wed, 7 May 2014 06:46:39 +0000 (23:46 -0700)]
bfd: Update bfd status on bfd creation and deletion.
Commit 88bf179aa3 (bfd/cfm: Check status change before update
status to database.) used a boolean flag to trigger bfd status
update. However, the flag is not set on bfd creation and deletion.
To prevent any stale status in database, this commit makes bfd module
always set the flag on bfd creation and deletion.
Signed-off-by: Alex Wang <alexw@nicira.com> Acked-by: Ethan Jackson <ethan@nicira.com>
Alex Wang [Wed, 7 May 2014 06:21:36 +0000 (23:21 -0700)]
cfm: Update cfm status on cfm creation and deletion.
Commit 88bf179aa3 (bfd/cfm: Check status change before update
status to database.) used a boolean flag to trigger cfm status
update. However, the flag is not set on cfm creation and deletion.
And this causes stale status in database which may confuse users.
This commit fixes the issue by making cfm module trigger status
update on creation and deletion.
Ben Pfaff [Wed, 30 Apr 2014 17:45:16 +0000 (10:45 -0700)]
rconn: Preserve the name of an unreliable connection beyond disconnection.
An rconn has a human-readable name that typically designates both endpoints
of the connection. For a "reliable" rconn, that automatically reconnects,
the name remains constant regardless of whether the rconn is currently
connected. Until now, though, an "unreliable" rconn, that cannot
automatically reconnect, kept its name only until disconnection occurred.
This is OK for the uses currently in the OVS tree, which only use the name
of a rconn while it is connected, but an upcoming commit will add a final
log message following disconnection in some cases, and it makes the log
messages less useful if unreliable rconns just report "void" in that case.
This commit, therefore, modifies the rconn code so that unreliable rconns
preserve their names past disconnection, just like reliable ones.
Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Alex Wang <alexw@nicira.com>
Andy Zhou [Mon, 28 Apr 2014 22:46:30 +0000 (15:46 -0700)]
netdev-linux: favor netlink stats for physical ports
Currently physical ports stats are collected from kernel datapath.
However, those counter do not reflect actual wire packet counters
when GSO, TSO or GRO are enabled by the NIC. In the meantime, the
stats collected form routing stack does. While both stats are valid,
Reporting kernel netdev stats for packet counts and byte counts make
it easier to correlate those numbers with external measurements.
Signed-off-by: Andy Zhou <azhou@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>
Alex Wang [Thu, 1 May 2014 17:53:48 +0000 (10:53 -0700)]
netdev-vport: Checks tunnel status change when route-table is reset.
Commit 3e912ffcbb (netdev: Add 'change_seq' back to netdev.) added per-
netdev change number for indicating status change. Future commits used
this change number to optimize the netdev status update to database.
However, the work also introduced the bug in the following scenario:
- assume interface eth0 has address 1.2.3.4, eth1 has adddress 10.0.0.1.
- assume tunnel port p1 is set with remote_ip=10.0.0.5.
- after setup, 'ovs-vsctl list interface p1 status' should show the
'tunnel_egress_iface="eth1"'.
- now if the address of eth1 is change to 0 via 'ifconfig eth1 0'.
- expectedly, after change, 'ovs-vsctl list interface p1 status' should
show the 'tunnel_egress_iface="eth0"'
However, 'tunnel_egress_iface' will not be updated on current master.
This is in that, the 'netdev-vport' module corresponding to p1 does
not react to routing related changes.
To fix the bug, this commit adds a change sequence number in the route-
table module and makes netdev-vport check the sequence number for
tunnel status update.
Alex Wang [Thu, 3 Apr 2014 20:27:22 +0000 (13:27 -0700)]
bridge: Allow users to configure statistics update to OVSDB.
This commit adds a new configuration "stats-update-interval" in
"other_config" of Open_Vswitch table. So users can control the
statistics update frequency. A possible use case is that, users
can lower the update frequency to reduce the cpu consumption of
the ovs-vswitchd thread.
The configured value should always be greater than or equal to
5000 ms. And more frequent statistics update should be achieved
via OpenFlow.
Signed-off-by: Alex Wang <alexw@nicira.com> Acked-by: Joe Stringer <joestringer@nicira.com>
Alex Wang [Fri, 4 Apr 2014 01:31:13 +0000 (18:31 -0700)]
ofproto-dpif-monitor: Fix deadlock.
Commit 6b59b543 (ovs-thread: Use fair (but nonrecursive)
rwlocks on glibc.) changed the rwlocks to nonrecursive,
writer-biased lock. It also made the following deadlock
possible.
Assume BFD is used on both end of a link. Consider the
following events:
1. Handler at one end received the BFD control packet with
POLL flag set while holding the read lock of 'xlate_rwlock'.
Since a BFD control packet with FINAL flag set should be
sent back immediately, it calls the
ofproto_dpif_monitor_port_send_soon(), in which, it tries
to grab the 'monitor_mutex'.
2. The main thread needs to configure the ofproto-dpif-xlate
module. It tries to grab the write lock of 'xlate_rwlock'
and is blocked by event 1.
3. The monitor thread, after acquired the 'monitor_mutex',
wants to acquire the read lock of 'xlate_rwlock'.
Since the rwlock is now writer-biased, the attempt of acquiring
read lock in event 3 will be blocked by event 2. This will
subsequently cause the block of event 1, since monitor thread
is holding the 'monitor_mutex'. So the deadlock happens.
This commit resolves the above issue by removing the requirement of
acquiring 'monitor_mutex' in ofproto_dpif_monitor_port_send_soon().
Signed-off-by: Alex Wang <alexw@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>
Alexandru Copot [Fri, 2 May 2014 06:54:27 +0000 (09:54 +0300)]
Add basic implementation for OpenFlow 1.4 bundles
This is only the communication part of the bundles functionality.
The actual message pre-validation and commits are not implemented.
We also enable OF1.4 for all the tests.
Signed-off-by: Alexandru Copot <alex.mihai.c@gmail.com> Cc: Daniel Baluta <dbaluta@ixiacom.com>
[blp@nicira.com made ofputil_decode_bundle_add() more obviously correct] Signed-off-by: Ben Pfaff <blp@nicira.com>
Simon Horman [Fri, 2 May 2014 08:41:32 +0000 (17:41 +0900)]
flow: Add mf_is_l3_or_higher()
This is in preparation for using the same helper as part of support
for using recirculation in conjunction series of actions including
with MPLS actions that are currently not able to be translated.
In that scenario this helper will be used to test if load, move and
set_field actions require recirculation to occur.
Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Ben Pfaff <blp@nicira.com>