git.proxmox.com Git - ovs.git/log

signals: Don't assume that sys_siglist is an array.

Found by commit 878f1972909b3 (util: use gcc builtins to better check array
sizes).

Signed-off-by: YAMAMOTO Takashi <yamt@mwd.biglobe.ne.jp>
Signed-off-by: Ben Pfaff <blp@nicira.com>

util: use gcc builtins to better check array sizes

GCC provides two useful builtin functions that can help
to improve array size checking during compilation.

This patch contains no functional changes, but it makes
it easier to detect mistakes.

Signed-off-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>

ofproto-dpif-upcall: Remove dead code from recv_upcalls().

Signed-off-by: Ben Pfaff <blp@nicira.com>

ofproto-dpif-upcall: reduce number of wakeup

If a queue length is long (ie. non-0), the consumer thread should
already be busy working on the queue. there's no need to wake it
up repeatedly.

Signed-off-by: YAMAMOTO Takashi <yamt@mwd.biglobe.ne.jp>
Signed-off-by: Ben Pfaff <blp@nicira.com>

ovs-lib: Return the exit status of ovs-ctl in ovs_ctl().

commit 46528f78e5c(debian, rhel, xenserver: Ability to collect ovs-ctl logs)
made changes in the startup scripts such that the o/p of ovs-ctl is logged
into ovs-ctl.log. But it had an unintended consequence that the exit status
of ovs-ctl was no longer returned. We would always return success(the exit
status of tee).

With this commit, we return the exit status of ovs-ctl instead of tee.
Code referenced from: (line wrapped).
http://unix.stackexchange.com/questions/14270/\
get-exit-status-of-process-thats-piped-to-another/70675#70675)

Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Signed-off-by: Duffie Cooley <dcooley@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>

ovs-dpctl, ofproto/trace: Show and handle the in_port name in flows.

With this commit, whenever the verbosity is enabled with '-m'
option, the ovs-dpctl dump-flows command will display the flows with
in_port field showing the name instead of a port number.

Conversely, one can also use a name in the in_port field with del-flow,
add-flow and mod-flow commands of ovs-dpctl. One should also be able
to use the port name when supplying the datapath flow as an input
to ofproto/trace command.

Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>

stream: Log a warning when the default OpenFlow or OVSDB port is used.

Both OpenFlow and OVSDB have new IANA-assigned port numbers. We still
default to the original values (6633 and 6632, respectively), but this
commit logs a warning. In the future, we will switch to the official
values (6653 and 6640, respectively).

Signed-off-by: Justin Pettit <jpettit@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>

ofproto: Define official OpenFlow port number.

Signed-off-by: Justin Pettit <jpettit@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>

ovsdb: Define official port number.

Signed-off-by: Justin Pettit <jpettit@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>

Don't differentiate between TCP and SSL ports for OpenFlow and OVSDB.

The OVS code has always made a distinction between the unencrypted (TCP)
and SSL port numbers for the OpenFlow and OVSDB protocols. The default
port numbers for both protocols has changed, and there continues to be
no distinction between the unencrypted and SSL versions. This
commit removes the distinction in port numbers. A future patch will
recognize the change in default port number.

Signed-off-by: Justin Pettit <jpettit@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>

datapath: Fix vxlan gso with vlan.

To use ovs-gso-compatibility we need to record inner skb offset.
In case of vxlan it is done before vlan header is pushed which
gives wrong inner packet to ovs-gso.
Following patch reset skb offsets after inner skb is completely built.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>

ofproto-dpif: Compute the subfacet add/del rate using coverage counters.

So far, the subfacet rates (e.g. add rate, del rate) are computed by
exponential moving averaging function in ofproto-dpif.c. This commit
replaces that logic with coverage counters. And the rates can be
checked by running "ovs-appctl coverage/show" command.

Signed-off-by: Alex Wang <alexw@nicira.com>
Signed-off-by: Ethan Jackson <ethan@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>

coverage: Reimplement the "ovs-appctl coverage/show" command.

This commit changes the "ovs-appctl coverage/show" command to show the
the averaged per-second rates for the last few seconds, the last minute
and the last hour, and the total counts of all of the coverage counters.

Signed-off-by: Alex Wang <alexw@nicira.com>
Signed-off-by: Ethan Jackson <ethan@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>

rhel: fix the exit status of the openvswitch init script.

This is a fix for a request to make sure that the openvswitch status command
in rhel based distros gives a useful exit status. That was fixed in

commit 5e0c05bc058c78a11be6747f62e6ad88e5d06b70
debian: Fix exit status of openvswitch-switch init script "status" command

Signed-off-by: Duffie Cooley <dcooley@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>

ofproto-dpif: Correct endian problem in recv_upcalls()

Use nl_attr_get_u32() instead of nl_attr_get_be32() to parse nla
so that the decoded value which is passed to mhash_add()
is host byte order as mhash_add() expects.

This resolves a minor regression introduced by
10e576406c7444ef ("ofproto-dpif: Move special upcall handling into
ofproto-dpif-upcall.").

I do not expect this change has any runtime implications.

Detected using sparse.

Cc: Ethan Jackson <ethan@nicira.com>
Cc: Ben Pfaff <blp@nicira.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Ben Pfaff <blp@nicira.com>

ofproto-dpif: Move special upcall handling into ofproto-dpif-upcall.

Both the IPFIX and SFLOW modules are thread safe, so there's no
particular reason to pass them up to the main thread. Eliminating
this step significantly simplifies the code.

Signed-off-by: Ethan Jackson <ethan@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>

Suppress warnings about unused variables and functions.

These variables and functions are unused but we don't want to remove
their definitions.

Found by Clang.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>

Remove unused variables and functions.

Found by Clang.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>

ovs-bugtool: Change log-days parameter based on file last_mod_time.

Previously the log-days parameter can only support rotated logs
based on their numbered filename extension. Thus it can not be
used for other types of log filenames. This patch changes it to
be based on file last modification time.

Issue: #19671

Signed-off-by: Shih-Hao Li <shihli@vmware.com>
Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>

odp: Only pass vlan_tci to commit_vlan_action()

This allows for future patches to pass different tci values to
commit_vlan_action() without passing an entire flow structure.

Signed-off-by: Joe Stringer <joe@wand.net.nz>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Ben Pfaff <blp@nicira.com>

Remove mpls_depth field from flow

Rather than tracking the MPLS depth as a field in the
flow, which is an entirely poor place for it, just track
the delta to the MPLS depth during translation.

This logic was developed while implementing recirculation
and intended to be used to detect when recirculation should
occur. This variant of the patch uses the logic to determine
if processing of actions should stop due to an MPLS
action which cannot be translated (without recirculation).

A side-effect of this patch is that it resolves a bug
whereby ovs-vswitchd will abort due to to an assertion
on eth_type_mpls(ctx->xin->flow.dl_type) in compose_mpls_pop_action(()
if the actions of a flow include pop_mpls twice without
a push_mpls in between.

Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Ben Pfaff <blp@nicira.com>

nicira-ext: Fix incorrect description of NXM_NX_IPV6_LABEL as non-maskable.

Commit 32455024044444 (OXM: Allow masking of IPv6 Flow Label) made the
IPv6 flow label field fully maskable but did not update the comment to say
so.

EXT-101.
CC: Jean Tourrilhes <jt@hpl.hp.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Justin Pettit <jpettit@nicira.com>

ofproto: Recycle least recently used ofport.

If there is a lot of churn in creation and deletion of
interfaces, we may end up recycling the ofport value of a
recently deleted interface for a newly created interface.
This may result in an old stale openflow rule applying
on the newly created interface.

With this commit, when a new port is added, try and provide
it an ofport value that has not been used during the current
run. If such value does not exist, provide the least
recently used ofport value.

Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>

FAQ: Explain why allowing only IP traffic breaks IP connectivity.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: pritesh <pritesh.kothari@cisco.com>

ovs-bugtool: Limit disk usage

When output to a file with "--unlimited" unset,
only allow 90% of the free disk space to be used.

Signed-off-by: Ben Pfaff <blp@nicira.com>

classifier: Avoid accumulating junk in cls_partition 'tags'.

It's easy to add two tags together, but it's hard to subtract them. The
new "tag_tracker" data structure provides a solution.

Signed-off-by: Ben Pfaff <blp@nicira.com>

classifier: Speed up lookup when metadata partitions the flow table.

We have a controller that puts many rules with different metadata values
into the flow table, where metadata is used (by "resubmit"s) to distinguish
stages in a pipeline. Thus, any given flow only needs to be hashed into
classifier "cls_table"s that contain a match for the flow's metadata value.
This commit optimizes the classifier lookup by (probabilistically) skipping
the "cls_table"s that can't possibly match.

(The "metadata" referred to here is the OpenFlow 1.1+ "metadata" field,
which is a 64-bit field similar in purpose to the "registers" defined by
Open vSwitch.)

Previous versions of this patch, with earlier versions of the controller in
question, improved flow setup performance by about 19%.

Bug #14282.
Signed-off-by: Ben Pfaff <blp@nicira.com>

tag: Reintroduce tag library.

It is needed for the classifier partitioning optimization. This commit
only reintroduces the parts that are actually needed.

Signed-off-by: Ben Pfaff <blp@nicira.com>

match: New function minimatch_matches_flow().

Signed-off-by: Ben Pfaff <blp@nicira.com>

ovs-ofctl: Fix a typo in documentation.

Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>

datapath: Move segmentation compatibility code into a compatibility function

Move segmentation compatibility code out of netdev_send and into
rpl_dev_queue_xmit(), a compatibility function used in place
of dev_queue_xmit() as necessary.

As suggested by Jesse Gross.

Some minor though verbose implementation notes:

* This rpl_dev_queue_xmit() endeavours to return a valid error code or
 zero on success as per dev_queue_xmit(). The exception is that when
 dev_queue_xmit() is called in a loop only the status of the last call is
 taken into account, thus ignoring any errors returned by previous calls.
 This is derived from the previous calls to dev_queue_xmit() in a loop
 where netdev_send() ignores the return value of dev_queue_xmit()
 entirely.

* netdev_send() continues to ignore the value of dev_queue_xmit().
 So the discussion of the return value of rpl_dev_queue_xmit()
 above is has no bearing on run-time behaviour.

* The return value of netdev_send() may differ from the previous
 implementation in the case where segmentation is performed before
 calling the real dev_queue_xmit(). This is because previously in
 this case netdev_send() would return the combined length of the
 skbs resulting from segmentation. Whereas the current code
 always returns the length of the original skb.

Signed-off-by: Simon Horman <horms@verge.net.au>
[jesse: adjust error path in netdev_send() to match upstream]
Signed-off-by: Jesse Gross <jesse@nicira.com>

datapath: simplify VLAN segmentation

Push vlan tag onto packet before segmentation to simplify the code.
As suggested by Pravin Shelar and Jesse Gross.

Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Jesse Gross <jesse@nicira.com>

tunnel: Make tnl_find() easier to understand.

Suggested-by: pritesh <pritesh.kothari@cisco.com>
Acked-by: pritesh <pritesh.kothari@cisco.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>

OPENFLOW-1.1+: update for OF1.3 ONF Extentions and OF1.4

Signed-off-by: YAMAMOTO Takashi <yamt@mwd.biglobe.ne.jp>
Signed-off-by: Ben Pfaff <blp@nicira.com>

for ovs-appctl bridge/dump-flows, don't show "priority" twice

before the change:

duration=2110s, priority=0, n_packets=3151646, n_bytes=3104180388, \
priority=0,actions=CONTROLLER:65535
table_id=254, duration=2136s, priority=0, n_packets=0, n_bytes=0, \
priority=0,reg0=0x3,actions=drop
table_id=254, duration=2136s, priority=0, n_packets=0, n_bytes=0, \
priority=0,reg0=0x1,actions=controller(reason=no_match)
table_id=254, duration=2136s, priority=0, n_packets=0, n_bytes=0, \
priority=0,reg0=0x2,actions=drop

after the change:

duration=2924s, n_packets=5316116, n_bytes=5260045454, \
priority=0,actions=CONTROLLER:65535
table_id=254, duration=2924s, n_packets=0, n_bytes=0, \
priority=0,reg0=0x3,actions=drop
table_id=254, duration=2924s, n_packets=0, n_bytes=0, \
priority=0,reg0=0x1,actions=controller(reason=no_match)
table_id=254, duration=2924s, n_packets=0, n_bytes=0, \
priority=0,reg0=0x2,actions=drop

(i wrapped long lines by hand)

Signed-off-by: YAMAMOTO Takashi <yamt@mwd.biglobe.ne.jp>
Signed-off-by: Ben Pfaff <blp@nicira.com>

packets: Remove unused function eth_mpls_depth

eth_mpls_depth() has been unused as of 1ac7c9bdb2b6fdcb ("ofproto-dpif: Use
execute_actions to execute controller actions").

Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Ben Pfaff <blp@nicira.com>

datapath: Fix typos in comment.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: pritesh <pritesh.kothari@cisco.com>
Acked-by: Jesse Gross <jesse@nicira.com>

ofproto-dpif-upcall: Fix a memory leak.

The "key" member in struct flow_miss refers to memory held by the "struct
upcall", hence the upcalls should be freed only after the flow misses are
processed by the main thread.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>

ovs-dpctl: Update usage where datapath name is optional.

Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>

odp-util: Parse the in_port as a name correctly.

Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>

ovs-dpctl: Parse the arguments correctly for del-flow.

Inside dpctl_del_flow() argv[0] is 'del-flow' and argv[1] can
be the flow in the absence of the optional datapath argument.

Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>

ovs-dpctl: Add a missing simap_destroy()

Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>

ovs-dpctl: Remove stale comment.

The '-m' option is documented in the manpage.

Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>

ofproto-dpif-upcall: Fix typos in comments.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: pritesh <pritesh.kothari@cisco.com>

hmap: Make bad hash functions easier to find.

The hmap code has for a long time incremented a counter when a hash bucket
grew to have many entries. This can let a developer know that some hash
function is performing poorly, but doesn't give any hint as to which one.
This commit improves the situation by adding rate-limited debug logging
that points out a particular line of code as the source of the poor hash
behavior. It should make issues easier to track down.

Bug #19926.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Lauded-by: Keith Amidon <keith@nicira.com>
Acked-by: Justin Pettit <jpettit@nicira.com>

ofproto: Fix memory leak in rule_actions_unref().

Found by valgrind.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Justin Pettit <jpettit@nicira.com>

ofproto: Allow ofproto_delete_flow() to delete hidden rules.

Commit 97f63e57a8 (ofproto: Remove soon-to-be-invalid optimizations.)
changed ofproto_delete_flow() to use the general-purpose flow_mod
implementation. However, the general-purpose flow_mod never matches hidden
flows, which are exactly the flows that ofproto_delete_flow()'s callers
want to delete.

This commit fixes the problem by allowing flow_mods that specify a priority
that can only be for a hidden flow to delete a hidden flow. (This doesn't
allow OpenFlow clients to meddle with hidden flows because OpenFlow uses
only a 16-bit field to specify priority.)

I verified that, without this commit, if I change from one controller to
another with "ovs-vsctl set-controller", then the in-band rules for the
old controller remain. With this commit, the old rules are removed.

Reported-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
Bug #19821.
Reported-by: Luca Giraudo <lgiraudo@vmware.com>
Bug #19888.
Reported-by: Ying Chen <yingchen@vmware.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Justin Pettit <jpettit@nicira.com>

datapath: remove duplicated include from vport-gre.c

Remove duplicated include.

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Jesse Gross <jesse@nicira.com>

datapath: remove duplicated include from vport-vxlan.c

Remove duplicated include.

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Jesse Gross <jesse@nicira.com>

cfm: Don't enforce CFM_FAULT_INTERVAL.

While upgrading a deployment, it's possible that transient
configuration changes could cause the cfm interval on two ends of a
link to be different. If these two configured values are close to
each other, this condition could have no impact on traffic. Therefore
it's better to let this slide than force a tunnel down guaranteeing an
impact

Signed-off-by: Ethan Jackson <ethan@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>

cfm: Prevent interval fault when demand mode is enabled on one end.

This commit prevents cfm from raising 'interval' fault when demand
mode is only enabled on one end of link.

Signed-off-by: Alex Wang <alexw@nicira.com>
Signed-off-by: Ethan Jackson <ethan@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>

ofproto-dpif-upcall: Forward packets in order of arrival.

Until now, the code in ofproto-dpif-upcall (and the code that preceded it
in ofproto-dpif) obtained a batch of incoming packets, inserted them into
a hash table based on hashes of their flows, processed them, and then
forwarded them in hash order. Usually this maintains order within a single
network connection, but because OVS's notion of a flow is so fine-grained,
it can reorder packets within (e.g.) a TCP connection if two packets
handled in a single batch have (e.g.) different ECN values.

This commit fixes the problem by making ofproto-dpif-upcall always forward
packets in the same order they were received.

This is far from the minimal change necessary to avoid reordering packets.
I think that the code is easier to understand afterward.

Reported-by: Dmitry Fleytman <dfleytma@redhat.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>

datapath: add linux/utils.c to .gitignore

Signed-off-by: Lorand Jakab <lojakab@cisco.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>

utilities: add ovs-dpctl-top to .gitignore

Signed-off-by: Lorand Jakab <lojakab@cisco.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>

datapath: Remove net_device_ops compatibility code.

The symbol HAVE_NET_DEVICE_OPS was changed in 3.1 but code the that
it was protecting dates back to before 2.6.32. Therefore, we can
just assume that net_device_ops exists in all cases and drop the
compat code.

Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>

openvswitch/types.h: New macros OVS_BE16_MAX, OVS_BE32_MAX, OVS_BE64_MAX.

These seem slightly nicer than e.g. htons(UINT16_MAX).

Signed-off-by: Ben Pfaff <blp@nicira.com>

NEWS: Move backported item from post-v2.0.0 to v2.0.0.

I backported millisecond timestamps to branch-2.0 upon request from Paul
Ingram, so we should mention it as part of that version.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Justin Pettit <jpettit@nicira.com>

utilities: a top like tool for ovs-dpctl dump-flows.

This python script summarizes ovs-dpctl dump-flows content by aggregating
the number of packets, total bytes and occurrence of the following fields:
 - Datapath in_port
 - Ethernet type
 - Source and destination MAC addresses
 - IP protocol
 - Source and destination IPv4 addresses
 - Source and destination IPv6 addresses
 - UDP and TCP destination port
 - Tunnel source and destination addresses

Testing included confirming both mega-flows and non-megaflows are
properly parsed. Bit masks are applied in the case of mega-flows
prior to aggregation. Test --script parameter which runs in
non-interactive mode. Tested syntax against python 2.4.3, 2.6 and 2.7.
Confirmed script passes pep8 and pylint run as:

pylint --disable=I0011 --include-id=y --reports=n

This tool has been added to these distribution:
 - add ovs-dpctl-top to debian distribution
 - add ovs-dpctl-top to rpm distribution.
 - add ovs-dpctl-top to XenServer RPM.

Signed-off-by: Mark Hamilton <mhamilton@nicira.com>
Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>

odp-util: Remove trailing whitespace.

Signed-off-by: Ethan Jackson <ethan@nicira.com>

netdev-vport: Fix indentation.

Signed-off-by: Ethan Jackson <ethan@nicira.com>

ofproto-dpif: Avoid unnecessarily counting packets.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>

ofproto-dpif: Remove write-only member 'key_fitness' from struct subfacet.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>

ofproto-dpif-upcall: Remove redundant 'packets' list from struct flow_miss.

Until now, struct flow_miss contained a list of packets and a list of
upcalls. Each packet in the list of packets can be obtained from the
corresponding upcall in the list of upcalls via upcall->dpif_upcall.packet,
so this commit deletes the list of packets and replaces each reference to
a packet by that expression.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>

ofproto-dpif: Use shash_find_and_delete() to simplify close_dpif_backer().

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>

ofproto-dpif: Fix use-after-free error deleting last bridge.

valgrind reported:

 Invalid read of size 4
 at 0x806ADC1: odp_port_to_ofport (hmap.h:267)
 by 0x8077C05: xlate_receive (ofproto-dpif-xlate.c:523)
 by 0x8073994: handle_miss_upcalls (ofproto-dpif-upcall.c:642)
 by 0x80741AA: udpif_miss_handler (ofproto-dpif-upcall.c:412)
 by 0x56FCC38: start_thread (pthread_create.c:304)
 by 0x735378D: clone (clone.S:130)
 Address 0x786c084 is 4 bytes inside a block of size 16 free'd
 at 0x4D8350C: free (vg_replace_malloc.c:427)
 by 0x8065EDA: close_dpif_backer (ofproto-dpif.c:1094)

The problem is that close_dpif_backer() destroys odp_to_ofport_map and the
associated mutex before it calls udpif_destroy() to stop the forwarding
threads. This gives the forwarding threads a window in which to try to
use odp_to_ofport_map.

This commit moves the udpif_destroy() call much earlier, solving the
problem. (The call to udpif_destroy() must follow the call to
drop_key_clear() because drop_key_clear() uses the udpif.)

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>

vlog: Fix formatting of milliseconds in Python log messages.

Commit 2b31d8e713de7 (vlog: Report timestamps in millisecond resolution in
log messages.) introduced milliseconds to log messages by default, but the
Python version did not ensure that milliseconds were always formatted with
3 digits, so 3.001 was formatted as "3.1" and 3.012 as "3.12", and so on.
This commit fixes the problem.

CC: Paul Ingram <paul@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>

ovs-ofctl: Add meter support.

Adds commands add-meter, mod-meter, del-meter, del-meters, dump-meter,
dump-meters, meter-stats, and meter-features.

Syntax is as follows:

add-meter meter=<n> (kbps|pktps) [burst] [stats] bands= type=(drop|dscp_remark) rate=<n> [burst_size=<n>] [prec_level=<n>] <more bands>

mod-meter meter=<n> (kbps|pktps) [burst] [stats] bands= type=(drop|dscp_remark) rate=<n> [burst_size=<n>] [prec_level=<n>] <more bands>

del-meter meter=(<n>|all)

del-meters 

dump-meter meter=(<n>|all)

dump-meters 

meter-stats [meter=(<n>|all)]

meter-features 

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>

rule_ofpacts: keep datapath meter_id.

This allows datapaths to operate without referring to the ofproto-level
meter configuration for mater ID mapping, which reduces needs for locking.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>

ofproto meters: Keep provider meter_id over mods.

Changed the ofproto meter API to require the provider keep the provider
meter ID unchanged when modifying a meter. This makes the mapping from
an OpenFlow meter ID to a datapath meter ID consistent over the lifetime
of the meter.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>

ofproto: Actually return the error code from meter add.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>

INSTALL.RHEL: Explain how to work around old Autoconf; add more prereqs.

The new text is adapted from INSTALL.XenServer.

Mark Hamilton supplied the list of additional build prerequisites.

Reported-by: Mark Hamilton <mhamilton@nicira.com>
CC: Gurucharan Shetty <shettyg@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>

tests: Make ovsdb-server add-db/remove-db test faster and more reliable.

Until now, the "ovsdb-server/add-db and remove-db with --monitor" test
killed ovsdb-server with SIGSEGV twice. Each time, the "--monitor" option
caused the supervisor process to restart the child, but the second time it
incurred a 10-second delay intended to prevent the daemon from wasting CPU
time by restarting itself and dying again very quickly in a loop. This
made the test take over 10 seconds to execute. It also made it
occasionally fail because the OVS_WAIT_UNTIL check waits at most
approximately 10 seconds before it decides that the condition that it is
testing for will never occur.

This commit fixes the problem by breaking the test into two tests, each of
which kills ovsdb-server with SIGSEGV only once.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>

ovsdb: write commit timestamps to millisecond resolution.

This is expected to make system debugging easier.

This raises two compatibility issues:
1. When a new ovsdb-tool reads an old database, it will multiply by 1000 any
 timestamp it reads which is less than 1<<31. Since this date corresponds to
 Jan 16 1970 this is unlikely to cause a problem.
2. When an old ovsdb-tool reads a new database, it will interpret the
 millisecond timestamps as seconds and report dates in the far future; the
 time of this commit is reported as the year 45672 (each second since the
 epoch is interpreted as 16 minutes).

Signed-off-by: Paul Ingram <pingram@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>

ovsdb: Use DB load time, not on-disk commit times, for compaction.

The ovsdb-server compaction timing logic is written assuming monotonic
time at milliscond resolution but it calculated the next compaction time
based on the oldest commit in the database. This was a problem because
commit timestamps are written in wall-clock time to second resolution.

This commit calculates the next compaction time based on the time when
the database was first loaded or the last compaction was done, both in
monotonic time at millisecond resolution.

Signed-off-by: Paul Ingram <pingram@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>

FAQ: Fix version number for 2.0.

Presumably we will have minor version releases in the 2.x series
as well.

Signed-off-by: Jesse Gross <jesse@nicira.com>

timeval: Replace rwlock by mutex.

It's only held briefly now and in general a mutex tends to be preferred for
that case.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>

timeval: Restore ability to warp time forward when time is not stopped.

Commit 31ef9f5178dee18 (timeval: Remove CACHE_TIME scheme.) inadvertently
removed the ability to warp time forward, for use in tests, except when
time is stopped. This fixes the problem.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>

timeval: Add Clang thread-safety annotations, fix unimportant violation.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>

vlog: Report timestamps in millisecond resolution in log messages.

To make debugging easier.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Signed-off-by: Paul Ingram <pingram@nicira.com>

datapath: remove HAVE_MAC_RAW

This was causing it to fail against latest RT kernels, with following errors:

In file included from /home/arm/work/kernel/linaro/lng/openvswitch/datapath/linux/compat/include/linux/if_vlan.h:6:0,
 from /home/arm/work/kernel/linaro/lng/openvswitch/datapath/linux/actions.c:29:
/home/arm/work/kernel/linaro/lng/lng.git/include/linux/if_vlan.h: In function vlan_insert_tag:
/home/arm/work/kernel/linaro/lng/lng.git/include/linux/if_vlan.h:197:5: error: struct sk_buff has no member named mac
In file included from /home/arm/work/kernel/linaro/lng/openvswitch/datapath/linux/../flow.h:34:0,
 from /home/arm/work/kernel/linaro/lng/openvswitch/datapath/linux/../datapath.h:31,
 from /home/arm/work/kernel/linaro/lng/openvswitch/datapath/linux/actions.c:36:
/home/arm/work/kernel/linaro/lng/lng.git/include/net/inet_ecn.h: In function INET_ECN_set_ce:
/home/arm/work/kernel/linaro/lng/lng.git/include/net/inet_ecn.h:137:10: error: struct sk_buff has no member named nh
/home/arm/work/kernel/linaro/lng/lng.git/include/net/inet_ecn.h:142:10: error: struct sk_buff has no member named nh
/home/arm/work/kernel/linaro/lng/openvswitch/datapath/linux/actions.c: In function __pop_vlan_tci:
/home/arm/work/kernel/linaro/lng/openvswitch/datapath/linux/actions.c:72:5: error: struct sk_buff has no member named mac
make[7]: *** [/home/arm/work/kernel/linaro/lng/openvswitch/datapath/linux/actions.o] Error 1
make[6]: *** [_module_/home/arm/work/kernel/linaro/lng/openvswitch/datapath/linux] Error 2

Not sure why it was added earlier but my guess is, for earlier RT kernels struct
sk_buff had following variables mac.raw, nh.raw, h.raw instead of mac_header,
network_header, transport_header. And so the hack to rename them in OVS code.
But that's not the case now. RT kernel have mac_header, network_header and
transport_header as parameter and so we don't need this macro at all.

Lets get rid of it.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Jesse Gross <jesse@nicira.com>

AUTHORS: Add Torbjorn Tornkvist.

Signed-off-by: Ben Pfaff <blp@nicira.com>

ofp-util: Announce OpenFlow 1.3 table features only in OpenFlow 1.3.

The translation into OpenFlow 1.2 didn't trim off the OpenFlow 1.3 specific
bits. This fixes the problem.

It would probably be wise to introduce an ofputil_table_stats structure.
Using ofp12_table_stats is somewhat confusing.

Reported-by: Torbjorn Tornkvist <kruskakli@gmail.com>
Tested-by: Torbjorn Tornkvist <kruskakli@gmail.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>

ofproto-dpif: Move "learn" actions into individual threads.

Before OVS introduced threading, any given ``learn'' action took effect in
the flow table before the next incoming flow was set up. This meant that
if a packet came in, used ``learn'' to set up a flow to handle replies, and
then a reply came in, the reply would always hit the flow set up by the
``learn'' action. Until now, with the threading implementation, though,
the effects of ``learn'' actions happen asynchronously via a queue, which
means that the reply can arrive before the flow to handle it is set up.
This introduced an unacceptable regression in important use cases.

This commit fixes the problem by switching back to executing learn actions
before forwarding the packet that triggered it.

I imagine that there is considerable opportunity for optimization here.

Bug #19147.
Bug #19244.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>

ofproto: Remove redundant cls parameter from a few functions.

Previously this parameter was useful for Clang locking annotations
but it isn't actually a locking requirement anymore, so remove
the parameter.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>

ofproto: Add global locking around flow table changes.

This makes 'ofproto_mutex' protect the flow table well enough that threads
other than the main one can realistically modify flows.

I need to look at the interface between ofproto and connmgr: I think that
there might need to be some locking there too.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>

ofproto: Refactor eviction cases to use common code.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>

ofproto: New helper any_pending_ops().

This function is trivial now but in an upcoming commit it will need to
become slightly more complicated, which makes writing it as a function
worthwhile.

Until then, this commit simplifies the logic, which was redundant since
the 'deletions' hmap always points into the 'pending' list anyway.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>

ofproto: Mark immutable members of struct rule 'const'.

One difficult part of make flow_mods thread-safe is sorting out which
members of each structure can be modified under which locks and,
especially, documenting this in a way that makes it hard for programmers to
get it wrong later. The compiler provides some tools for us, however, and
'const' is one of the nicer ones since it is standardized rather than part
of a compiler extension.

This commit makes use of 'const' to mark the immutable members of struct
rule, which is definitely the most confusing structure regarding thread
safety simply because it has so many members that use different forms of
synchronization. It also adds a bunch of CONST_CASTs to allow these
members to be initialized and destroyed where necessary.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>

ofproto: Make some functions for rules private to ofproto.c.

These aren't used outside this file.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>

ofproto: Remove ->timeout_mutex from struct rule (just use ->mutex).

I think that ->timeout_mutex and ->mutex were separate because the latter
(which was actually a rwlock named ->rwlock until recently) was held for
long periods of time, which meant that having a separate ->timeout_mutex
reduced lock contention. But ->mutex is now only held briefly, so it seems
reasonable to just use it.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>

ofproto: Replace rwlock in struct rule by a mutex.

A rwlock is suitable when one expects long hold times so there is a need
for some parallelism for readers. But when the lock is held briefly, it
makes more sense to use a mutex for two reasons. First, a rwlock is more
complex than a mutex so one would expect it to be more expensive to
acquire. Second, a rwlock is less fair than a mutex: as long as there are
any readers this blocks out writers.

At least, that's the behavior I intuitively expect, and a few looks around
the web suggest that I'm not the only one.

Previously, struct rule's rwlock was held for long periods, so using a
rwlock made sense. Now it is held only briefly, so this commit replaces it
by a mutex.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>

ofproto: Protect index by cookie with ofproto_mutex.

The cookie index is only used for flow_mods, so it makes sense to use this
global mutex.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>

ofproto: Drop 'expirable_mutex' in favor of new global 'ofproto_mutex'.

This is the first step toward making a global lock that protects everything
needed for updating the flow table. This commit starts out by merging one
lock into the new one, and later commits will continue that process.

The mutex is initially a recursive one, because I wasn't sure that there
were no nested acquisitions at this stage, but a later commit will change
it to a regular error-checking mutex once it's settled down a bit.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>

ofproto: Move rule_execute() out of ofopgroup_complete().

One goal we're moving toward is to be able to execute "learn" actions
in each thread that handles packets that arrive on an interface just before
we forward those packets. The planned strategy there is to have a global
lock that protects everything required to modify the flow table. Generally
this works out well, but rule_execute() is a trouble spot. That's because
it's called during a flow table modification (when that global lock is
held) which means that a "learn" action triggered by the executing the
packet would try to recursively modify the flow table and reacquire the
global lock.

I can see two ways out of this issue. One would be to make the global lock
a recursive one, or otherwise deal with handling recursive flow_mods. The
other is to just queue up flow_mods due to rule_execute(), which itself is
a corner case (it only happens when a flow_mod sent via OpenFlow includes
a buffer_id). (I guess there could be other more radical solutions, like
just dropping packets that contain "learn" actions if they occur in this
situation.)

This commit implements the second solution because it seems less likely to
be wrong in a way that crashes the switch.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>

classifier: Allow CLS_CURSOR_FOR_EACH to use a const-qualified iterator.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>

guarded-list: New data structure for thread-safe queue.

We already had queues that were suitable for replacement by this data
structure, and I intend to add another one later on.

flow_miss_batch_ofproto_destroyed() did not work well with the guarded-list
structure (it required either adding a lot more functions or breaking the
abstraction) so I changed the caller to just use udpif_revalidate().

Checking reval_seq at the end of handle_miss_upcalls() also didn't work
well with the abstraction, so I decided that since this was a corner case
anyway it would be acceptable to just drop those in flow_miss_batch_next().

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>

ofproto: Add a ref_count to "struct rule" to protect it from being freed.

Taking a read-lock on the 'rwlock' member of struct rule prevents members
of the rule from changing. This is a short-term use of the 'rwlock': one
takes the lock, reads some members, and releases the lock.

Taking a read-lock on the 'rwlock' also prevents the rule from being freed.
This is often a relatively long-term need. For example, until now flow
translation has held the rwlock in xlate_table_action() across the entire
recursive translation, which can call into a great deal of different code
across multiple files.

This commit switches to using a reference count for this second purpose
of struct rule's rwlock. This means that all the code that previously
held a read-lock on the rwlock across deep stacks of functions can now
simply keep a reference.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>

ofproto: Break actions out of rule into new rule_actions structure.

This permits code to ensure long-term access to a rule's actions
without holding a long-term lock on the rule's rwlock.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>

ofproto: Remove soon-to-be-invalid optimizations.

Until now, ofproto_add_flow() and ofproto_delete_flow() assumed that the
flow table could not change between its flow table check and its later
modification to the flow table. This assumption will soon be untrue, so
remove it.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>

ofproto: Move function find_meter() into ofpacts as ofpacts_get_meter().

ofproto is too big anyway so we might as well move out code that can
reasonably live elsewhere.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>