Steven Dake [Mon, 19 Sep 2011 23:38:34 +0000 (16:38 -0700)]
Deliver all messages from my_high_seq_received to the last gap
This patch passes two test cases:
-------
Test #1
-------
Two node cluster - run cpgbench on each node
modify totemsrp with following defines:
Two test cases:
-------
Test #2
-------
5 node cluster
start 5 nodes randomly at about same time, start 5 nodes randomly at about
same time, wait 10 seconds and attempt to send a message. If the message blocks
on "TRY_AGAIN", a message loss has likely occurred. Wait a few minutes without
cycling the nodes and see if the TRY_AGAIN state becomes unblocked.
If it doesn't, the test case has failed.
Signed-off-by: Steven Dake <sdake@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Steven Dake [Wed, 31 Aug 2011 05:25:21 +0000 (22:25 -0700)]
Ignore memb_join messages during flush operations
A memb_join operation that occurs during flushing can result in an
entry into the GATHER state from the RECOVERY state. This results in the
regular sort queue being used instead of the recovery sort queue, resulting
in a segfault.
Signed-off-by: Steven Dake <sdake@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Jan Friesse [Mon, 29 Aug 2011 13:09:52 +0000 (15:09 +0200)]
rrp: Higher threshold in passive mode for mcast
There were too many false positives with passive mode rrp when a high
number of messages were received.
The patch adds a new configurable variable, rrp_problem_count_mcast_threshold,
which by default is 10 times rrp_problem_count_threshold and is
used as the threshold for multicast packets in passive mode. The variable is
unused in active mode.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed by: Steven Dake <sdake@redhat.com>
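The threshold selection described in the commit above can be sketched roughly as follows. This is only an illustration of the rule stated in the message, not the totemrrp code: the enum, both function names, and the value 10 used in the usage note are assumptions made for the example.

```c
/* Hypothetical mode enum for this sketch; not the corosync definitions. */
enum sketch_rrp_mode { SKETCH_RRP_ACTIVE, SKETCH_RRP_PASSIVE };

/* Default described in the commit message:
 * 10 times rrp_problem_count_threshold. */
static unsigned int sketch_default_mcast_threshold(unsigned int problem_count_threshold)
{
    return 10u * problem_count_threshold;
}

/* The multicast threshold applies only to multicast packets in passive
 * mode; every other case uses the regular threshold. */
static unsigned int sketch_effective_threshold(enum sketch_rrp_mode mode,
                                               int is_mcast,
                                               unsigned int problem_count_threshold,
                                               unsigned int mcast_threshold)
{
    if (mode == SKETCH_RRP_PASSIVE && is_mcast) {
        return mcast_threshold;
    }
    return problem_count_threshold;
}
```

With an illustrative rrp_problem_count_threshold of 10, the derived multicast threshold is 100, and it is consulted only for multicast traffic in passive mode.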
Jan Friesse [Mon, 29 Aug 2011 08:44:05 +0000 (10:44 +0200)]
rrp: Handle endless loop if all ifaces are faulty
If all interfaces were faulty, passive_mcast_flush_send and related
functions ended in an endless loop. This is now handled: if there is no
live interface, the message is dropped.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed by: Steven Dake <sdake@redhat.com>
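One way to avoid this kind of endless loop is to bound the search for a live interface by the number of interfaces. The helper below is a minimal sketch under that assumption; next_live_iface and its faulty[] array are hypothetical names for illustration and do not reproduce passive_mcast_flush_send.

```c
#include <stddef.h>

/* Hypothetical helper, not the corosync implementation.
 * faulty[i] != 0 means interface i is currently marked faulty.
 * Returns the index of the next live interface after `current`
 * (wrapping around), or -1 if every interface is faulty, in which
 * case the caller would drop the message. */
static int next_live_iface(const int *faulty, size_t count, size_t current)
{
    size_t step;

    /* At most `count` probes, so the loop always terminates. */
    for (step = 1; step <= count; step++) {
        size_t candidate = (current + step) % count;

        if (!faulty[candidate]) {
            return (int)candidate;
        }
    }
    return -1;
}
```

A caller would treat a return value of -1 as "no live interface" and discard the message instead of retrying.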
Steven Dake [Mon, 22 Aug 2011 22:23:51 +0000 (15:23 -0700)]
Add totempg_threaded_mode_enable() api
This API allows totem to operate as a multithreaded library. Performance is
better without threads but some library users may only have multithreaded
systems. In the corosync case where we have removed threads, this reduces
cpu utilization by ~10% by removing about 50% of the mutex lock and unlock calls
that occur during typical operation. Since the latest corosync is nearly
thread free, there is no need for mutex operations.
Signed-off-by: Steven Dake <sdake@redhat.com> Reviewed-by: Angus Salkeld <asalkeld@redhat.com>
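The idea of paying for mutex operations only when threaded mode has been requested can be sketched as below. Every sketch_ name is hypothetical and exists only for this illustration; the code does not show how totempg implements totempg_threaded_mode_enable().

```c
#include <pthread.h>

/* All names below are hypothetical and belong to this sketch only. */
static pthread_mutex_t sketch_mutex = PTHREAD_MUTEX_INITIALIZER;
static int sketch_threaded_mode = 0;
static unsigned long sketch_lock_calls = 0;

/* Opt in to locking; single-threaded users never call this. */
static void sketch_threaded_mode_enable(void)
{
    sketch_threaded_mode = 1;
}

static void sketch_lock(void)
{
    if (sketch_threaded_mode) {
        pthread_mutex_lock(&sketch_mutex);
        sketch_lock_calls++;
    }
}

static void sketch_unlock(void)
{
    if (sketch_threaded_mode) {
        pthread_mutex_unlock(&sketch_mutex);
    }
}

/* Stand-in for a library operation that touches shared state. */
static int sketch_deliver(int value)
{
    int result;

    sketch_lock();
    result = value + 1;
    sketch_unlock();
    return result;
}
```

In this sketch the mode must be enabled before any other thread is started, since the flag itself is read without synchronization.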
Tim Beale [Thu, 18 Aug 2011 12:57:10 +0000 (14:57 +0200)]
A CPG client can sometimes lock up if the local node is in the downlist
In a 10-node cluster where all nodes are booting up and starting corosync
at the same time, sometimes during this process corosync detects a node as
leaving and rejoining the cluster.
Occasionally the downlist that gets picked contains the local node. When the
local node sends leave events for the downlist (including itself), it sets
its cpd state to CPD_STATE_UNJOINED and clears the cpd->group_name. This
means it no longer sends CPG events to the CPG client.
Tim Beale [Wed, 17 Aug 2011 01:52:25 +0000 (11:52 +1000)]
Add code comment mapping for message handler defines
As a corosync-newbie it can be hard to bridge the gap between where a
particular message is sent and where the receive handler processes it,
and vice versa.
Angus Salkeld [Fri, 5 Aug 2011 02:18:43 +0000 (12:18 +1000)]
libqb: change ipc -> qb_ipc
IPC: return 0/-ENOBUFS from message handler
IPC: use the new rate_limit API to improve perf.
CPG: add send_async API & hook up flow control
IPC: Fix flow control getting stuck.
IPC: Port the remaining libs to use libqb IPC
IPC: remove libqb flowcontrol API
TEST: put cpg_dispatch() in its own thread
IPC: cleanup ipc_glue.c name everything cs_ipcs_*()
IPC: add back statistics
IPC: remove coroipcc_ symbols from lib*.versions
IPC: init each se's IPC as it is loaded.
IPC: use the new connection_closed() event to free the context.
IPC: re-add zero copy functionality back
IPC: remove cpg_mcast_joined_async() and make it the default
-> now cpg_mcast_joined() == cpg_mcast_joined_async()
libqb: expose a libqb error converter
libqb: add missing error conversions
libqb: remove repeat try loop in lib/cpg.c
CPG: fix zero copy mcast
CPG: use newer return codes
Add ENOTCONN to qb_to_cs_error()
libqb: fix error conversion from errno to cs_error_t in confdb
libqb: change errno_to_cs to qb_to_cs_error
libqb: add a cs_strerror() to get a more meaningful message
libqb: fix some confusing error conversions.
libqb: set the timeout on recv's to -1 (wait forever)
Signed-off-by: Angus Salkeld <asalkeld@redhat.com> Reviewed-by: Steven Dake <sdake@redhat.com>
Jan Friesse pointed out that bindnetaddr should be set to a host
address (as opposed to a network address) on hosts where multiple
NICs live on the same subnet. Add a comment to that effect to
the example configuration file.
Signed-off-by: Florian Haas <florian.haas@linbit.com> Reviewed-by: Steven Dake <sdake@redhat.com>
Change suggested mcastaddr to one in the 239.255.0.0/16
pseudo-subnet. Multicast addresses outside 239.x.x.x may be IANA
registered and can clash with other services present on the
network. Suggest an address defined as part of the multicast IPv4
Local Scope in RFC 2365.
Signed-off-by: Florian Haas <florian.haas@linbit.com> Reviewed-by: Dan Frincu <dan.frincu@1and1.ro> Reviewed-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Steven Dake <sdake@redhat.com>
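To check whether a candidate mcastaddr falls inside 239.255.0.0/16, a small helper along these lines could be used. It is a sketch only: in_ipv4_local_scope is a made-up name, and the addresses in the usage note are illustrative.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <sys/socket.h>

/* Returns 1 if addr is a dotted-quad IPv4 address inside 239.255.0.0/16
 * (the IPv4 Local Scope of RFC 2365), 0 if it parses but lies outside
 * that range, and -1 if it cannot be parsed. */
static int in_ipv4_local_scope(const char *addr)
{
    struct in_addr parsed;
    uint32_t host_order;

    if (inet_pton(AF_INET, addr, &parsed) != 1) {
        return -1;
    }
    host_order = ntohl(parsed.s_addr);

    /* 239.255.x.x is 0xEFFFxxxx in host byte order. */
    return (host_order & 0xFFFF0000u) == 0xEFFF0000u;
}
```

For example, "239.255.1.1" is inside the range, while "224.0.0.1" is a valid multicast address outside it.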
Change the example configuration file so "bindnetaddr" has a value
that more obviously looks like a network address. So as not to have
people think they need to set an existing IP address here (and hence,
have non-identical corosync.conf files between nodes).
Signed-off-by: Florian Haas <florian.haas@linbit.com> Reviewed-by: Dan Frincu <dan.frincu@1and1.ro> Reviewed-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Steven Dake <sdake@redhat.com>
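For readers unsure what a network address looks like for their setup, it can be derived by masking a host address with its netmask. The function below is a hedged sketch; network_address is a hypothetical name and the addresses in the usage note are illustrative, not taken from the example configuration file.

```c
#include <arpa/inet.h>
#include <stddef.h>
#include <sys/socket.h>

/* Writes the IPv4 network address for host/netmask into out.
 * Returns 0 on success, -1 if an input cannot be parsed or out is
 * too small. Hypothetical helper for illustration only. */
static int network_address(const char *host, const char *netmask,
                           char *out, size_t out_len)
{
    struct in_addr host_addr;
    struct in_addr mask_addr;
    struct in_addr net_addr;

    if (inet_pton(AF_INET, host, &host_addr) != 1 ||
        inet_pton(AF_INET, netmask, &mask_addr) != 1) {
        return -1;
    }

    /* Both values are in network byte order, so a plain AND is enough. */
    net_addr.s_addr = host_addr.s_addr & mask_addr.s_addr;

    if (inet_ntop(AF_INET, &net_addr, out, (socklen_t)out_len) == NULL) {
        return -1;
    }
    return 0;
}
```

With host 192.168.1.17 and netmask 255.255.255.0 the result is 192.168.1.0, a value that is the same on every node in that subnet.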
Tim Beale [Tue, 19 Jul 2011 15:58:21 +0000 (08:58 -0700)]
Add some more stats for debugging
+ overload - number of times client is told to try again
+ invalid_request - message contained invalid parameter, e.g. invalid size
+ msg_queue_avail - messages currently available at the Totem layer
+ msg_queue_reserved - messages currently reserved at the Totem layer
Signed-off-by: Tim Beale <tim.beale@alliedtelesis.co.nz> Reviewed-by: Steven Dake <sdake@redhat.com>
Jan Friesse [Fri, 15 Jul 2011 15:10:41 +0000 (17:10 +0200)]
totemconfig: Change default FAIL_TO_RECV_CONST
Previous default (50) was too low for most modern switch hardware. This
may trigger abort because the aru doesn't increase for 50 token
rotations combined with a defect in how failed to recv conditions are
handled. By increasing this tunable, the condition should no longer
trigger the errant code.
Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Steven Dake <sdake@redhat.com>
Steven Dake [Thu, 7 Jul 2011 22:29:10 +0000 (15:29 -0700)]
Fix problem where corosync will segfault if there are gaps in recovery queue
Fixes a problem where there are gaps in the recovery queue. Example: my_aru = 5,
but there are messages at 7 and 8, and my_high_seq_received = 8, which results
in data slots taken up in the new message queue. What should really happen
is these last messages should be delivered after a transitional
configuration to maintain SAFE agreement. We don't have support for
SAFE atm, so it is probably safe just to throw these messages away. Without
this change, the new message queue on a new configuration change is out of sync.
Signed-off-by: Steven Dake <sdake@redhat.com> Tested-by: Tim Beale <tlbeale@gmail.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>
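The example in the message above (my_aru = 5, messages present at 7 and 8) can be modelled with a simple presence array. This is a sketch of the idea only: the array representation and both function names are assumptions for illustration and are unrelated to the actual sort queue code in totemsrp.

```c
/* present[seq] != 0 means the message with sequence number seq was
 * received; index 0 is unused. Hypothetical model, not totemsrp code. */

/* Highest sequence number up to which every message was received. */
static unsigned int contiguous_aru(const unsigned char *present,
                                   unsigned int high_seq)
{
    unsigned int aru = 0;

    while (aru < high_seq && present[aru + 1]) {
        aru++;
    }
    return aru;
}

/* Throws away every message above aru and returns how many were
 * dropped, so none of them occupies a slot after the gap. */
static unsigned int discard_after_gap(unsigned char *present,
                                      unsigned int aru,
                                      unsigned int high_seq)
{
    unsigned int seq;
    unsigned int dropped = 0;

    for (seq = aru + 1; seq <= high_seq; seq++) {
        if (present[seq]) {
            present[seq] = 0;
            dropped++;
        }
    }
    return dropped;
}
```

For messages 1 to 5 plus 7 and 8, contiguous_aru() yields 5 and discard_after_gap() drops the two messages beyond the gap at 6.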
Rather than curiously disable RDMA support by default in configure and
enable it by default in RPM builds, streamline the default
configuration to always turn RDMA support off. It can be enabled in
RPM builds with "--with rdma".
Signed-off-by: Florian Haas <florian.haas@linbit.com> Reviewed-by: Steven Dake <sdake@redhat.com>
build: set RDMA related _LIBS and _CFLAGS only if building with RDMA support
Having to force {ibverbs,rdmacm}_{LIBS,CFLAGS} looks positively odd;
so this may warrant further review. However, they are definitely not
needed if building without RDMA support.
Signed-off-by: Florian Haas <florian.haas@linbit.com> Reviewed-by: Steven Dake <sdake@redhat.com>