Angus Salkeld [Fri, 5 Aug 2011 02:18:43 +0000 (12:18 +1000)]
libqb: change ipc -> qb_ipc
IPC: return 0/-ENOBUFS from message handler
IPC: use the new rate_limit API to improve perf.
CPG: add send_async API & hook up flow control
IPC: Fix flow control getting stuck.
IPC: Port the remaining libs to use libqb IPC
IPC: remove libqb flowcontrol API
TEST: put cpg_dispatch() in its own thread
IPC: cleanup ipc_glue.c name everything cs_ipcs_*()
IPC: add back statistics
IPC: remove coroipcc_ symbols from lib*.versions
IPC: init each se's IPC as it is loaded.
IPC: use the new connection_closed() event to free the context.
IPC: re-add zero copy functionality back
IPC: remove cpg_mcast_joined_async() and make it the default
-> now cpg_mcast_joined() == cpg_mcast_joined_async()
libqb: expose a libqb error converter
libqb: add missing error conversions
libqb: remove repeat try loop in lib/cpg.c
CPG: fix zero copy mcast
CPG: use newer return codes
Add ENOTCONN to qb_to_cs_error()
libqb: fix error conversion from errno to cs_error_t in confdb
libqb: change errno_to_cs to qb_to_cs_error
libqb: add a cs_strerror() to get a more meaningful message
libqb: fix some confusing error conversions.
libqb: set the timeout on recv's to -1 (wait forever)
Signed-off-by: Angus Salkeld <asalkeld@redhat.com> Reviewed-by: Steven Dake <sdake@redhat.com>
Jan Friesse pointed out that bindnetaddr should be set to a host
address (as opposed to a network address) on hosts where multiple
NICs live on the same subnet. Add a comment to that effect to
the example configuration file.
Signed-off-by: Florian Haas <florian.haas@linbit.com> Reviewed-by: Steven Dake <sdake@redhat.com>
Change suggested mcastaddr to one in the 239.255.0.0/16
pseudo-subnet. Multicast addresses outside 239.x.x.x may be IANA
registered and can clash with other services present on the
network. Suggest an address defined as part of the multicast IPv4
Local Scope in RFC 2365.
Signed-off-by: Florian Haas <florian.haas@linbit.com> Reviewed-by: Dan Frincu <dan.frincu@1and1.ro> Reviewed-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Steven Dake <sdake@redhat.com>
Change the example configuration file so "bindnetaddr" has a value
that more obviously looks like a network address, so that people do
not think they need to set an existing IP address here (and hence
have non-identical corosync.conf files between nodes).
Signed-off-by: Florian Haas <florian.haas@linbit.com> Reviewed-by: Dan Frincu <dan.frincu@1and1.ro> Reviewed-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Steven Dake <sdake@redhat.com>
Tim Beale [Tue, 19 Jul 2011 15:58:21 +0000 (08:58 -0700)]
Add some more stats for debugging
+ overload - number of times client is told to try again
+ invalid_request - message contained invalid parameter, e.g. invalid size
+ msg_queue_avail - messages currently available at the Totem layer
+ msg_queue_reserved - messages currently reserved at the Totem layer
Signed-off-by: Tim Beale <tim.beale@alliedtelesis.co.nz> Reviewed-by: Steven Dake <sdake@redhat.com>
Jan Friesse [Fri, 15 Jul 2011 15:10:41 +0000 (17:10 +0200)]
totemconfig: Change default FAIL_TO_RECV_CONST
Previous default (50) was too low for most modern switch hardware. This
may trigger abort because the aru doesn't increase for 50 token
rotations combined with a defect in how failed to recv conditions are
handled. By increasing this tunable, the condition should no longer
trigger the errant code.
Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Steven Dake <sdake@redhat.com>
Steven Dake [Thu, 7 Jul 2011 22:29:10 +0000 (15:29 -0700)]
Fix problem where corosync will segfault if there are gaps in recovery queue
Fixes a problem where there are gaps in the recovery queue. Example:
my_aru = 5, but there are messages at 7 and 8, and 8 = my_high_seq_received,
which results in data slots being taken up in the new message queue. What
should really happen is that these last messages are delivered after a
transitional configuration to maintain SAFE agreement. We don't have support
for SAFE atm, so it is probably safe just to throw these messages away.
Without this change, the new message queue is out of sync after a
configuration change.
Signed-off-by: Steven Dake <sdake@redhat.com> Tested-by: Tim Beale <tlbeale@gmail.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Rather than curiously disable RDMA support by default in configure and
enable it by default in RPM builds, streamline the default
configuration to always turn RDMA support off. It can be enabled in
RPM builds with "--with rdma".
Signed-off-by: Florian Haas <florian.haas@linbit.com> Reviewed-by: Steven Dake <sdake@redhat.com>
build: set RDMA related _LIBS and _CFLAGS only if building with RDMA support
Having to force {ibverbs,rdmacm}_{LIBS,CFLAGS} looks positively odd;
so this may warrant further review. However, they are definitely not
needed if building without RDMA support.
Signed-off-by: Florian Haas <florian.haas@linbit.com> Reviewed-by: Steven Dake <sdake@redhat.com>
Tim Beale [Wed, 6 Jul 2011 13:38:17 +0000 (06:38 -0700)]
Fix compile/runtime issues for _POSIX_THREAD_PROCESS_SHARED < 1
For the case where _POSIX_THREAD_PROCESS_SHARED < 1, the code doesn't compile
for corosync v1.3.1. And when it does compile, it crashes on our system - our
version of uClibc seems to always expect a 4th arg. The man page suggests
the 4th arg is optional, but does say: 'For greater portability it is best to
always call semctl() with four arguments', which is what this patch does.
Also removed semop as it's an unused variable.
Signed-off-by: Tim Beale <tim.beale@alliedtelesis.co.nz> Reviewed-by: Steven Dake <sdake@redhat.com>
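A minimal sketch of the four-argument semctl() convention described in the
commit above; it is not the code from the patch. The helper names
(sem_create_with_value, sem_read_value, sem_destroy) are hypothetical, and the
union semun definition is guarded for Linux, where semctl(2) requires the
caller to define it.

```c
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

#if defined(__linux__)
/* On Linux the calling program must define union semun itself. */
union semun {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};
#endif

/* Hypothetical helper: create a private one-element semaphore set and
 * set it to `value`. Returns the semaphore id, or -1 on error. */
int sem_create_with_value(int value)
{
    union semun arg;
    int semid = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);

    if (semid == -1) {
        return -1;
    }
    arg.val = value;
    if (semctl(semid, 0, SETVAL, arg) == -1) {
        arg.val = 0;
        semctl(semid, 0, IPC_RMID, arg); /* best-effort cleanup */
        return -1;
    }
    return semid;
}

/* Hypothetical helper: read the current value. GETVAL ignores the
 * fourth argument, but it is still passed for portability. */
int sem_read_value(int semid)
{
    union semun arg;

    arg.val = 0;
    return semctl(semid, 0, GETVAL, arg);
}

/* Hypothetical helper: remove the set, again with four arguments. */
int sem_destroy(int semid)
{
    union semun arg;

    arg.val = 0;
    return semctl(semid, 0, IPC_RMID, arg);
}
```

Passing a dummy union even to commands that ignore it keeps every call site
uniform, which is the portability point the man page makes.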
Tim Beale [Wed, 6 Jul 2011 13:31:45 +0000 (06:31 -0700)]
getpwnam_r()/getgrnam_r() returns ERANGE for some systems
On our system the expected buffer length is 256. This means calls to
getpwnam_r()/getgrnam_r() return an ERANGE error and corosync fails to start
up. These two functions return ERANGE when insufficient buffer space is
supplied. Judging by the man page for getpwnam_r, the correct way to
determine the buffer size on any given system is to use sysconf().
Signed-off-by: Tim Beale <tim.beale@alliedtelesis.co.nz> Reviewed-by: Steven Dake <sdake@redhat.com>
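A minimal sketch of sizing the getpwnam_r() buffer with sysconf(), as the
commit above describes; it is not the code from the patch. The function name
lookup_uid, the 1024-byte fallback and the retry-on-ERANGE loop with its 1 MiB
cap are assumptions added for illustration.

```c
#include <sys/types.h>
#include <errno.h>
#include <pwd.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: look up `name` and store its uid in *uid_out.
 * Returns 0 on success, -1 if the user is unknown or on error. */
int lookup_uid(const char *name, uid_t *uid_out)
{
    struct passwd pwd;
    struct passwd *result = NULL;
    long hint = sysconf(_SC_GETPW_R_SIZE_MAX);
    /* sysconf() may return -1 when there is no fixed limit;
     * the fallback size is an assumption. */
    size_t buflen = (hint > 0) ? (size_t)hint : 1024;

    for (;;) {
        char *buf = malloc(buflen);
        int rc;

        if (buf == NULL) {
            return -1;
        }
        rc = getpwnam_r(name, &pwd, buf, buflen, &result);
        if (rc == 0 && result != NULL) {
            *uid_out = pwd.pw_uid;
            free(buf);
            return 0;
        }
        free(buf);
        /* Only ERANGE means "buffer too small"; give up on anything
         * else, or once the buffer has grown past 1 MiB. */
        if (rc != ERANGE || buflen > ((size_t)1 << 20)) {
            return -1;
        }
        buflen *= 2;
    }
}
```

The same pattern would apply to getgrnam_r() with _SC_GETGR_R_SIZE_MAX.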
This patch automatically recovers redundant ring failures.
Please note that this patch introduces rrp_autorecovery_check_timeout
in the totem config and hence breaks the internal ABI. Internal ABI users
of totem.h need to rebuild their binaries.
Signed-off-by: Jiaju Zhang <jjzhang@suse.de> Signed-off-by: Steven Dake <sdake@redhat.com> Tested-by: Jan Friesse <jfriesse@redhat.com> Tested-by: Florian Haas <florian.haas@linbit.com> Tested-by: Jiaju Zhang <jjzhang@suse.de>
Jan Friesse [Tue, 21 Jun 2011 09:57:08 +0000 (11:57 +0200)]
Remove spinlocks
Spinlocks are now removed. Even though a spinlock can improve speed in
some special cases, in most cases it makes corosync use much more CPU
and be less responsive than if only mutexes are used.
What we were doing is:
pthread_mutex_lock
pthread_spin_lock
pthread_spin_unlock
pthread_mutex_unlock
which is not safe.
Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Steven Dake <sdake@redhat.com>
Jan Friesse [Wed, 15 Jun 2011 13:54:23 +0000 (15:54 +0200)]
confdb: Resolve dispatch deadlock
The following situation could happen:
- one thread is waiting for a write operation to finish (line 853) while
objdb is locked
- the flush (done in objdb_notify_dispatch) is called in the main thread,
but this call never happens because the main thread is waiting for the
objdb lock.
In this situation a deadlock occurs.
This commit solves this by:
- setting the pipe to non-blocking mode
- using the pipe only as a trigger for coropoll
- storing dispatch messages in a list
- processing messages from the list in the main thread
Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Steven Dake <sdake@redhat.com>
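A minimal sketch of the approach listed in the commit above: a non-blocking
pipe used only as a wakeup trigger, with the messages themselves kept in a
list. It is not the confdb/objdb code. All names (struct notifier,
notifier_post, and so on) are hypothetical, and the locking the real
multi-threaded code would need around the list is omitted.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

struct msg {
    struct msg *next;
    int payload;
};

struct notifier {
    int pipe_fd[2];            /* [0] is polled by the main loop */
    struct msg *head, *tail;   /* pending dispatch messages */
};

static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);

    if (flags == -1) {
        return -1;
    }
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

int notifier_init(struct notifier *n)
{
    n->head = n->tail = NULL;
    if (pipe(n->pipe_fd) == -1) {
        return -1;
    }
    if (set_nonblocking(n->pipe_fd[0]) == -1 ||
        set_nonblocking(n->pipe_fd[1]) == -1) {
        close(n->pipe_fd[0]);
        close(n->pipe_fd[1]);
        return -1;
    }
    return 0;
}

/* Queue a message, then poke the pipe. The write never blocks: EAGAIN
 * means the pipe is full, so a wakeup is already pending. */
int notifier_post(struct notifier *n, int payload)
{
    struct msg *m = malloc(sizeof(*m));
    char byte = 1;

    if (m == NULL) {
        return -1;
    }
    m->next = NULL;
    m->payload = payload;
    if (n->tail != NULL) {
        n->tail->next = m;
    } else {
        n->head = m;
    }
    n->tail = m;
    if (write(n->pipe_fd[1], &byte, 1) == -1 && errno != EAGAIN) {
        return -1; /* message stays queued, only the wakeup failed */
    }
    return 0;
}

/* Main-loop side: discard the trigger bytes... */
void notifier_clear_trigger(struct notifier *n)
{
    char scratch[64];

    while (read(n->pipe_fd[0], scratch, sizeof(scratch)) > 0) {
        /* nothing to do, the bytes carry no data */
    }
}

/* ...then pop messages until this returns 0 (list empty). */
int notifier_pop(struct notifier *n, int *payload_out)
{
    struct msg *m = n->head;

    if (m == NULL) {
        return 0;
    }
    n->head = m->next;
    if (n->head == NULL) {
        n->tail = NULL;
    }
    *payload_out = m->payload;
    free(m);
    return 1;
}
```

Because the producer never blocks on the pipe, it cannot end up holding a lock
while waiting for the consumer to drain it.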
Jan Friesse [Thu, 9 Jun 2011 13:46:31 +0000 (15:46 +0200)]
objdb: save copy of handles in object_find_create
The following situation could happen:
- process 1 creates a find handle through confdb
- calls find iteration once
- a different process 2 deletes the object pointed to by the process 1
iterator
- process 1 calls iteration again ->
object_find_instance->find_child_list is an invalid pointer
-> segfault
Now object_find_create creates an array of matching object handles and
object_find_next uses that array together with a check for the name. This
prevents the situation where, between steps 2 and 3, a new object is
created with a different name but, sadly, with the same handle.
It is also worth noting that this patch is more or less a quick hack rather
than a proper solution. The proper solution is to not use pointers
and to use handles everywhere instead. This is a big TODO.
Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Steven Dake <sdake@redhat.com>
Jiaju Zhang [Wed, 8 Jun 2011 23:59:26 +0000 (07:59 +0800)]
RRP: Fix ring initialization issue for UDPU mode
Redundant ring has a problem in UDP unicast mode. The problem is that
the second ring is not successfully initialized: the second time
iface_changes happens, the member list for that interface has not been
added, so that ring cannot transmit normal messages. As a result, the
second ring cannot take over the work if the first ring is down. This
patch fixes this issue.
Comments from review:
More work is probably needed in totemnet, where totemnet maintains the
list of nodes and an iterator for them, and totemudpu_member_add adds
state information to a context for the iteration.
In any case, that is somewhat difficult to test, so I'll merge this
patch for now. Keep in mind that interface changes on the bindnetaddr
will cause problems with udpu after this patch has been committed.
Signed-off-by: Jiaju Zhang <jjzhang@suse.de> Reviewed-by: Steven Dake <sdake@redhat.com>
Jan Friesse [Thu, 9 Jun 2011 13:42:54 +0000 (15:42 +0200)]
coroipcc: check recvmsg result in socket_recv
According to the specification, recvmsg can return 0, which means that
the connection is closed. We had this check, but only on systems
other than Linux. recvmsg can return 0 on Linux too, so the check is now
applied on all systems.
Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Angus Salkeld <asalkeld@redhat.com>
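A minimal sketch of treating a zero return from recvmsg() as a closed
connection on every platform, as the commit above describes; it is not the
coroipcc socket_recv code. The function name recv_all and the choice to report
the closed peer as -1 with errno set to ENOTCONN are assumptions for
illustration.

```c
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Hypothetical helper: receive exactly `len` bytes from a stream socket.
 * Returns 0 on success, -1 on error or if the peer closed the connection. */
int recv_all(int fd, void *buf, size_t len)
{
    char *p = buf;
    size_t done = 0;

    while (done < len) {
        struct msghdr msg;
        struct iovec iov;
        ssize_t n;

        memset(&msg, 0, sizeof(msg));
        iov.iov_base = p + done;
        iov.iov_len = len - done;
        msg.msg_iov = &iov;
        msg.msg_iovlen = 1;

        n = recvmsg(fd, &msg, MSG_WAITALL);
        if (n == -1) {
            if (errno == EINTR) {
                continue; /* interrupted, try again */
            }
            return -1;
        }
        if (n == 0) {
            /* Orderly shutdown by the peer. Without this check the
             * loop would spin forever on a closed socket. */
            errno = ENOTCONN;
            return -1;
        }
        done += (size_t)n;
    }
    return 0;
}
```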
Jan Friesse [Thu, 9 Jun 2011 13:42:33 +0000 (15:42 +0200)]
confdb: Properly check result of object_find_create
In confdb_object_iter, the result of object_find_create is now properly
checked. object_find_create can return -1 if the object doesn't exist.
Without this check, an incorrect handle (memory garbage) was passed
directly to object_find_next.
Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Angus Salkeld <asalkeld@redhat.com>
Jan Friesse [Tue, 17 May 2011 09:20:37 +0000 (11:20 +0200)]
iazc: Reduce number of mem alloc and memcpy
x86 processors are able to handle unaligned memory access. Improve
performance by using that feature on i386 and x86_64 compatible
processors, and use the old alignment code on other processors.
Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Steven Dake <sdake@redhat.com>
Jan Friesse [Wed, 4 May 2011 13:00:31 +0000 (15:00 +0200)]
totemsrp: Enhance mcast failure detection
memb_state_gather_enter now increases stats.continuous_gather only if the
previous state was also gather. This should happen only if multicast is
not working properly (a local firewall in most cases), and not when many
nodes join at the same time.
Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Angus Salkeld <asalkeld@redhat.com>
Jan Friesse [Tue, 29 Mar 2011 13:51:42 +0000 (15:51 +0200)]
coroipcs: Deny connect to service without initfn
If a library connects to a service with no init function, coroipcs will
try to dereference a NULL pointer. Now we correctly return the error code
CS_ERR_NOT_EXIST.
Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Steven Dake <sdake@redhat.com>
Steven Dake [Mon, 3 Jan 2011 23:40:55 +0000 (16:40 -0700)]
Align ipc on 8 byte boundaries
Align all ipc messages on 8 byte boundaries. This alignment will remove bus
errors on systems that can't access unaligned data and should improve
performance.
Signed-off-by: Steven Dake <sdake@redhat.com> Reviewed-by: Angus Salkeld <asalkeld@redhat.com>
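A minimal sketch of 8-byte alignment for message sizes and headers, to
illustrate the commit above; it is not the corosync IPC code. The names
IPC_ALIGNMENT, ipc_align_up and struct example_header are hypothetical, and
the aligned attribute is a GCC/Clang extension.

```c
#include <stddef.h>

#define IPC_ALIGNMENT 8u /* hypothetical name for the boundary */

/* Round a message length up to the next multiple of 8 so that the
 * message following it in a buffer also starts on an 8-byte boundary. */
size_t ipc_align_up(size_t len)
{
    return (len + (IPC_ALIGNMENT - 1)) & ~(size_t)(IPC_ALIGNMENT - 1);
}

/* Hypothetical header: the attribute makes the compiler place and pad
 * the structure on an 8-byte boundary. */
struct example_header {
    int id;
    int size;
} __attribute__((aligned(8)));
```

For example, a 13-byte payload would occupy ipc_align_up(13) bytes in the
buffer, so the next header starts at an offset that is a multiple of 8.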