git.proxmox.com Git - mirror_corosync.git/log
Florian Haas [Tue, 5 Jul 2011 11:44:57 +0000 (13:44 +0200)]
build: disable RDMA support in RPMs by default

Rather than curiously disable RDMA support by default in configure and
enable it by default in RPM builds, streamline the default
configuration to always turn RDMA support off. It can be enabled in
RPM builds with "--with rdma".

Signed-off-by: Florian Haas <florian.haas@linbit.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
Florian Haas [Tue, 5 Jul 2011 11:22:50 +0000 (13:22 +0200)]
build: set RDMA related _LIBS and _CFLAGS only if building with RDMA support

Having to force {ibverbs,rdmacm}_{LIBS,CFLAGS} looks positively odd;
so this may warrant further review. However, they are definitely not
needed if building without RDMA support.

Signed-off-by: Florian Haas <florian.haas@linbit.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
Florian Haas [Tue, 5 Jul 2011 09:54:52 +0000 (11:54 +0200)]
build: make RDMA support an RPM build conditional

Enable RDMA in RPM builds by default to maintain the previous behavior
(which always included --enable-rdma in the %configure invocation).

Signed-off-by: Florian Haas <florian.haas@linbit.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
Florian Haas [Tue, 5 Jul 2011 11:10:05 +0000 (13:10 +0200)]
build: force LC_ALL=C correctly for dates

Failure to force "C" dates will have RPM et al. complain about invalid
dates and timestamps.

Signed-off-by: Florian Haas <florian.haas@linbit.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
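The commit itself touches build scripts. As a hedged C analogue of the same idea, the sketch below formats an RPM-changelog-style date under a thread-local "C" locale using the POSIX newlocale()/uselocale() calls, so day and month names stay English regardless of LANG or LC_ALL. The function name rpm_style_date is hypothetical and not part of corosync.

```c
#define _POSIX_C_SOURCE 200809L
#include <locale.h>
#include <stddef.h>
#include <time.h>

/* Format 't' (interpreted as UTC) like "Tue Jul 05 2011".
 * Returns the number of characters written, or 0 on failure
 * (including a buffer that is too small). */
size_t rpm_style_date(time_t t, char *buf, size_t len)
{
	struct tm tm;
	locale_t c_locale;
	locale_t previous;
	size_t n;

	if (gmtime_r(&t, &tm) == NULL) {
		return 0;
	}
	/* Switch only this thread to the "C" locale so %a and %b always
	 * yield English abbreviations, whatever the environment says. */
	c_locale = newlocale(LC_ALL_MASK, "C", (locale_t)0);
	if (c_locale == (locale_t)0) {
		return 0;
	}
	previous = uselocale(c_locale);
	n = strftime(buf, len, "%a %b %d %Y", &tm);
	uselocale(previous);
	freelocale(c_locale);
	return n;
}
```

For example, rpm_style_date(0, buf, sizeof(buf)) writes "Thu Jan 01 1970".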
Tim Beale [Wed, 6 Jul 2011 13:38:17 +0000 (06:38 -0700)]
Fix compile/runtime issues for _POSIX_THREAD_PROCESS_SHARED < 1

For the case where _POSIX_THREAD_PROCESS_SHARED < 1, the code doesn't compile
for corosync v1.3.1. And when it does compile, it crashes on our system - our
version of uClibc seems to always expect a 4th arg. The man page suggests
the 4th arg is optional, but does say: 'For greater portability it is best to
always call semctl() with four arguments', which is what this patch does.
Also removed semop as it's an unused variable.

Signed-off-by: Tim Beale <tim.beale@alliedtelesis.co.nz>
Reviewed-by: Steven Dake <sdake@redhat.com>
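A minimal sketch of the four-argument semctl() convention the commit adopts, assuming a Linux/glibc-style system where the caller must define the argument union itself. The union name semctl_arg and the three helper functions are hypothetical; they are not corosync's code.

```c
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* glibc expects the caller to define this union (it sets
 * _SEM_SEMUN_UNDEFINED). A private name avoids clashing with systems,
 * such as the BSDs, that already declare union semun. */
union semctl_arg {
	int val;
	struct semid_ds *buf;
	unsigned short *array;
};

/* Create a one-element private semaphore set initialised to 'initial'.
 * Returns the set id, or -1 on error. */
int private_sem_create(int initial)
{
	union semctl_arg arg;
	int semid = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);

	if (semid == -1) {
		return -1;
	}
	arg.val = initial;
	if (semctl(semid, 0, SETVAL, arg) == -1) {
		arg.val = 0;
		semctl(semid, 0, IPC_RMID, arg);
		return -1;
	}
	return semid;
}

/* GETVAL ignores the fourth argument, but it is passed anyway. */
int private_sem_value(int semid)
{
	union semctl_arg arg;

	arg.val = 0;
	return semctl(semid, 0, GETVAL, arg);
}

/* Same for IPC_RMID: always call semctl() with four arguments. */
int private_sem_destroy(int semid)
{
	union semctl_arg arg;

	arg.val = 0;
	return semctl(semid, 0, IPC_RMID, arg);
}
```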
Tim Beale [Wed, 6 Jul 2011 13:31:45 +0000 (06:31 -0700)]
getpwnam_r()/getgrnam_r() returns ERANGE for some systems

On our system the expected buffer length is 256. This means calls to
getpwnam_r()/getgrnam_r() return an ERANGE error and corosync fails to start up.
These two functions return ERANGE when insufficient buffer space is supplied.
Judging by the man page for getpwnam_r, the correct way to determine the
buffer size on any given system is to use sysconf().

Signed-off-by: Tim Beale <tim.beale@alliedtelesis.co.nz>
Reviewed-by: Steven Dake <sdake@redhat.com>
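A hedged sketch of sizing the getpwnam_r() buffer with sysconf(_SC_GETPW_R_SIZE_MAX), as the commit message describes. The 16384-byte fallback and the grow-and-retry loop on ERANGE are additions of this example (the fallback follows the getpwnam_r(3) man page example), and lookup_uid is a hypothetical name. getgrnam_r() would use _SC_GETGR_R_SIZE_MAX in the same way.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pwd.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Look up the uid of 'name'. Returns 0 on success, -1 if the user does
 * not exist, or a positive errno value on failure. */
int lookup_uid(const char *name, uid_t *uid_out)
{
	struct passwd pw;
	struct passwd *result = NULL;
	long size = sysconf(_SC_GETPW_R_SIZE_MAX);
	char *buf;
	int err;

	/* sysconf() may return -1 when the limit is indeterminate. */
	if (size == -1) {
		size = 16384;
	}
	for (;;) {
		buf = malloc((size_t)size);
		if (buf == NULL) {
			return ENOMEM;
		}
		err = getpwnam_r(name, &pw, buf, (size_t)size, &result);
		if (err != ERANGE) {
			break;
		}
		/* Buffer still too small: grow it and retry. */
		free(buf);
		size *= 2;
	}
	if (err != 0) {
		free(buf);
		return err;
	}
	if (result == NULL) {
		free(buf);
		return -1;
	}
	*uid_out = pw.pw_uid;
	free(buf);
	return 0;
}
```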
Jiaju Zhang [Tue, 5 Jul 2011 15:54:38 +0000 (23:54 +0800)]
RRP: redundant ring automatic recovery

This patch automatically recovers redundant ring failures.

Please note that this patch introduces rrp_autorecovery_check_timeout
in the totem config and hence breaks the internal ABI. Internal ABI users
of totem.h need to rebuild their binaries.

Signed-off-by: Jiaju Zhang <jjzhang@suse.de>
Signed-off-by: Steven Dake <sdake@redhat.com>
Tested-by: Jan Friesse <jfriesse@redhat.com>
Tested-by: Florian Haas <florian.haas@linbit.com>
Tested-by: Jiaju Zhang <jjzhang@suse.de>
Tim Serong [Mon, 23 May 2011 04:19:23 +0000 (14:19 +1000)]
Correct mailing list address in corosync_overview manpage

Signed-off-by: Tim Serong <tserong@novell.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
Masatake YAMATO [Tue, 28 Jun 2011 09:06:23 +0000 (18:06 +0900)]
fix typos in cpg_mcast_joined.3 and cpg_zcb_mcast_joined.3

Signed-off-by: Masatake YAMATO <yamato@redhat.com>
Steven Dake [Fri, 20 May 2011 02:53:00 +0000 (19:53 -0700)]
Add coverity target to corosync makefile.am

Add a make coverity target for developers who have the Coverity tools
available to them.

Signed-off-by: Steven Dake <sdake@redhat.com>
Reviewed-by: Angus Salkeld <asalkeld@redhat.com>
Jan Friesse [Mon, 30 May 2011 11:15:02 +0000 (13:15 +0200)]
coroipcc: Test _SC_PAGESIZE result

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
Jan Friesse [Tue, 21 Jun 2011 09:57:08 +0000 (11:57 +0200)]
Remove spinlocks

Spinlocks are now removed because, although a spinlock can improve
speed in some special cases, in most cases it makes corosync much more
CPU-intensive and less responsive than if only mutexes are used.

What we were doing was:
pthread_mutex_lock
pthread_spin_lock
pthread_spin_unlock
pthread_mutex_unlock

which is not safe.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
Jan Friesse [Mon, 30 May 2011 14:00:45 +0000 (16:00 +0200)]
votequorum: free newly allocated node if nodeid==0

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
Jerome Flesch [Tue, 28 Jun 2011 07:56:58 +0000 (09:56 +0200)]
Fix usage of strerror_r()/perror()

Signed-off-by: Jerome Flesch <jerome.flesch@netasq.com>
Reviewed-by: Angus Salkeld <asalkeld@redhat.com>
Steven Dake [Thu, 23 Jun 2011 05:46:56 +0000 (22:46 -0700)]
sched_params log message incorrect

The sched_params parameter was set before being printed.

Signed-off-by: Dietmar Maurer <dietmar@proxmox.com>
Reviewed-by: <sdake@redhat.com>
Jan Friesse [Tue, 21 Jun 2011 10:02:56 +0000 (12:02 +0200)]
configure.ac: Align --enable-* options description

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
Jan Friesse [Tue, 21 Jun 2011 10:51:55 +0000 (12:51 +0200)]
configure.ac: change edefault to default

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
Jan Friesse [Wed, 15 Jun 2011 14:49:53 +0000 (16:49 +0200)]
CTS: Test for confdb dispatch deadlock

The test is disabled by default because it requires SMP and about 2 GB of RAM.
It also tests a race, so the test is unreliable.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
Jan Friesse [Wed, 15 Jun 2011 13:54:23 +0000 (15:54 +0200)]
confdb: Resolve dispatch deadlock

The following situation could happen:
- one thread is waiting for a write operation to finish (line 853) while
  objdb is locked
- the flush (done in objdb_notify_dispatch) is expected to run in the main
  thread, but this call never happens because the main thread is waiting
  for the objdb lock.

The result is a deadlock.

This commit solves this by:
- setting the pipe to non-blocking mode
- using the pipe only as a trigger for coropoll
- storing dispatch messages in a list
- processing messages from the list in the main thread

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
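A minimal sketch of the four-step fix, with hypothetical names (struct dispatch_queue, dispatch_queue_push, dispatch_queue_drain) and an int payload standing in for the real confdb notification. Producers append to a mutex-protected list and write one byte to a non-blocking pipe purely as a wake-up; the main loop (coropoll in corosync, not shown here) would call the drain function when the read end becomes readable.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical queued notification (the real payload differs). */
struct notification {
	int value;
	struct notification *next;
};

struct dispatch_queue {
	pthread_mutex_t lock;
	struct notification *head;
	struct notification *tail;
	int pipe_fd[2]; /* [0]: polled by the main loop, [1]: written by producers */
};

int dispatch_queue_init(struct dispatch_queue *q)
{
	int i;

	if (pipe(q->pipe_fd) == -1) {
		return -1;
	}
	/* Non-blocking on both ends: a producer never waits on the pipe. */
	for (i = 0; i < 2; i++) {
		int flags = fcntl(q->pipe_fd[i], F_GETFL);

		if (flags == -1 ||
		    fcntl(q->pipe_fd[i], F_SETFL, flags | O_NONBLOCK) == -1) {
			close(q->pipe_fd[0]);
			close(q->pipe_fd[1]);
			return -1;
		}
	}
	if (pthread_mutex_init(&q->lock, NULL) != 0) {
		close(q->pipe_fd[0]);
		close(q->pipe_fd[1]);
		return -1;
	}
	q->head = q->tail = NULL;
	return 0;
}

/* Producer side: may be called from any thread. */
int dispatch_queue_push(struct dispatch_queue *q, int value)
{
	struct notification *n = malloc(sizeof(*n));
	char byte = 0;

	if (n == NULL) {
		return -1;
	}
	n->value = value;
	n->next = NULL;

	pthread_mutex_lock(&q->lock);
	if (q->tail != NULL) {
		q->tail->next = n;
	} else {
		q->head = n;
	}
	q->tail = n;
	pthread_mutex_unlock(&q->lock);

	/* The byte is only a wake-up. EAGAIN means the pipe is full, so a
	 * wake-up is already pending and the error can be ignored. */
	if (write(q->pipe_fd[1], &byte, 1) == -1 && errno != EAGAIN) {
		return -1;
	}
	return 0;
}

/* Consumer side: run in the main thread when pipe_fd[0] is readable.
 * Returns the number of notifications handled. */
int dispatch_queue_drain(struct dispatch_queue *q, void (*handle)(int value))
{
	char buf[64];
	struct notification *list;
	int count = 0;

	/* Empty the trigger pipe first; read() fails with EAGAIN once empty. */
	while (read(q->pipe_fd[0], buf, sizeof(buf)) > 0) {
		/* discard wake-up bytes */
	}

	pthread_mutex_lock(&q->lock);
	list = q->head;
	q->head = q->tail = NULL;
	pthread_mutex_unlock(&q->lock);

	/* Handlers run without the queue lock held. */
	while (list != NULL) {
		struct notification *n = list;

		list = n->next;
		handle(n->value);
		free(n);
		count++;
	}
	return count;
}
```

Because the pipe is drained before the list is detached, a notification queued concurrently is either handled in the current pass or leaves a byte behind that triggers the next pass, so no wake-up is lost.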
Jan Friesse [Thu, 9 Jun 2011 13:46:31 +0000 (15:46 +0200)]
objdb: save copy of handles in object_find_create

The following situation could happen:
- process 1 creates a find handle through confdb
- process 1 calls the find iteration once
- a different process 2 deletes the object pointed to by process 1's iterator
- process 1 calls the iteration again ->
  object_find_instance->find_child_list is an invalid pointer

-> segfault

Now object_find_create creates an array of matching object handles and
object_find_next uses that array together with a check for the name. This
prevents the situation where, between steps 2 and 3, a new object is
created with a different name but, sadly, with the same handle.

It is also worth noting that this patch is more or less a quick hack rather
than a proper solution. The real solution is to not use pointers and to
use handles everywhere instead. This is a big TODO.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
Jiaju Zhang [Wed, 8 Jun 2011 23:59:26 +0000 (07:59 +0800)]
RRP: Fix ring initialization issue for UDPU mode

Redundant ring has a problem in UDP unicast mode: the second ring is
not successfully initialized. That is, the second time iface_changes
happens, the member list for that interface has not been added, so that
ring cannot transmit normal messages. As a result, the second ring cannot
take over if the first ring is down. This patch fixes this issue.

Comments from review:
More work is probably needed in totemnet, where totemnet maintains the
node list and an iterator for it, and totemudpu_member_add adds state
information to a context for the iteration.

In any regard, that is somewhat difficult to test, so I'll merge this
patch for now. Keep in mind that interface changes on the bindnetaddr will
cause problems with udpu after this patch has been committed.

Signed-off-by: Jiaju Zhang <jjzhang@suse.de>
Reviewed-by: Steven Dake <sdake@redhat.com>
Jan Friesse [Thu, 9 Jun 2011 13:42:54 +0000 (15:42 +0200)]
coroipcc: check recvmsg result in socket_recv

According to the specification, recvmsg can return 0, which means that the
connection is closed. We had this check, but only on systems other than
Linux. recvmsg can return 0 on Linux too, so the check is now applied on
all systems.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Angus Salkeld <asalkeld@redhat.com>
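A hedged sketch of a receive loop that treats a recvmsg() result of 0 as an orderly close on every platform, as the commit does. recv_all is a hypothetical helper, and reporting the close as ENOTCONN is an arbitrary convention chosen for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Read exactly 'len' bytes from a connected stream socket.
 * Returns 0 on success, -1 on error (errno set). */
int recv_all(int fd, void *buf, size_t len)
{
	char *p = buf;
	size_t got = 0;

	while (got < len) {
		struct msghdr msg;
		struct iovec iov;
		ssize_t res;

		iov.iov_base = p + got;
		iov.iov_len = len - got;
		memset(&msg, 0, sizeof(msg));
		msg.msg_iov = &iov;
		msg.msg_iovlen = 1;

		res = recvmsg(fd, &msg, 0);
		if (res == -1) {
			if (errno == EINTR) {
				continue;
			}
			return -1;
		}
		if (res == 0) {
			/* Peer closed the connection: checked on every
			 * platform, not only on non-Linux systems. */
			errno = ENOTCONN;
			return -1;
		}
		got += (size_t)res;
	}
	return 0;
}
```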
Jan Friesse [Thu, 9 Jun 2011 13:42:33 +0000 (15:42 +0200)]
confdb: Properly check result of object_find_create

In confdb_object_iter, the result of object_find_create is now properly
checked. object_find_create can return -1 if the object doesn't exist.
Without this check, an invalid handle (memory garbage) was passed directly
to object_find_next.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Angus Salkeld <asalkeld@redhat.com>
Jan Friesse [Mon, 30 May 2011 14:55:45 +0000 (16:55 +0200)]
crypto: rng_make_prng prevent buf overflow

With bits set to 1023, the 256-byte buf was filled by rng_get_bytes with
up to 257 bytes. buf is now 258 bytes, so this is no longer a problem.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
Jan Friesse [Mon, 30 May 2011 11:02:36 +0000 (13:02 +0200)]
mainconfig: Check retval of logsys_format_set

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
Jan Friesse [Mon, 30 May 2011 13:51:45 +0000 (15:51 +0200)]
testcpgzc: fgets buffer to really allocated size

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
Jan Friesse [Mon, 30 May 2011 14:41:37 +0000 (16:41 +0200)]
cpg: do_proc_join change list_slice to list_add

In this particular case the result is equivalent, but the change makes Coverity happy.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
Jan Friesse [Mon, 30 May 2011 13:53:39 +0000 (15:53 +0200)]
totemudp: memset of proper size

In totemudp_mcast_thread_state_constructor, memset
sizeof(struct totemudp_mcast_thread_state) bytes instead of the size of a
pointer.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
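The bug class fixed here is easy to reproduce. In this sketch, struct mcast_thread_state is a hypothetical stand-in for struct totemudp_mcast_thread_state: sizeof(*s) clears the whole object, whereas sizeof(s) would clear only as many bytes as a pointer occupies.

```c
#include <string.h>

/* Hypothetical per-thread state; the real structure differs. */
struct mcast_thread_state {
	unsigned char iobuf[1500];
	int fd;
};

void mcast_thread_state_constructor(void *state)
{
	struct mcast_thread_state *s = state;

	/* Wrong: sizeof(s) is the size of a pointer (typically 8 bytes):
	 *     memset(s, 0, sizeof(s));
	 * Right: sizeof(*s) is the size of the whole structure. */
	memset(s, 0, sizeof(*s));
}
```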
Jan Friesse [Mon, 30 May 2011 13:50:04 +0000 (15:50 +0200)]
coroipcs: init buf in coroipcs_handler_dispatch

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
Jan Friesse [Mon, 30 May 2011 13:43:14 +0000 (15:43 +0200)]
coroparse: don't leak dirent

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
Jan Friesse [Mon, 30 May 2011 11:08:23 +0000 (13:08 +0200)]
logsys: _logsys_wthread_create never returns != 0

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
Jan Friesse [Mon, 30 May 2011 11:06:03 +0000 (13:06 +0200)]
notifyd: Check retval of corosync_cfg_initialize

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
Jan Friesse [Mon, 30 May 2011 10:37:20 +0000 (12:37 +0200)]
totemconfig: discard check of objdb_get_string ret

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
Jan Friesse [Mon, 30 May 2011 09:59:27 +0000 (11:59 +0200)]
coroipcc: proper path size in coroipcc_zcb_alloc

The memory_map function internally limits the maximum path size to
PATH_MAX, but coroipcc_zcb_alloc passed a smaller buffer.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
Jan Friesse [Mon, 30 May 2011 09:54:42 +0000 (11:54 +0200)]
libquorum: memset/memcpy proper size of callbacks

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
Jan Friesse [Tue, 17 May 2011 09:20:37 +0000 (11:20 +0200)]
iazc: Reduce number of mem alloc and memcpy

x86 processors can handle unaligned memory access. Improve performance
by using that feature on i386- and x86_64-compatible processors, and use
the old aligning code on other processors.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
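The commit relies on x86 tolerating unaligned loads. A portable alternative, shown here as a sketch rather than the iazc code itself, is to go through memcpy(), which is valid on every architecture and which compilers typically reduce to a plain load where unaligned access is cheap. The helper names load_u32 and store_u32 are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a uint32_t stored at an arbitrary byte offset of 'buf'.
 * Dereferencing (const uint32_t *)(buf + off) is undefined behaviour in C
 * and can raise SIGBUS on strict-alignment CPUs; memcpy is always valid. */
uint32_t load_u32(const unsigned char *buf, size_t off)
{
	uint32_t v;

	memcpy(&v, buf + off, sizeof(v));
	return v;
}

/* Store counterpart with the same alignment-agnostic behaviour. */
void store_u32(unsigned char *buf, size_t off, uint32_t v)
{
	memcpy(buf + off, &v, sizeof(v));
}
```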
Jerome Flesch [Fri, 27 May 2011 11:45:27 +0000 (13:45 +0200)]
logsys: When corosync is compiled with --enable-small-memory-footprint, also reduce the size of the logsys SHM

Signed-off-by: Jerome Flesch <jerome.flesch@netasq.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
Jerome Flesch [Fri, 27 May 2011 11:42:42 +0000 (13:42 +0200)]
coroipcc_dispatch_get(): Fix --enable-small-memory-footprint support

Signed-off-by: Jerome Flesch <jerome.flesch@netasq.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Jerome Flesch [Fri, 27 May 2011 11:40:36 +0000 (13:40 +0200)]
coroipcs_handler_dispatch(): Fix conn_info->service security value: -1 is not a good security value since it's equal to SOCKET_SERVICE_INIT

Signed-off-by: Jerome Flesch <jerome.flesch@netasq.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Jerome Flesch [Fri, 27 May 2011 11:35:02 +0000 (13:35 +0200)]
coroipcc: Fix unhandled BSD EOF in coroipcc_dispatch_get()

Signed-off-by: Jerome Flesch <jerome.flesch@netasq.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
Jerome Flesch [Fri, 27 May 2011 11:29:12 +0000 (13:29 +0200)]
Corosync: Fix build when done with --enable-fatal-warnings

Signed-off-by: Jerome Flesch <jerome.flesch@netasq.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Russell Bryant [Sun, 8 May 2011 07:40:34 +0000 (02:40 -0500)]
logsys.c: Use snprintf() instead of sprintf().

Change a couple of string function calls to use their output-length-limiting
counterparts.

Signed-off-by: Russell Bryant <russell@russellbryant.net>
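A small sketch of the pattern: snprintf() never writes past the given length, and its return value reveals truncation. format_log_line is a hypothetical helper, not a logsys function.

```c
#include <stddef.h>
#include <stdio.h>

/* Format "<subsys>: <msg>" into buf without overrunning it.
 * Returns 0 if the whole string fit, 1 if it was truncated,
 * or -1 on an encoding error. */
int format_log_line(char *buf, size_t len, const char *subsys, const char *msg)
{
	int n = snprintf(buf, len, "%s: %s", subsys, msg);

	if (n < 0) {
		return -1;
	}
	/* snprintf returns the length it would have written; a value
	 * >= len means the output was cut short (but still terminated). */
	return (size_t)n >= len ? 1 : 0;
}
```

With an 8-byte buffer, formatting "TOTEM" and "ok" yields the truncated but terminated string "TOTEM: " and a return value of 1, where sprintf() would have overrun the buffer.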
Jan Friesse [Wed, 11 May 2011 14:58:23 +0000 (16:58 +0200)]
corosync-objctl: Option to display binary data

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
Angus Salkeld [Wed, 4 May 2011 23:29:37 +0000 (09:29 +1000)]
cpg: fix sync master selection when one node paused.

If one node is paused it can miss a config change and
thus report a larger old_members than expected.

The solution is to use the left_nodes field.

Master selection used to be "choose node with":
1) largest previous membership
2) (then as a tie-breaker) node with smallest nodeid

New selection:
1) largest (previous #nodes - #nodes known to have left)
2) (then as a tie-breaker) node with smallest nodeid

Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
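The new selection rule, sketched in C with a hypothetical struct node_report (the real cpg code keeps this information in its own structures):

```c
#include <stddef.h>

/* Hypothetical per-node report used only for this sketch. */
struct node_report {
	unsigned int nodeid;
	unsigned int old_members; /* size of the node's previous membership */
	unsigned int left_nodes;  /* nodes it knows to have left */
};

/* Return the index of the sync master: largest
 * (old_members - left_nodes), ties broken by the smallest nodeid.
 * Returns -1 for an empty array. */
int choose_sync_master(const struct node_report *r, size_t n)
{
	size_t i;
	int best = -1;
	long best_score = 0;

	for (i = 0; i < n; i++) {
		long score = (long)r[i].old_members - (long)r[i].left_nodes;

		if (best == -1 || score > best_score ||
		    (score == best_score && r[i].nodeid < r[best].nodeid)) {
			best = (int)i;
			best_score = score;
		}
	}
	return best;
}
```

With reports {nodeid 3: 5 old, 2 left}, {nodeid 1: 3 old, 0 left} and {nodeid 2: 3 old, 0 left}, all three score 3, so node 1 is chosen; the old rule would have picked the paused node 3 because of its larger previous membership.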
Angus Salkeld [Wed, 4 May 2011 23:11:18 +0000 (09:11 +1000)]
CTS: fix some tests that didn't handle being called more than once

Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
Angus Salkeld [Wed, 4 May 2011 23:09:38 +0000 (09:09 +1000)]
CTS: sort the configuration - prevent duplicates in the config file

Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
Angus Salkeld [Wed, 4 May 2011 23:10:20 +0000 (09:10 +1000)]
CTS: fix syntax error in log message

Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
Angus Salkeld [Wed, 4 May 2011 23:08:11 +0000 (09:08 +1000)]
CTS: bump up log messages of failed RPC

Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
Angus Salkeld [Wed, 4 May 2011 23:07:04 +0000 (09:07 +1000)]
CTS: don't force all-once (breaks random tests)

Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
Angus Salkeld [Wed, 4 May 2011 23:06:28 +0000 (09:06 +1000)]
autobuild: improve messages

Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
Angus Salkeld [Wed, 4 May 2011 04:41:18 +0000 (14:41 +1000)]
CTS: add -l to keygen (normal keygen struggles to run on VMs)

Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
Angus Salkeld [Mon, 18 Apr 2011 02:46:53 +0000 (12:46 +1000)]
CTS: send with correct number of iovecs

Otherwise the payload won't be sent.

Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
Angus Salkeld [Mon, 18 Apr 2011 02:45:50 +0000 (12:45 +1000)]
CTS: timer should not be on the stack

Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
Jan Friesse [Wed, 4 May 2011 13:00:31 +0000 (15:00 +0200)]
totemsrp: Enhance mcast failure detection

memb_state_gather_enter increases stats.continuous_gather only if the
previous state was also gather. This should happen only if multicast is
not working properly (a local firewall in most cases), and not when many
nodes join at the same time.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Angus Salkeld <asalkeld@redhat.com>
Jan Friesse [Tue, 29 Mar 2011 13:51:42 +0000 (15:51 +0200)]
coroipcs: Deny connect to service without initfn

If a library connects to a service with no init function, coroipcs tries
to dereference a NULL pointer. Now we correctly return the error code
CS_ERR_NOT_EXIST.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
Tim Serong [Fri, 15 Apr 2011 00:40:11 +0000 (10:40 +1000)]
Add ipc_refcnt to message_handler_req_{exec, lib}_cfg_ringreenable()

Without refcounting the conn pointer here, corosync will segfault
if one kills a running instance of "corosync-cfgtool -r" (rhbz#695191)

Signed-off-by: Tim Serong <tserong@novell.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
Steven Dake [Mon, 3 Jan 2011 23:40:55 +0000 (16:40 -0700)]
Align ipc on 8 byte boundaries

Align all IPC messages on 8-byte boundaries.  This alignment removes bus
errors on systems that can't access unaligned data and should improve
performance.

Signed-off-by: Steven Dake <sdake@redhat.com>
Reviewed-by: Angus Salkeld <asalkeld@redhat.com>
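A sketch of rounding message lengths up to the next 8-byte boundary. IPC_ALIGNMENT and ipc_align_len are hypothetical names; the bit-mask trick assumes the alignment is a power of two.

```c
#include <stddef.h>

#define IPC_ALIGNMENT 8 /* must be a power of two */

/* Round 'len' up so the next message in a shared buffer starts on an
 * IPC_ALIGNMENT-byte boundary. */
size_t ipc_align_len(size_t len)
{
	return (len + (IPC_ALIGNMENT - 1)) & ~(size_t)(IPC_ALIGNMENT - 1);
}
```

For example, ipc_align_len(13) returns 16 and ipc_align_len(16) returns 16.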
Steven Dake [Mon, 3 Jan 2011 23:40:54 +0000 (16:40 -0700)]
Fix problem where unaligned totemip address access would result in bus error on non-unaligned-safe architectures.

Signed-off-by: Steven Dake <sdake@redhat.com>
Reviewed-by: Angus Salkeld <asalkeld@redhat.com>
Greg Walton [Thu, 6 Jan 2011 16:15:24 +0000 (11:15 -0500)]
Clean up ENDIAN ifdef tests

Signed-off-by: Greg Walton <corosync@gwalton.net>
Reviewed-by: Steven Dake <sdake@redhat.com>
Tim Serong [Wed, 6 Apr 2011 11:30:46 +0000 (21:30 +1000)]
Fix tyop in RRP faulty error messages

Signed-off-by: Tim Serong <tserong@novell.com>
Reviewed-by: Russell Bryant <russell@russellbryant.net>
Angus Salkeld [Tue, 12 Apr 2011 22:15:59 +0000 (08:15 +1000)]
IPC: place calls to stats functions outside of mutexes

This is to prevent nasty deadlocks between IPC and objdb.

Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
Zane Bitter [Sun, 10 Apr 2011 13:04:17 +0000 (09:04 -0400)]
Provide better checking of the message type

A negative value for the message type (on systems where char is signed)
would cause a crash. This is highly probable if the cluster is, for example,
misconfigured to have encryption enabled on some nodes but not others.

Signed-off-by: Zane Bitter <zane.bitter@gmail.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
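A hedged sketch of the kind of check this commit describes: read the type byte as unsigned char so it can never be negative, then bounds-check it before indexing a handler table. The table layout and dispatch_message are hypothetical, not corosync's actual code.

```c
#include <stddef.h>

typedef int (*msg_handler_fn)(const void *msg, size_t len);

/* The first byte of each packet selects a handler. Plain char may be
 * signed, so a byte such as 0xF0 would become a negative index; it is
 * converted to unsigned char and range-checked instead. */
int dispatch_message(const char *pkt, size_t len,
		     const msg_handler_fn *handlers, size_t n_handlers)
{
	unsigned char type;

	if (len == 0) {
		return -1;
	}
	type = (unsigned char)pkt[0];
	if (type >= n_handlers || handlers[type] == NULL) {
		/* Unknown or malformed (e.g. encrypted) message: drop it. */
		return -1;
	}
	return handlers[type](pkt, len);
}
```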
Zane Bitter [Fri, 8 Apr 2011 03:48:49 +0000 (23:48 -0400)]
Fix uninitialised memory errors found by valgrind

Signed-off-by: Zane Bitter <zane.bitter@gmail.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
Angus Salkeld [Tue, 29 Mar 2011 02:25:04 +0000 (13:25 +1100)]
Fix shutdown when a confdb client is still connected

If you are connected to corosync and registered for object notifications
when corosync is asked to shut down, the IPC server gets stuck. This is
because the pipe is closed and the refcount is increased, which leaves
ipcs with a connection that it can't destroy.

Solution:
1) if a write to the pipe fails (pipe closed) decrement the refcounter.
2) fix the object_track_stop() - it was not working as the functions
   did not match up. (this caused the late callbacks).
3) in ipcs call exit_fn() then stats_destroy_connection() so that
   the service engine can have time to call object_track_stop()
   before the object gets destroyed.

Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
Angus Salkeld [Mon, 28 Mar 2011 22:41:04 +0000 (09:41 +1100)]
STATS: add the service name to the connection name.

This helps to quickly identify what service the application
is connected to.

The object will now look like:
runtime.connections.corosync-objctl:CONFDB:19654:13.service_id=11
runtime.connections.corosync-objctl:CONFDB:19654:13.client_pid=19654
etc...

This also makes it clearer to receivers of the dbus/snmp events
what is going on.

Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
Angus Salkeld [Sat, 26 Mar 2011 11:09:29 +0000 (22:09 +1100)]
NOTIFYD: prevent duplicate quorate events.

Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
Angus Salkeld [Sat, 26 Mar 2011 11:08:55 +0000 (22:08 +1100)]
NOTIFYD: fix retrieving the application's parent name.

Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
Jan Friesse [Tue, 22 Mar 2011 16:33:03 +0000 (17:33 +0100)]
cfgtool: print list of IP with space between items

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
Jan Friesse [Tue, 22 Mar 2011 16:32:45 +0000 (17:32 +0100)]
cpgtool: print list of IP with space between items

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
Jan Friesse [Tue, 22 Mar 2011 16:20:12 +0000 (17:20 +0100)]
cfg_get_node_addrs: Return correct addresses

Zero-element array behavior is very different from that of a normal array
or pointer. This behavior is the root of the problem of not returning a
correctly filled array of addresses. It appeared only in RRP mode, where
more than one address is returned.

All memcpy calls are now correctly converted to use a pointer to char.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
Steven Dake [Thu, 24 Mar 2011 15:53:40 +0000 (08:53 -0700)]
totemsrp: free messages originated in recovery rather than rely on messages_free

Relying on messages_free may seem like it should work, but it leads to a
situation where every node has released the messages, yet some nodes think
messages are missing.  The output then looks like "Retransmit: #" in
repetition.  This patch frees those messages immediately during the transition
to the OPERATIONAL state and sets the internal variables totemsrp depends
upon to the proper values.

Signed-off-by: Steven Dake <sdake@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Steven Dake [Thu, 24 Mar 2011 15:46:24 +0000 (08:46 -0700)]
totemsrp: Only restore old ring id information one time

The current code stores the current ring information every time a commit
token is generated.  This causes the old ring id used for comparison purposes
to increase if a token is lost in commit or recovery, resulting in failure of
totem.  This patch changes the behavior to only store the old ring id one
time when the commit token is received, and then further commit token ring
id saves are not done until OPERATIONAL is reached.

Signed-off-by: Steven Dake <sdake@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Steven Dake [Thu, 24 Mar 2011 15:30:53 +0000 (08:30 -0700)]
totemsrp: Remove recv_flush code

The recv_flush code is no longer necessary because of the miss_count_count
addition.  It can in some cases lead to register corruption because of
interactions with -fstack-protector, the recursive nature of how this code
works, and interactions with the optimizer in some versions of gcc.

Signed-off-by: Steven Dake <sdake@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Angus Salkeld [Wed, 23 Mar 2011 20:54:42 +0000 (07:54 +1100)]
confdb: send notifications from the main thread not IPC thread

corosync-notifyd has exposed an issue with confdb notifications.

The normal state of affairs is:
IPC thread > lock > objdb > lock

objdb notifications, while really useful, turn things around:
<middle of big call chain>
objdb > lock > confdb > ipc > lock

This reverse ordering of locks causes a horrible deadlock.

I see this patch as a workaround until corosync-2.0,
when most of the threads and locking disappear.

This patch adds a pipe to the confdb service. When we get an
objdb notification, a struct gets written to the pipe.
The poll loop then runs the dispatch in the main thread.
In the dispatch we call the real ipc_dispatch_send().

Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
Steven Dake [Sat, 19 Mar 2011 01:47:10 +0000 (18:47 -0700)]
Resolve abort during simultaneous stopping of at least 4 nodes

Consider 5 nodes.

Nodes 3 and 4 are stopped (by random stopping), nodes 1, 2 and 5 form a new
configuration, and during recovery nodes 1 and 2 are stopped (via service
corosync stop).  This causes node 5 never to finish recovery within the timeout
period, triggering a token loss in recovery.  Bug #623176 resolved an assert
which happens because the full ring id was being restored.  The resolution
to Bug #623176 was to not restore the full ring id, and instead operate on
the new ring id (according to the specification).  Unfortunately this exposes
a problem whereby the restarting of nodes 1-4 generates the same ring id.
This ring id gets to the recovery-failed node 5, which is now in gather,
and triggers a condition not accounted for in the original totem specification.

It appears that later work from Dr. Agarwal's PhD dissertation considers this
scenario.  That solution entails rejecting the regular token in the above
condition.  Since the ring id is also used to make decisions for commit token
acceptance, we must also take care to reject the regular token in all cases
after transitioning from OPERATIONAL.

Signed-off-by: Steven Dake <sdake@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
Angus Salkeld [Mon, 21 Mar 2011 02:37:18 +0000 (13:37 +1100)]
notifyd: dispatch only one message at a time.

This is to avoid getting stuck in the dispatch processing
messages when the user is trying to shut down the service.

Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
Angus Salkeld [Mon, 14 Mar 2011 23:38:44 +0000 (10:38 +1100)]
Fix some "set but not used" warnings [-Wunused-but-set-variable]

Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
Angus Salkeld [Mon, 14 Mar 2011 22:50:56 +0000 (09:50 +1100)]
Remove the ttl option from udpu and rely on the kernel ttl setting.

Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Angus Salkeld [Mon, 14 Mar 2011 22:44:05 +0000 (09:44 +1100)]
Fix the ttl defaults and range

1) both IPv4 and IPv6 mcast should default to ttl=1
2) the range should be 0..255
   0 is valid meaning localhost only (cluster of one)
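
A minimal sketch of the resulting validation, assuming hypothetical helper
names (mcast_ttl_is_valid, mcast_ttl_resolve); this is not the corosync
config parser. It encodes the two rules above: default ttl=1 for both address
families, and accept 0..255 inclusive.

```c
#include <stdbool.h>

/* Both IPv4 and IPv6 multicast default to ttl=1 (local segment only). */
#define MCAST_TTL_DEFAULT 1

/* Accept 0..255.  A ttl of 0 keeps multicast on the local host,
 * which is useful for a cluster of one. */
static bool mcast_ttl_is_valid(long ttl)
{
    return ttl >= 0 && ttl <= 255;
}

/* Return the effective ttl: the default if the option is absent,
 * the configured value if it is in range, or -1 so the caller can
 * report a configuration error. */
static int mcast_ttl_resolve(bool configured, long ttl)
{
    if (!configured) {
        return MCAST_TTL_DEFAULT;
    }
    if (!mcast_ttl_is_valid(ttl)) {
        return -1;
    }
    return (int)ttl;
}
```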

Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
13 years agoAdd Doxyfile to .gitignore
Russell Bryant [Sat, 12 Mar 2011 12:37:53 +0000 (06:37 -0600)]
Add Doxyfile to .gitignore

Signed-off-by: Russell Bryant <russell@russellbryant.net>
Reviewed-by: Angus Salkeld <asalkeld@redhat.com>
13 years agodocs: auto-generate the version
Angus Salkeld [Sat, 12 Mar 2011 08:39:04 +0000 (19:39 +1100)]
docs: auto-generate the version

Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
13 years agoConvert existing documentation to doxygen format.
Russell Bryant [Sat, 12 Mar 2011 02:50:41 +0000 (20:50 -0600)]
Convert existing documentation to doxygen format.

This patch modifies most of the existing comments in header files to be
in a format that doxygen can interpret.  This provides another
significant improvement to the web/pdf/etc generated documentation
without having to add new content.
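
For illustration, a plain header comment converted to doxygen syntax might
look like this. The function sum_values is hypothetical and not part of the
corosync headers; only the comment style (a comment block opened with an extra
asterisk, plus @brief/@param/@return commands) is the point.

```c
#include <stddef.h>

/* Before (ignored by doxygen):
 *   Sum the first n elements of values.
 */

/**
 * @brief Sum the first @p n elements of @p values.
 *
 * @param values Array of at least @p n integers.
 * @param n      Number of elements to add.
 * @return The sum, or 0 when @p n is 0.
 */
static long sum_values(const int *values, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        total += values[i];
    }
    return total;
}
```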

Signed-off-by: Russell Bryant <russell@russellbryant.net>
Reviewed-by: Angus Salkeld <asalkeld@redhat.com>
13 years agoAllocate packet buffers in the transport drivers
Zane Bitter [Fri, 11 Mar 2011 03:30:35 +0000 (22:30 -0500)]
Allocate packet buffers in the transport drivers

This change paves the way for eliminating a copy within the Infiniband
driver in the future by transferring responsibility for allocating and
freeing message buffers to the transport driver layer.

Tested under valgrind on a single-node cluster.
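
The ownership change can be sketched as a driver interface that exposes its
own allocate/release hooks. The struct transport_ops and its members are
hypothetical names for illustration, not the actual totem transport API. A
socket-based driver can simply use the heap, while an Infiniband driver could
hand out pre-registered memory so a packet can be posted without an extra copy.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical driver interface: the transport owns its packet buffers. */
struct transport_ops {
    void *(*buffer_alloc)(size_t len);
    void (*buffer_release)(void *buf);
    int (*send)(const void *buf, size_t len);
};

/* A plain-socket driver can allocate from the C heap. */
static void *heap_buffer_alloc(size_t len) { return malloc(len); }
static void heap_buffer_release(void *buf) { free(buf); }

static size_t bytes_sent;
static int heap_send(const void *buf, size_t len)
{
    (void)buf;            /* a real driver would transmit buf here */
    bytes_sent += len;
    return 0;
}

static const struct transport_ops heap_transport = {
    .buffer_alloc = heap_buffer_alloc,
    .buffer_release = heap_buffer_release,
    .send = heap_send,
};

/* Caller side: build the message in a driver-provided buffer, send it,
 * and give the buffer back to the same driver. */
static int send_message(const struct transport_ops *ops,
                        const char *payload, size_t len)
{
    void *buf = ops->buffer_alloc(len);
    if (buf == NULL) {
        return -1;
    }
    memcpy(buf, payload, len);
    int rc = ops->send(buf, len);
    ops->buffer_release(buf);
    return rc;
}
```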

Signed-off-by: Zane Bitter <zane.bitter@gmail.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
13 years agoFix minor errors in man page documentation for corosync.conf
Zane Bitter [Thu, 10 Mar 2011 04:55:06 +0000 (23:55 -0500)]
Fix minor errors in man page documentation for corosync.conf

* Correct 'See Also' reference to corosync.conf(5) in corosync(8) man page
* Update path to default config (now /etc/corosync/corosync.conf)

Signed-off-by: Zane Bitter <zane.bitter@gmail.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
13 years agoFix abort when token is lost in RECOVERY state
Steven Dake [Fri, 4 Mar 2011 19:55:54 +0000 (12:55 -0700)]
Fix abort when token is lost in RECOVERY state

A commit token should be rejected when a token is lost in the recovery
state.  This occurs naturally because the ring id increases by 4 for
every new ring.  Prior to this patch, if the token was lost, the old
ring id information was restored, causing a commit token to be accepted
when it should be rejected.  This erroneously accepted commit token
would lead to an assertion, which this patch fixes.
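
A simplified model of why not restoring the old ring id makes the rejection
"natural", assuming that a commit token is accepted only when its ring
sequence is newer than the locally recorded one. struct ring_id and
accept_commit_token are hypothetical; totemsrp's real checks compare more than
a single sequence number.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified ring id: only the sequence number matters here. */
struct ring_id {
    uint64_t seq;
};

/* Each new ring advances the sequence by 4. */
#define RING_SEQ_STEP 4

/* Accept a commit token only if it announces a ring newer than the
 * one this processor last recorded. */
static bool accept_commit_token(const struct ring_id *mine,
                                const struct ring_id *token)
{
    return token->seq > mine->seq;
}
```

If a processor on ring 8 enters recovery for ring 12 and the token is lost,
keeping seq 12 makes a stale commit token for ring 12 fail the check, while
restoring seq 8 would wrongly accept it.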

Signed-off-by: Steven Dake <sdake@redhat.com>
Reviewed-by: Angus Salkeld <asalkeld@redhat.com>
13 years agoAdd content for the doxygen main page.
Russell Bryant [Mon, 7 Mar 2011 14:42:01 +0000 (08:42 -0600)]
Add content for the doxygen main page.

This creates some content on the main page of the documentation
generated by doxygen.  The main page includes the license and a link
to the project web site.

Signed-off-by: Russell Bryant <russell@russellbryant.net>
Reviewed-by: Steven Dake <sdake@redhat.com>
13 years agoResolve a couple of doxygen warnings.
Russell Bryant [Mon, 7 Mar 2011 14:39:58 +0000 (08:39 -0600)]
Resolve a couple of doxygen warnings.

This resolves a couple of doxygen warnings.  First, the group needed a
name.  Second, all of the functions in the file were added to the group
but doxygen complained about the lack of an end to the grouping.
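
Both fixes correspond to standard doxygen grouping syntax: @defgroup takes a
group name followed by a title, and the members are enclosed between @{ and
@}. The group and functions below are hypothetical examples, not the corosync
file in question.

```c
/**
 * @defgroup ring_helpers Ring id helpers
 * @brief Helper functions collected under one named group.
 * @{
 */

/** Return the sequence number of the next ring (rings advance by 4). */
static unsigned long ring_seq_next(unsigned long seq)
{
    return seq + 4;
}

/** Return nonzero if sequence @p a is newer than sequence @p b. */
static int ring_seq_newer(unsigned long a, unsigned long b)
{
    return a > b;
}

/** @} */
```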

Signed-off-by: Russell Bryant <russell@russellbryant.net>
Reviewed-by: Steven Dake <sdake@redhat.com>
13 years agoUpdate doxygen configuration file.
Russell Bryant [Mon, 7 Mar 2011 14:38:53 +0000 (08:38 -0600)]
Update doxygen configuration file.

The included doxygen configuration file was a bit stale.  It included
some options that were obsolete and caused doxygen to generate some
warnings when running it.  Most of the changes here were simply done by
running "doxygen -u" to automatically update the file, which added
documentation for each option and removed the obsolete ones.

This also includes one configuration change, which is to set EXTRACT_ALL
to yes.  This instructs doxygen to generate documentation pages for all
files, public functions, and public data structures even if they are not
currently documented using doxygen syntax.  Doxygen is capable of
generating some useful documentation on its own, such as dependency
graphs.

Signed-off-by: Russell Bryant <russell@russellbryant.net>
Reviewed-by: Steven Dake <sdake@redhat.com>
13 years agoMinor build system updates for doxygen.
Russell Bryant [Mon, 7 Mar 2011 14:36:53 +0000 (08:36 -0600)]
Minor build system updates for doxygen.

The configure script has been updated to check for the doxygen and dot
applications (from doxygen and graphviz).  The results from these checks
are now used in the Makefile to ensure that the tools are installed when
you run "make doxygen".  If they are not, it will generate a helpful
error message.

Signed-off-by: Russell Bryant <russell@russellbryant.net>
Reviewed-by: Steven Dake <sdake@redhat.com>
13 years agoEnsure that strings are null terminated after strncpy().
Russell Bryant [Mon, 7 Mar 2011 14:30:03 +0000 (08:30 -0600)]
Ensure that strings are null terminated after strncpy().

From the strcpy(3) man page, the following warning is given:
  The strncpy() function is similar, except that at most n bytes of src
  are  copied.  Warning: If there is no null byte among the first n bytes
  of src, the string placed in dest will not be null-terminated.

The current corosync code base does not take this warning into account
when using strncpy, potentially resulting in non-null terminated strings.
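
The usual pattern is to copy at most size - 1 bytes and then write the
terminator explicitly. copy_string is a hypothetical helper name used only
for this sketch.

```c
#include <stddef.h>
#include <string.h>

/* Copy src into dst (capacity dst_size bytes), always leaving dst
 * null-terminated.  strncpy() alone does not guarantee this when
 * src has dst_size or more characters. */
static void copy_string(char *dst, const char *src, size_t dst_size)
{
    if (dst_size == 0) {
        return;
    }
    strncpy(dst, src, dst_size - 1);
    dst[dst_size - 1] = '\0';
}
```

With a 4-byte buffer, copying "abcdef" yields "abc" rather than four bytes
with no terminator.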

Signed-off-by: Russell Bryant <russell@russellbryant.net>
Reviewed-by: Steven Dake <sdake@redhat.com>
13 years agoAdd -l option to corosync-keygen.
Russell Bryant [Sat, 5 Mar 2011 16:02:25 +0000 (10:02 -0600)]
Add -l option to corosync-keygen.

This option (-l or --less-secure) causes corosync-keygen to read from
/dev/urandom instead of /dev/random to ensure that no input is required
from the user.  It may be useful when this command is used from a
script.
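
A minimal sketch of how such a flag might select the entropy source, assuming
hypothetical helper names (key_source, read_key); this is not the
corosync-keygen source. On kernels of that era /dev/random could block until
enough entropy had been gathered, while /dev/urandom never blocks.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* less_secure models the -l/--less-secure option. */
static const char *key_source(bool less_secure)
{
    return less_secure ? "/dev/urandom" : "/dev/random";
}

/* Fill key with len bytes from the chosen device.
 * Returns 0 on success, -1 on failure. */
static int read_key(unsigned char *key, size_t len, bool less_secure)
{
    FILE *f = fopen(key_source(less_secure), "rb");
    if (f == NULL) {
        return -1;
    }
    /* Unbuffered, so no more bytes are requested than the key needs. */
    setvbuf(f, NULL, _IONBF, 0);
    size_t got = fread(key, 1, len, f);
    fclose(f);
    return got == len ? 0 : -1;
}
```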

Signed-off-by: Russell Bryant <russell@russellbryant.net>
Reviewed-by: Steven Dake <sdake@redhat.com>
13 years agoDon't assert when ring id file is less than 8 bytes
Steven Dake [Tue, 22 Feb 2011 19:48:32 +0000 (12:48 -0700)]
Don't assert when ring id file is less than 8 bytes

If the ring id file for the processor is less than 8 bytes, totemsrp would
assert.  Our speculation is that this condition happens during a fencing
operation or local filesystem corruption.

With this patch, Corosync will create fresh ring id file data when an
incorrect number of bytes is read from the ring id file.

Amended to use sizeof for the strerror string length and PATH_MAX for the path length.
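
The recovery path can be sketched as follows, assuming a hypothetical helper
ring_seq_load and a hypothetical fresh value of 0; the real totemsrp code
stores more than a bare sequence number and logs errors. A short read is
treated as corrupt data and replaced instead of triggering an assert.

```c
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical value written when the stored ring id is unusable. */
#define RING_SEQ_FRESH 0

/* Load an 8-byte ring sequence number from path.  If the file is
 * missing or holds fewer than 8 bytes, write fresh data instead. */
static uint64_t ring_seq_load(const char *path)
{
    uint64_t seq = RING_SEQ_FRESH;
    int fd = open(path, O_RDONLY);
    if (fd >= 0) {
        ssize_t n = read(fd, &seq, sizeof(seq));
        close(fd);
        if (n == (ssize_t)sizeof(seq)) {
            return seq;
        }
        seq = RING_SEQ_FRESH;   /* short read: discard partial bytes */
    }
    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd >= 0) {
        if (write(fd, &seq, sizeof(seq)) != (ssize_t)sizeof(seq)) {
            /* a real implementation would log the error here */
        }
        close(fd);
    }
    return seq;
}
```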

Signed-off-by: Steven Dake <sdake@redhat.com>
Reviewed-by: Angus Salkeld <asalkeld@redhat.com>
13 years agosnmp: Allow building of corosync on already existing older install of corosync
Steven Dake [Thu, 24 Feb 2011 21:12:10 +0000 (14:12 -0700)]
snmp: Allow building of corosync on already existing older install of corosync

When building corosync against older libraries already installed on the system,
the corosync-notifyd application uses the wrong Makefile.am commands.  As a
result, SNMPLIBS (which includes -L/usr/lib64) comes before the proper LDADD
flags, and corosync fails to compile on a system with an existing installation.

Signed-off-by: Steven Dake <sdake@redhat.com>
Reviewed-by: Angus Salkeld <asalkeld@redhat.com>
13 years agoobjdb: destroy all handles in _clear_object
Jan Friesse [Wed, 23 Feb 2011 14:15:49 +0000 (15:15 +0100)]
objdb: destroy all handles in _clear_object

This patch replaces the free() of object_instance with handle_destroy,
which fixes leaked handles (and the associated memory leak).

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
13 years agoIterate all items in object_reload_notification
Jan Friesse [Tue, 22 Feb 2011 11:19:48 +0000 (12:19 +0100)]
Iterate all items in object_reload_notification

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
13 years agocorosync-fplay: use uint32_t and remove bit-shift
Jan Friesse [Tue, 22 Feb 2011 09:31:59 +0000 (10:31 +0100)]
corosync-fplay: use uint32_t and remove bit-shift

The flight recorder records all data in 32-bit words.  Use the uint32_t
type rather than unsigned int.  Also replace the bit-shift with a
multiplication by sizeof(uint32_t).
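
The difference is one of stating intent; a sketch with hypothetical helper
names follows. Multiplying by sizeof(uint32_t) says "words of 32 bits" where
a shift by 2 relies on the reader knowing a word is 4 bytes, and uint32_t
keeps the width fixed even where unsigned int is not 32 bits.

```c
#include <stddef.h>
#include <stdint.h>

/* Byte length of a flight-recorder span of `words` 32-bit words
 * (replaces the equivalent `words << 2`). */
static size_t words_to_bytes(size_t words)
{
    return words * sizeof(uint32_t);
}

/* Read word i of a record buffer using the fixed-width type. */
static uint32_t record_word(const uint32_t *record, size_t i)
{
    return record[i];
}
```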

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
13 years agocorosync-fplay: Use size_t length mod in printf
Jan Friesse [Tue, 22 Feb 2011 09:30:07 +0000 (10:30 +0100)]
corosync-fplay: Use size_t length mod in printf
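
For reference, "z" is the printf length modifier for size_t (C99). Passing a
size_t to "%u" or "%d" is undefined behaviour and prints wrong values where
size_t is wider than unsigned int, as on 64-bit Linux. format_size is a
hypothetical helper for illustration.

```c
#include <stddef.h>
#include <stdio.h>

/* Format a size_t with its matching conversion, %zu. */
static int format_size(char *out, size_t out_len, size_t n)
{
    return snprintf(out, out_len, "record size: %zu bytes", n);
}
```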

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
13 years agocorosync-fplay: handle too large rec_size
Jan Friesse [Mon, 21 Feb 2011 17:24:53 +0000 (18:24 +0100)]
corosync-fplay: handle too large rec_size

Corrupted files may contain items with a rec_size larger than the g_record
buffer and/or flt_data_size.

Also, the g_record array size is now defined as a constant.
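
The check amounts to validating rec_size against both limits before copying.
A minimal sketch follows; G_RECORD_WORDS, its value, and load_record are
hypothetical, and the sketch ignores the circular-buffer wraparound the real
flight data uses.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Capacity of the record buffer as a named constant (illustrative value). */
#define G_RECORD_WORDS 10000

static uint32_t g_record[G_RECORD_WORDS];

/* Copy one record of rec_size words starting at data[pos].
 * Returns 0 on success, or -1 if rec_size is corrupt: larger than
 * g_record or running past the end of the flight data. */
static int load_record(const uint32_t *data, size_t data_words,
                       size_t pos, size_t rec_size)
{
    if (rec_size > G_RECORD_WORDS) {
        return -1;
    }
    if (pos > data_words || rec_size > data_words - pos) {
        return -1;
    }
    memcpy(g_record, &data[pos], rec_size * sizeof(uint32_t));
    return 0;
}
```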

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
13 years agologsys: Properly lock flt data before dump
Jan Friesse [Mon, 21 Feb 2011 12:23:46 +0000 (13:23 +0100)]
logsys: Properly lock flt data before dump

The data needs to be locked; otherwise the resulting fdata file may be
incorrect.
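
A sketch of the locking discipline, assuming a pthread mutex guards the
flight data; flt_data, flt_head, flt_append and flt_dump are hypothetical
stand-ins for the logsys internals. Holding the same lock while copying
prevents writers from changing the buffer or head index halfway through a dump.

```c
#include <pthread.h>
#include <stdint.h>
#include <string.h>

#define FLT_WORDS 64
static uint32_t flt_data[FLT_WORDS];
static size_t flt_head;
static pthread_mutex_t flt_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Writer side: append one word to the circular buffer under the lock. */
static void flt_append(uint32_t word)
{
    pthread_mutex_lock(&flt_mutex);
    flt_data[flt_head] = word;
    flt_head = (flt_head + 1) % FLT_WORDS;
    pthread_mutex_unlock(&flt_mutex);
}

/* Dump side: copy buffer and head index as one consistent snapshot.
 * out must hold FLT_WORDS words.  Returns the head index. */
static size_t flt_dump(uint32_t *out)
{
    pthread_mutex_lock(&flt_mutex);
    memcpy(out, flt_data, sizeof(flt_data));
    size_t head = flt_head;
    pthread_mutex_unlock(&flt_mutex);
    return head;
}
```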

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
13 years agologsys: Don't leak fd on successful fdata dump
Jan Friesse [Mon, 21 Feb 2011 12:14:21 +0000 (13:14 +0100)]
logsys: Don't leak fd on successful fdata dump
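
The subject suggests the descriptor was closed only on error paths. A sketch
of a dump routine that closes it on every path, including success, follows;
dump_to_file is a hypothetical name, not the logsys function.

```c
#include <fcntl.h>
#include <unistd.h>

/* Write len bytes to a fresh dump file, closing the descriptor on both
 * the success and the failure path.  Returns 0 on success, -1 on error. */
static int dump_to_file(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0) {
        return -1;
    }
    ssize_t written = write(fd, buf, len);
    int rc = (written == (ssize_t)len) ? 0 : -1;
    if (close(fd) != 0) {
        rc = -1;
    }
    return rc;
}
```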

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>