git.proxmox.com Git - mirror_corosync.git/log
8 years agoschedwrk: Cleanup and make it work on PPC BE
Jan Friesse [Fri, 13 May 2016 15:06:09 +0000 (17:06 +0200)]
schedwrk: Cleanup and make it work on PPC BE

Schedwrk passes the hdb handle (64-bit) to
totempg_callback_token_create as a context. The context is defined to be
a pointer, so there is a conversion function which stores the 64-bit
hdb_handle into a pointer. A pointer can potentially be 32-bit, which
means the check part of the hdb handle is discarded (and the handle has
to get the special no_check value in schedwrk_do later). This works
quite well on a 32-bit Little-Endian system. Sadly, on a Big-Endian
system the check part of the hdb handle is stored instead of the value.
The result is an error from the hdb_handle_get call.

Proposed solution is to pass handle pointer to
totempg_callback_token_create as context. This means full hdb (check +
value) can be used in schedwrk_do (easier detection of memory
corruption).

Main reason for this patch is to remove usage of pointer as integer
value.

A small drawback of this solution is that the handle pointer must point
to memory allocated on the heap or to static memory, which makes the API
more bug-prone. Current users of the schedwrk API across corosync always
use memory in the .text section (safe), so it's not a problem.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
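
The endianness problem described in this commit can be sketched in a
few lines of C. This is only an illustration, not the corosync code: it
assumes the check part sits in the upper 32 bits of the handle and the
value in the lower 32 bits, and all function names are hypothetical.
The byte order is simulated explicitly so the sketch behaves the same
on any host.

```c
#include <stdint.h>

/* Assumed layout (not taken from the commit): check in the upper
 * 32 bits, value (index) in the lower 32 bits. */
static uint64_t make_handle(uint32_t check, uint32_t value)
{
    return ((uint64_t)check << 32) | value;
}

/* Lay the handle out in memory as a host of the given byte order would. */
static void store_handle(uint64_t h, int big_endian, unsigned char out[8])
{
    for (int i = 0; i < 8; i++) {
        int shift = big_endian ? (56 - 8 * i) : (8 * i);
        out[i] = (unsigned char)(h >> shift);
    }
}

/* Read only the first four bytes back, as happens when a 64-bit
 * handle is copied into a 32-bit pointer. */
static uint32_t truncated_handle(uint64_t h, int big_endian)
{
    unsigned char bytes[8];
    uint32_t w = 0;

    store_handle(h, big_endian, bytes);
    for (int i = 0; i < 4; i++) {
        int shift = big_endian ? (24 - 8 * i) : (8 * i);
        w |= (uint32_t)bytes[i] << shift;
    }
    return w;
}

/* The approach the commit proposes: the context is the address of
 * the handle, so the full 64 bits survive on every platform. */
static uint64_t handle_from_context(const void *context)
{
    return *(const uint64_t *)context;
}
```

With a handle built from check 0xAABBCCDD and value 7, the truncated
copy yields the value on a little-endian layout and the check part on a
big-endian one, while reading through the pointer returns the whole
handle in both cases.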
8 years agocmapctl: Handle corosync errors in print_key func
Jan Friesse [Tue, 17 May 2016 10:04:13 +0000 (12:04 +0200)]
cmapctl: Handle corosync errors in print_key func

print_key handles only the CS_ERR_TRY_AGAIN error. If a different error
is returned, print_key loops forever.

Solution is to handle all errors.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
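
A minimal sketch of a retry loop that terminates on any status other
than "try again" might look like the following. The enum, the fake
getter and the max_tries bound are hypothetical stand-ins chosen for
illustration; they are not the cmapctl code or corosync's cs_error_t.

```c
/* Hypothetical stand-ins for the library's status codes. */
enum key_status { KS_OK, KS_TRY_AGAIN, KS_NOT_EXIST };

/* Fake getter state: reports KS_TRY_AGAIN `busy` times, then `final`. */
struct fake_key {
    int busy;
    enum key_status final;
};

static enum key_status fake_get(struct fake_key *k)
{
    if (k->busy > 0) {
        k->busy--;
        return KS_TRY_AGAIN;
    }
    return k->final;
}

/* Retry only while the getter asks for it; any other status,
 * success or failure, ends the loop and is returned to the caller. */
static enum key_status get_with_retry(struct fake_key *k, int max_tries)
{
    enum key_status st = KS_TRY_AGAIN;

    for (int i = 0; i < max_tries && st == KS_TRY_AGAIN; i++) {
        st = fake_get(k);
    }
    return st;
}
```

The upper bound on attempts is an extra safety net added in this
sketch; the commit message itself only says that all errors are now
handled.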
8 years agoAdds doxygen stubs to include directory
Michael Jones [Sat, 30 Apr 2016 00:02:41 +0000 (20:02 -0400)]
Adds doxygen stubs to include directory

Signed-off-by: Michael Jones <jonesmz@jonesmz.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
8 years agoAdd clang-format configuration file
Michael Jones [Fri, 29 Apr 2016 22:50:10 +0000 (18:50 -0400)]
Add clang-format configuration file

This .clang-format file is written for clang-format version 3.7.1

I've attempted to set the options for clang-format so that the
difference between the current code, and the result of the clang format
call is as small as possible.

Unfortunately, clang-format doesn't yet have the ability to handle every
single possible formatting option, so it's not perfect yet.

Signed-off-by: Michael Jones <jonesmz@jonesmz.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
8 years agowd: make watchdog device configurable
Valentin Vidic [Tue, 3 May 2016 07:05:54 +0000 (09:05 +0200)]
wd: make watchdog device configurable

Add configuration option resources.watchdog_device allowing runtime
selection of watchdog device.  Useful for newer servers having more
than one watchdog available (IPMI and iTCO).

Special value "off" disables watchdog in configuration rather than
just using build options.  Useful when watchdog device is needed
elsewhere (SBD cluster stonith service).

Signed-off-by: Valentin Vidic <Valentin.Vidic@CARNet.hr>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
8 years agologging: Use our own version of basename
Christine Caulfield [Tue, 3 May 2016 10:05:02 +0000 (11:05 +0100)]
logging: Use our own version of basename

basename() function has some potentially odd issues on
other platforms.

So, to be safe, here's an internal version.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
8 years agologsys: fix TOTEM logging when corosync built out of tree
Christine Caulfield [Tue, 26 Apr 2016 08:49:53 +0000 (09:49 +0100)]
logsys: fix TOTEM logging when corosync built out of tree

If corosync is built out-of-tree (passing --srcdir to configure) then
TOTEM logging doesn't print anything.

This is caused by the source filenames (from __FILE__ at compilation
time) having the configured path in them - in this example
../corosync/exec/totemudp.c etc. The list of totem source filenames
passed to libqb logging facility only has the basenames so the filenames
never match up as libqb does an exact string match.

I looked into fixing this in libqb but it causes a regression. We can't
simply basename() __FILE__ at the point of calling log_printf as it's
also common to use __FILE__ to generate the logging source, and
using basename() on both removes the distinction between similarly named
files from different directories which could be a requirement.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
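
The mismatch described above can be sketched as follows. The helper
names and the file list are illustrative assumptions, and stripping the
directory part is shown only as one way to make the names comparable,
not necessarily what the patch does.

```c
#include <string.h>

/* Return the part of `path` after the last '/', or `path` itself if
 * it contains no '/'. A path ending in '/' yields an empty string. */
static const char *file_basename(const char *path)
{
    const char *slash = strrchr(path, '/');

    return slash != NULL ? slash + 1 : path;
}

/* Exact string match against a NULL-terminated list of file names,
 * which is how the commit describes the comparison in libqb. */
static int in_list_exact(const char *file, const char *const *names)
{
    for (; *names != NULL; names++) {
        if (strcmp(file, *names) == 0) {
            return 1;
        }
    }
    return 0;
}
```

With a list holding only "totemudp.c", the out-of-tree name
"../corosync/exec/totemudp.c" does not match, but its basename does.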
8 years agoparser: Make config file parser more hierarchy
Christine Caulfield [Thu, 21 Apr 2016 12:47:47 +0000 (13:47 +0100)]
parser: Make config file parser more hierarchy

pass 'state' down the stack so that the state of the
hierarchy doesn't get lost when there are unexpected items
in the config hierarchy.

Don't bother setting 'state' on SECTION_END as there's no point
now we're going back up the stack.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
8 years agototemconfig: Explicitly pass IP version
Jan Friesse [Wed, 6 Apr 2016 13:49:09 +0000 (15:49 +0200)]
totemconfig: Explicitly pass IP version

If resolver was set to prefer IPv6 (almost always) and interface section
was not defined (almost all config files created by pcs), IP version was
set to mcast_addr.family. Because mcast_addr.family was unset (reset to
zero), IPv6 address was returned causing failure in totemsrp.
Solution is to pass correct IP version stored in
totem_config->ip_version.

Patch also simplifies get_cluster_mcast_addr. It was using mix of
explicitly passed IP version and bindnet IP version.

Also return value of get_cluster_mcast_addr is now properly checked.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
8 years agocpg: Handle ipc error in cpg_zcb_alloc/free
Jan Friesse [Wed, 24 Feb 2016 15:02:31 +0000 (16:02 +0100)]
cpg: Handle ipc error in cpg_zcb_alloc/free

- Error returned by coroipcc_msg_send_reply_receive is now correctly
  handled.
- If munmap fails, error is set to proper value and handle is put back
  into handle_db

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
8 years agocpg: Memory not unmapped in cpg_zcb_free
Athira Rajeev [Wed, 24 Feb 2016 13:15:31 +0000 (18:45 +0530)]
cpg: Memory not unmapped in cpg_zcb_free

The cpg_zcb_alloc function (in lib/cpg.c) creates
/dev/shm/corosync_zerocopy-XXXXX and mmaps it.

The corosync service (function zcb_alloc in exec/cpg.c) also allocates
this memory, and both sides share it via mmap (using MAP_SHARED in
the mmap call).

Corosync calls unlink, which deletes the file from /dev/shm while
closing the file descriptor, but munmap does not happen correctly
when cpg_zcb_free is called.

So:
- still the deleted file holds the memory
- As munmap is not happening correctly, the number of mappings per
  process gets exceeded and corosync dies with ENOMEM

From gdb, the size passed to munmap appears to be zero and address
looks wrong. Also in the code return code of munmap is not checked.

The patch adds a check of the munmap return code and passes the
correct address to munmap.

Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
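
As a rough sketch of the pattern the patch describes, the code below
keeps the mapped address and length together and checks what munmap
returns. The struct and function names are hypothetical, and an
anonymous shared mapping is used instead of a file in /dev/shm to keep
the example short.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS on glibc */
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical record of one mapping: the exact address and length
 * must be kept so that munmap can be given the same values later. */
struct zcb_region {
    void *addr;
    size_t len;
};

static int region_map(struct zcb_region *r, size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);

    if (p == MAP_FAILED) {
        return -1;
    }
    r->addr = p;
    r->len = len;
    return 0;
}

/* Unmap with the recorded address and length and report failure
 * instead of silently leaking the mapping. */
static int region_unmap(struct zcb_region *r)
{
    if (r->addr == NULL || r->len == 0) {
        return -1;
    }
    if (munmap(r->addr, r->len) != 0) {
        return -1;
    }
    r->addr = NULL;
    r->len = 0;
    return 0;
}
```

Refusing a zero length up front mirrors the gdb observation in the
commit message, where the size passed to munmap appeared to be zero.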
8 years agototempg: Fix memory leak
Jan Friesse [Wed, 10 Feb 2016 11:36:52 +0000 (12:36 +0100)]
totempg: Fix memory leak

Previously there were two free lists, one for the operational and one
for the transitional state. Because every node starts in the
transitional state and always ends in the operational state, an assembly
was always put on the operational free list and never on the
transitional one, so a new assembly structure was always allocated after
a new node connected.

Solution is to have only one free list.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Steven Dake <stdake@cisco.com>
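
The single free list idea can be sketched like this. The structures
below are simplified, hypothetical stand-ins for the totempg assembly
bookkeeping; the allocation counter exists only to make the reuse
visible.

```c
#include <stdlib.h>

/* Simplified, hypothetical assembly record. */
struct assembly {
    struct assembly *next;
    unsigned int nodeid;
};

/* One free list shared by all states. */
struct assembly_pool {
    struct assembly *free_list;
    unsigned int allocations;
};

/* Take an assembly from the free list, allocating only if it is empty. */
static struct assembly *assembly_get(struct assembly_pool *p,
                                     unsigned int nodeid)
{
    struct assembly *a = p->free_list;

    if (a != NULL) {
        p->free_list = a->next;
    } else {
        a = malloc(sizeof(*a));
        if (a == NULL) {
            return NULL;
        }
        p->allocations++;
    }
    a->next = NULL;
    a->nodeid = nodeid;
    return a;
}

/* Return an assembly to the same list it is later taken from. */
static void assembly_put(struct assembly_pool *p, struct assembly *a)
{
    a->next = p->free_list;
    p->free_list = a;
}
```

Because get and put work on the same list, a released structure is
always found again on the next request, whatever state the node was in
when it was released.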
8 years agoFix spelling error in binary corosync
Richard B Winters [Thu, 23 Apr 2015 20:46:58 +0000 (16:46 -0400)]
Fix spelling error in binary corosync

 - Changed paramater to parameter in exec/logcconfig.c

Change-Id: I8a24b0ef5c6621dc6c19d7decbdfe7a255afd10d
Signed-off-by: Richard B Winters <rik@mmogp.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
8 years agoFix spelling error in binary corosync-cfgtool
Richard B Winters [Thu, 23 Apr 2015 20:40:45 +0000 (16:40 -0400)]
Fix spelling error in binary corosync-cfgtool

 - Changed reenable to re-enable in tools/corosync-cfgtool.c

Change-Id: I0457bf3040a454a44f0d8343dd2cd8bf8fad16e0
Signed-off-by: Richard B Winters <rik@mmogp.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
8 years agoFix spelling error in manual sam_overview 8
Richard B Winters [Thu, 23 Apr 2015 20:36:21 +0000 (16:36 -0400)]
Fix spelling error in manual sam_overview 8

 - Changed usefull to useful

Change-Id: I2d7872b21e889202cd2b7752db4c76f18fffa95d
Signed-off-by: Richard B Winters <rik@mmogp.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
8 years agocmap_keys.8: Fix spelling and grammar errors
Jan Friesse [Wed, 27 Jan 2016 17:22:36 +0000 (18:22 +0100)]
cmap_keys.8: Fix spelling and grammar errors

- "There are informations" changed to "There is information"
- Other occurrences of informations changed to information

Original patch was created by Richard B Winters <rik@mmogp.com>, so
thanks for it.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
8 years agoFix spelling errors in manual corosync.conf 5
Richard B Winters [Thu, 23 Apr 2015 20:24:51 +0000 (16:24 -0400)]
Fix spelling errors in manual corosync.conf 5

 - dont to don't
 - overriden to overridden
 - informations to information

Change-Id: If6644694d750c30ba9f5f43b4eb852485613d64a
Signed-off-by: Richard B Winters <rik@mmogp.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
8 years agoFix grammer error in manual votequorum_trackstart
Richard B Winters [Thu, 23 Apr 2015 20:18:27 +0000 (16:18 -0400)]
Fix grammer error in manual votequorum_trackstart

"allows to" was updated to read "allows one to"

- With a subject it's grammatically correct.

Change-Id: I9559e31c780e211b651744c6eaa056ce8d4c3db1
Signed-off-by: Richard B Winters <rik@mmogp.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
8 years agoAdd section in manual title for cpg_zcb_free 3
Richard B Winters [Thu, 23 Apr 2015 20:13:02 +0000 (16:13 -0400)]
Add section in manual title for cpg_zcb_free 3

Change-Id: Ib80face38dce0345e649297d16cf8a63c5b0e8c1
Signed-off-by: Richard B Winters <rik@mmogp.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
8 years agoAdd section in manual title for cpg_zcb_alloc 3
Richard B Winters [Thu, 23 Apr 2015 20:11:36 +0000 (16:11 -0400)]
Add section in manual title for cpg_zcb_alloc 3

Change-Id: I8c5d6af915203533c80e4eaa574e305a46d74815
Signed-off-by: Richard B Winters <rik@mmogp.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
8 years agoFix incorrect spelling of retrieve from retreive
Richard B Winters [Thu, 23 Apr 2015 19:48:57 +0000 (15:48 -0400)]
Fix incorrect spelling of retrieve from retreive

Corrected the spelling of retrieve, where it was spelled as retreive.

 - There were two cases of this misspelling; one
   upper-case and one lower-case

Change-Id: Ic97fd210d8d3ae7e568e5a2e5d97c6220d2ff628
Signed-off-by: Richard B Winters <rik@mmogp.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
8 years agoUpdate corosync.spec source link
Jan Friesse [Tue, 5 Jan 2016 16:11:07 +0000 (17:11 +0100)]
Update corosync.spec source link

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
8 years agoUpdate gitignore files
Jan Friesse [Tue, 5 Jan 2016 16:03:46 +0000 (17:03 +0100)]
Update gitignore files

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
8 years agoRemove all links to old ML
Jan Friesse [Tue, 5 Jan 2016 16:02:43 +0000 (17:02 +0100)]
Remove all links to old ML

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
8 years agototemsrp: Fix clang warning (tautological compare)
Ruben Kerkhof [Fri, 18 Dec 2015 18:55:06 +0000 (18:55 +0000)]
totemsrp: Fix clang warning (tautological compare)

gsfrom is always >= 0

Signed-off-by: Ruben Kerkhof <ruben@rubenkerkhof.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
8 years agoconfigure.ac: Make location of .pc overrideable
Ruben Kerkhof [Fri, 18 Dec 2015 20:34:19 +0000 (20:34 +0000)]
configure.ac: Make location of .pc overrideable

FreeBSD stores them in /usr/local/libdata/pkgconfig

This allows us to remove some local hooks in the process.

Signed-off-by: Ruben Kerkhof <ruben@rubenkerkhof.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
8 years agoRemove a few unused variables and functions
Ruben Kerkhof [Fri, 18 Dec 2015 18:56:17 +0000 (18:56 +0000)]
Remove a few unused variables and functions

Signed-off-by: Ruben Kerkhof <ruben@rubenkerkhof.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
8 years agoconfigure.ac: We don't need no C++ compiler
Ruben Kerkhof [Fri, 18 Dec 2015 14:21:04 +0000 (14:21 +0000)]
configure.ac: We don't need no C++ compiler

Signed-off-by: Ruben Kerkhof <ruben@rubenkerkhof.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
8 years agoconfigure.ac: Remove deprecated AC_PROG_LIBTOOL
Ruben Kerkhof [Fri, 18 Dec 2015 14:06:09 +0000 (14:06 +0000)]
configure.ac: Remove deprecated AC_PROG_LIBTOOL

AC_PROG_LIBTOOL is the deprecated version of LT_INIT. Because LT_INIT is
already called, we can remove it.

Signed-off-by: Ruben Kerkhof <ruben@rubenkerkhof.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
8 years agoconfigure.ac: Remove AC_PROG_RANLIB
Ruben Kerkhof [Fri, 18 Dec 2015 14:05:31 +0000 (14:05 +0000)]
configure.ac: Remove AC_PROG_RANLIB

It was obsoleted by libtool and we don't use ranlib standalone.

Signed-off-by: Ruben Kerkhof <ruben@rubenkerkhof.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
8 years agoconfigure.ac: make foreign apply to all Makefiles
Ruben Kerkhof [Fri, 18 Dec 2015 13:58:15 +0000 (13:58 +0000)]
configure.ac: make foreign apply to all Makefiles

Signed-off-by: Ruben Kerkhof <ruben@rubenkerkhof.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
8 years agoRemove unused, obsolete check
Ruben Kerkhof [Fri, 18 Dec 2015 13:45:18 +0000 (13:45 +0000)]
Remove unused, obsolete check

From autoconf info Obsolete Macros:

"These days, it is portable to assume C89, and that signal
handlers return void, without needing to use this macro or RETSIGTYPE."

And we indeed assume so.

Signed-off-by: Ruben Kerkhof <ruben@rubenkerkhof.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
8 years agoFix detection of qb_log_thread_priority_set
Ruben Kerkhof [Tue, 15 Dec 2015 21:31:23 +0000 (22:31 +0100)]
Fix detection of qb_log_thread_priority_set

This fixes detection of libqb function qb_log_thread_priority_set
if it was installed outside of the standard library search
path, in my case /opt.

Signed-off-by: Ruben Kerkhof <ruben@rubenkerkhof.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
8 years agocpghum: Fix type of recv_crc
Ruben Kerkhof [Tue, 15 Dec 2015 11:31:28 +0000 (12:31 +0100)]
cpghum: Fix type of recv_crc

Fixes build on FreeBSD which doesn't have ulong

Signed-off-by: Ruben Kerkhof <ruben@rubenkerkhof.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
8 years agoCheck for fdatasync
Ruben Kerkhof [Tue, 15 Dec 2015 11:24:16 +0000 (12:24 +0100)]
Check for fdatasync

If we don't have it, fall back to fsync

Fixes the build on FreeBSD

Signed-off-by: Ruben Kerkhof <ruben@rubenkerkhof.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
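
A fallback of this kind is commonly written with a preprocessor guard.
The sketch below assumes the configure check defines HAVE_FDATASYNC,
which is the conventional name produced by an autoconf function check;
the actual macro and wrapper used in corosync may differ, and the
function name sync_data is hypothetical.

```c
#include <unistd.h>

/* Flush file data to stable storage, preferring fdatasync when the
 * build system reported it as available (HAVE_FDATASYNC is an assumed
 * macro name) and falling back to fsync otherwise. */
static int sync_data(int fd)
{
#ifdef HAVE_FDATASYNC
    return fdatasync(fd);
#else
    return fsync(fd);
#endif
}
```

Both calls return 0 on success and -1 on error, so callers do not need
to know which one was compiled in.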
8 years agoFix detection of warning flags for clang
Ruben Kerkhof [Fri, 23 Jan 2015 01:14:14 +0000 (02:14 +0100)]
Fix detection of warning flags for clang

Using ./configure CC=clang, the following flags are detected
as supported:

checking whether clang supports "-Wgnu89-inline"... yes
checking whether clang supports "-Wno-strict-aliasing"... yes

Which results in a lot of warnings during make:

warning: unknown warning option '-Wunsigned-char'
[-Wunknown-warning-option]
warning: unknown warning option '-Wgnu89-inline'
[-Wunknown-warning-option]

Clang doesn't support these flags, but the compile check returns a
warning, not an error:

configure:16649: checking whether clang supports "-Wunsigned-char"
configure:16662: clang -E  -Wunsigned-char conftest.c
warning: unknown warning option '-Wunsigned-char'
[-Wunknown-warning-option]
1 warning generated.
configure:16662: $? = 0
configure:16663: result: yes

Use -Wunknown-warning-option -Werror if supported

Signed-off-by: Ruben Kerkhof <ruben@rubenkerkhof.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
8 years agoquorum: Display node id as unsigned int.
Hideo Yamauchi [Thu, 19 Nov 2015 23:53:17 +0000 (08:53 +0900)]
quorum: Display node id as unsigned int.

Signed-off-by: Hideo Yamauchi <renayama19661014@ybb.ne.jp>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
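
To illustrate why the format matters, here is a small sketch that
assumes node IDs are 32-bit unsigned values (for example derived from
an IPv4 address, where the top bit can be set). The function name is
hypothetical.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Format a node id with an unsigned conversion so that ids above
 * INT32_MAX are not shown as negative numbers. */
static int format_nodeid(char *buf, size_t len, uint32_t nodeid)
{
    return snprintf(buf, len, "%" PRIu32, nodeid);
}
```

An id such as 3232235777 (0xC0A80101) is printed as is, whereas a
signed conversion would typically display a negative number.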
8 years agocts: InitClusterManager is now BootCluster
Jan Friesse [Mon, 23 Nov 2015 10:08:31 +0000 (11:08 +0100)]
cts: InitClusterManager is now BootCluster

This is forward port of flatiron-cts
fbe1721e676eafd1f25f470234b646904f54e3f3.

Thanks to bliu <bliu@suse.com> for pointing out.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
8 years agototemudp: Move udp bind() so that multicast works with IPv6
Christine Caulfield [Mon, 16 Nov 2015 16:00:36 +0000 (16:00 +0000)]
totemudp: Move udp bind() so that multicast works with IPv6

It seems that the IPv6 multicast parameters only take effect when bind()
is called, so I've moved the mcast recv socket bind() to the bottom of
totemudp_build_sockets_ip().

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
8 years agocfgtool: Display nodeid as unsigned int
Hideo Yamauchi [Fri, 13 Nov 2015 05:16:19 +0000 (14:16 +0900)]
cfgtool: Display nodeid as unsigned int

Signed-off-by: Hideo Yamauchi <renayama19661014@ybb.ne.jp>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
8 years agovotequorum: Don't send multiple callbacks when nodes join
Christine Caulfield [Thu, 22 Oct 2015 10:45:26 +0000 (11:45 +0100)]
votequorum: Don't send multiple callbacks when nodes join

This patch aligns the votequorum callbacks so that they are
the same as the quorum ones. Previously it was quite common
for votequorum to send one callback for every node in the cluster
when a single new node joined (because it sent one for every
nodeinfo message it received).

This new system makes much more sense in itself and being
consistent with the internal quorum is also an advantage!

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
9 years agoman: Add synopsis for cpg_zcb_alloc and free
Ferenc Wágner [Fri, 28 Aug 2015 13:10:24 +0000 (15:10 +0200)]
man: Add synopsis for cpg_zcb_alloc and free

Signed-off-by: Ferenc Wágner <wferi@niif.hu>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
9 years agoman html index: Update index
Ferenc Wágner [Fri, 28 Aug 2015 12:21:18 +0000 (14:21 +0200)]
man html index: Update index

- add link to cmap_keys(8)
- remove link to cpg_groups_get(3)
- add missing cpg_* and votequorum_qdevice_* functions
- corosync-fplay has already been removed by ab32894

Signed-off-by: Ferenc Wágner <wferi@niif.hu>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
9 years agovotequorum: Make sure cs_error_t is defined
Ferenc Wágner [Thu, 27 Aug 2015 12:32:02 +0000 (14:32 +0200)]
votequorum: Make sure cs_error_t is defined

Signed-off-by: Ferenc Wágner <wferi@niif.hu>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
9 years agoClose Doxygen group in include/corosync/cmap.h
Ferenc Wágner [Thu, 13 Aug 2015 12:06:32 +0000 (14:06 +0200)]
Close Doxygen group in include/corosync/cmap.h

This avoids warning: end of file while inside a group.

Signed-off-by: Ferenc Wágner <wferi@niif.hu>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
9 years agoDoxygen fix for cmap_iter_next()
Ferenc Wágner [Thu, 13 Aug 2015 10:57:01 +0000 (12:57 +0200)]
Doxygen fix for cmap_iter_next()

Remove the extra cmap_ prefix of the iter_handle parameter.

Signed-off-by: Ferenc Wágner <wferi@niif.hu>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
9 years agoconfigure: Correct help entry for logdir
Ferenc Wágner [Thu, 13 Aug 2015 09:33:54 +0000 (11:33 +0200)]
configure: Correct help entry for logdir

Signed-off-by: Ferenc Wágner <wferi@niif.hu>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
9 years agototmesrp: Fix typo in log message
Ferenc Wágner [Thu, 13 Aug 2015 10:46:28 +0000 (12:46 +0200)]
totmesrp: Fix typo in log message

Signed-off-by: Ferenc Wágner <wferi@niif.hu>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
9 years agoconfigure: typo in include
Ferenc Wágner [Thu, 13 Aug 2015 10:34:25 +0000 (12:34 +0200)]
configure: typo in include

Signed-off-by: Ferenc Wágner <wferi@niif.hu>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
9 years agoman page: Correct option letter for DBus
Ferenc Wágner [Tue, 28 Jul 2015 13:31:36 +0000 (15:31 +0200)]
man page: Correct option letter for DBus

Signed-off-by: Ferenc Wágner <wferi@niif.hu>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
9 years agowd: fix setting of watchdog timeouts
Christine Caulfield [Tue, 14 Jul 2015 09:04:06 +0000 (10:04 +0100)]
wd: fix setting of watchdog timeouts

Fix setting of initial watchdog timeout, and also changing of timeout.

Remove redundant starting of timer in exec_init_fn

Signed-off-by: Kazunori INOUE <kazunori.inoue3@gmail.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
9 years agoCFG: Prevent CFG orignating messages during SYNC
Jason HU [Sun, 28 Jun 2015 16:16:06 +0000 (16:16 +0000)]
CFG: Prevent CFG orignating messages during SYNC

During SYNC, corosync-cfgtool -R/-H commands can pass through IPC and
then send totem messages. This may corrupt
assembly_list_inuse/assembly_list_free if those messages are received
after SYNC is done.

The solution is marking related CFG APIs as
CS_LIB_FLOW_CONTROL_REQUIRED.

Signed-off-by: Jason HU <huzhijiang@gmail.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
9 years agoDon't link with libz when not needed v2.3.5
Jan Friesse [Mon, 22 Jun 2015 14:00:07 +0000 (16:00 +0200)]
Don't link with libz when not needed

Commit 8cc8e513633a1a8b12c416e32fb5362fcf4d65dd added a check for libz,
resulting in linking with libz for all libraries. This is not the
expected behavior. The patch solves it by defining an automake
conditional so cpghum is linked only if libz is available and the LIBS
variable is not modified at all.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
9 years agoLog: Add logrotate configuration file
Jan Friesse [Fri, 19 Jun 2015 15:42:09 +0000 (17:42 +0200)]
Log: Add logrotate configuration file

In the cman era corosync depended on the logrotate file distributed by
cman. It's a good idea to rotate logs also on systems without cman (new
clusters).

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
9 years agoAdd note about rrp active beeing unsupported
Jan Friesse [Fri, 19 Jun 2015 14:16:18 +0000 (16:16 +0200)]
Add note about rrp active beeing unsupported

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
9 years agovotequorum: Fix auto_tie_breaker behaviour in odd-sized clusters
Christine Caulfield [Thu, 18 Jun 2015 08:57:59 +0000 (09:57 +0100)]
votequorum: Fix auto_tie_breaker behaviour in odd-sized clusters

auto_tie_breaker can behave incorrectly in the case of a cluster
with an odd number of nodes. It's possible for a partition to
have quorum while the other side has the ATB node, and both will
continue working. (Of course in a properly configured cluster one side
will be fenced but that becomes an indeterminate race .. just what ATB
is supposed to avoid).

This patch prevents ATB from running in a partition if the 'other'
partition might have quorum, and also mandates the use of wait_for_all
in clusters with an odd number of nodes so that a quorate partition
cannot start services or fence an existing partition with the tie
breaker node.

Signed-Off-By: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
9 years agototemsrp: Improve logging of left/down nodes
Christine Caulfield [Fri, 12 Jun 2015 15:16:45 +0000 (16:16 +0100)]
totemsrp: Improve logging of left/down nodes

This patch from Hideo Yamauchi improves the logging of
whether nodes leave the cluster cleanly or uncleanly,
making it easier to determine if a node was shut down
by the operator. There is also the possibility that a
LEAVE message could get missed (due to the node being
in flush state) so this can also make that clearer.

The modifications are as follows.

Change 1) I added a list to totemsrp which tracks nodes that sent LEAVE.
Change 2) I added registration, lookup, and clearing of LEAVE nodes.
Change 3) I added the output to the log.
Change 4) I changed the output level of the log.

Signed-off-by: Hideo Yamauchi <renayama19661014@ybb.ne.jp>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
9 years agototem: Log a message if JOIN or LEAVE message is ignored
Christine Caulfield [Fri, 17 Apr 2015 14:49:53 +0000 (15:49 +0100)]
totem: Log a message if JOIN or LEAVE message is ignored

As per recent email thread, this patch adds a log message if a JOIN or
LEAVE message is discarded while corosync is flushing the receive queue.

While ignoring a JOIN message is harmless (it will be resent), ignoring
a LEAVE message can cause a longer state transition as it is treated as
a node crashing rather than leaving gracefully, so the system admin
might be confused as to the cause.

Unfortunately, we can't (at the totemudp level) distinguish between JOIN
or LEAVE messages without a lot more protocol-specific code creeping in
the lower layer so the message is left ambiguous.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
9 years agototemconfig: Check for duplicate nodeids
Christine Caulfield [Fri, 10 Apr 2015 13:22:07 +0000 (14:22 +0100)]
totemconfig: Check for duplicate nodeids

Having duplicate nodeids in corosync.conf can play havoc with a cluster,
so (as suggested by someone on this list) here is some code to check
that all nodeids are unique. Even if a nodeid is not specified it will
check to be sure that the ID generated from the IP address (ipv4 only)
does not clash with one that is provided.

It logs all non-unique nodeids to syslog, but only the last is reported
on the command-line to the user which should be enough to get them to
check further. At startup this will cause corosync to fail to start.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
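
A uniqueness check along these lines can be sketched with a simple
pairwise comparison. This is an illustration under assumptions, not the
totemconfig code: the function name is hypothetical, and it only
mirrors the described behaviour of counting every duplicate while
remembering the last one found.

```c
#include <stddef.h>
#include <stdint.h>

/* Count entries whose node id already appeared earlier in the array.
 * If `last_dup` is not NULL it receives the last duplicate found,
 * which corresponds to the one reported to the user. */
static size_t find_duplicate_nodeids(const uint32_t *ids, size_t n,
                                     uint32_t *last_dup)
{
    size_t dups = 0;

    for (size_t i = 1; i < n; i++) {
        for (size_t j = 0; j < i; j++) {
            if (ids[i] == ids[j]) {
                dups++;
                if (last_dup != NULL) {
                    *last_dup = ids[i];
                }
                break;
            }
        }
    }
    return dups;
}
```

The quadratic loop is acceptable here because node lists are short; a
non-zero return value would be the point at which startup is refused.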
9 years agoquorum: don't allow quorum_trackstart to be called twice
Christine Caulfield [Mon, 16 Mar 2015 11:37:52 +0000 (11:37 +0000)]
quorum: don't allow quorum_trackstart to be called twice

If quorum_trackstart() or votequorum_trackstart() are called twice with
CS_TRACK_CHANGES then the client gets added twice to the notifications
list effectively corrupting it. Users have reported segfaults in
corosync when they did this (by mistake!).

As there's already a tracking_enabled flag in the private-data, we check
that before adding to the list again and return an error if
the process is already registered.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
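
The guard described above can be sketched as follows. The structures
and the result codes are hypothetical simplifications; only the
tracking_enabled flag name comes from the commit message.

```c
/* Hypothetical per-client private data. */
struct client {
    int tracking_enabled;
};

/* Hypothetical notification list, reduced to an entry counter. */
struct notify_list {
    int entries;
};

enum track_result { TRACK_OK = 0, TRACK_ERR_EXIST = -1 };

/* Register a client for change notifications at most once. */
static enum track_result track_start(struct client *c,
                                     struct notify_list *l)
{
    if (c->tracking_enabled) {
        /* Already on the list: adding it again would corrupt it. */
        return TRACK_ERR_EXIST;
    }
    c->tracking_enabled = 1;
    l->entries++;
    return TRACK_OK;
}
```

A second call for the same client leaves the list untouched and
returns an error the caller can act on.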
9 years agoReally add cpghum
Jan Friesse [Tue, 10 Mar 2015 12:20:37 +0000 (13:20 +0100)]
Really add cpghum

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
9 years agocpg: Add support for messages larger than 1Mb
Christine Caulfield [Thu, 5 Mar 2015 16:45:15 +0000 (16:45 +0000)]
cpg: Add support for messages larger than 1Mb

If a cpg client sends a message larger than 1Mb (actually slightly
less to allow for internal buffers) cpg will now fragment that into
several corosync messages before sending it around the ring.

cpg_mcast_joined() can now return CS_ERR_INTERRUPT which means that the
cpg membership was disrupted during the send operation and the message
needs to be resent.

The new API call cpg_max_atomic_msgsize_get() returns the maximum size
of a message that will not be fragmented internally.

New test program cpghum was written to stress test this functionality,
it checks message integrity and order of receipt.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
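
The arithmetic behind such fragmentation can be sketched as below. The
helper names are hypothetical and the maximum size is passed in as a
plain parameter; in a real client it would presumably come from
cpg_max_atomic_msgsize_get(). The actual fragment headers and
reassembly logic are not shown.

```c
#include <stddef.h>

/* Number of fragments needed for a message of msg_len bytes when at
 * most max_atomic bytes fit into one unfragmented message. */
static size_t fragment_count(size_t msg_len, size_t max_atomic)
{
    if (max_atomic == 0) {
        return 0;
    }
    return msg_len / max_atomic + (msg_len % max_atomic != 0);
}

/* Payload length of fragment `index` (0-based), or 0 past the end. */
static size_t fragment_len(size_t msg_len, size_t max_atomic,
                           size_t index)
{
    size_t count = fragment_count(msg_len, max_atomic);
    size_t offset;

    if (index >= count) {
        return 0;
    }
    offset = index * max_atomic;
    return (msg_len - offset < max_atomic) ? msg_len - offset
                                           : max_atomic;
}
```

A 2500-byte message with a 1000-byte limit would go out as three
fragments of 1000, 1000 and 500 bytes.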
9 years agototemsrp: Format member list log as unsigned int
Andrey N. Groshev [Tue, 3 Mar 2015 02:56:12 +0000 (05:56 +0300)]
totemsrp: Format member list log as unsigned int

Signed-off-by: Andrey N. Groshev <greenx@yandex.ru>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
9 years agoDon't allow both two_node and auto_tie_breaker in corosync.conf
Christine Caulfield [Mon, 2 Mar 2015 15:50:21 +0000 (15:50 +0000)]
Don't allow both two_node and auto_tie_breaker in corosync.conf

The two_node and auto_tie_breaker options are incompatible as they
specify conflicting methods of determining the quorate half of a cluster
partition.

This patch detects this error in corosync.conf, issues a message and
disables two_node if auto_tie_breaker is present.

Signed-Off-By: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
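
The conflict resolution described in this commit can be sketched as a
small validation step. The struct and function names are hypothetical;
only the two option names and the rule "disable two_node if
auto_tie_breaker is present" come from the commit message.

```c
/* Hypothetical parsed quorum options. */
struct quorum_opts {
    int two_node;
    int auto_tie_breaker;
};

/* Returns 1 if a conflict was found and two_node was disabled (the
 * caller would log a message in that case), 0 otherwise. */
static int resolve_atb_conflict(struct quorum_opts *o)
{
    if (o->two_node && o->auto_tie_breaker) {
        o->two_node = 0;
        return 1;
    }
    return 0;
}
```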
9 years agoVotequorum: Fix auto_tie_breaker default
Christine Caulfield [Mon, 2 Mar 2015 15:48:01 +0000 (15:48 +0000)]
Votequorum: Fix auto_tie_breaker default

The default for auto_tie_breaker should be 'lowest' - which is what it
was before the extended ATB functionality of auto_tie_breaker_node was
added, and what the documentation states.

However this was broken so that if auto_tie_breaker_node was not
specified then auto_tie_breaker itself was ignored. This patch fixes
that.

It also fixes a typo in a comment.

Signed-Off-By: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
9 years agoHandle adding and removing UDPU members atomically
Jan Friesse [Wed, 21 Jan 2015 12:30:48 +0000 (13:30 +0100)]
Handle adding and removing UDPU members atomically

When the config file is reloaded with a UDPU member removed, the
internal icmap index of nodelist.node can change. This can result in a
node being removed and then added back. Together with UDPU alive
filtering (where a member is by default considered not to be a member),
this stops corosync from sending messages to such members, resulting in
new membership creation.

Solution is to properly test which members were really deleted and added
(instead of relying on internal and dynamic naming of icmap hash table
key name).

Also truly dynamic add and remove of a node (via cmap) is now handled by
the same function, so totem_config->interfaces is now updated properly.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
9 years agocorosync_ring_id_store: Use safer permissions
Jan Friesse [Tue, 20 Jan 2015 09:24:34 +0000 (10:24 +0100)]
corosync_ring_id_store: Use safer permissions

corosync_ring_id_store should use same (safer) permissions as
corosync_ring_id_create_or_load for (eventually) newly created ringid
file.

Credit to Sjerek for finding this problem.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
9 years agototem: Ignore duplicated commit tokens in recovery
Jason [Sat, 10 Jan 2015 09:35:47 +0000 (17:35 +0800)]
totem: Ignore duplicated commit tokens in recovery

In active rrp mode, commit tokens are treated as mcast data messages,
thus, rrp directly delivers them to srp layer by active_mcast_recv().
This will result in duplicated commit tokens being received by srp from
different heartbeat links. If node is in recovery state and has already
sent out the initial orf token, those duplicated commit tokens will
cause message_handler_memb_commit_token() to send initial orf token
again! This is wrong because it resets the orf token content in
instance->orf_token_retransmit, which breaks the token retransmission
state.

Furthermore, by sending those initial orf tokens again and again,
it may lead active_token_recv() to drop some subsequent orf tokens.
It is OK for rrp because srp will do token retransmission,
but as said above, srp retransmission state has already been broken,
so finally we meet a "token lost in recovery state" condition caused
by software. If the token timeout value is large, then it will take a long
time to create a new ring.

This can be reproduced by having two nodes set to active rrp mode, with
two heartbeat links. Then with one node always on, let the other one do
stop/start again and again. It has a low probability to reproduce.
In theory, I think, the more heartbeat links used, the more easily it
can be reproduced.

This problem can be resolved by letting
message_handler_memb_commit_token() ignore duplicated commit tokens
in recovery state if node (the ring representation) has already sent
out the initial orf token.

Unlike the previous take, this version does not depend on stored token
data but uses originated_orf_token in totemsrp_instance to remember
if initial orf token has been already originated for current membership.
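
The guard described above can be sketched roughly as follows. This is a
simplified model, not the totemsrp code: the struct, the enum and the function
names are hypothetical, and only the originated_orf_token flag is taken from
the text.

```c
/* Hypothetical, reduced set of membership states. */
enum sketch_state {
    SKETCH_STATE_COMMIT,
    SKETCH_STATE_RECOVERY
};

/* Hypothetical stand-in for the relevant part of totemsrp_instance. */
struct srp_sketch {
    enum sketch_state state;
    int is_ring_rep;              /* 1 if this node is the ring representative */
    int originated_orf_token;     /* 1 once the initial orf token was sent */
    unsigned int orf_tokens_sent; /* counts initial orf tokens, for illustration */
};

/* Called when a new membership enters recovery: the flag starts cleared. */
void srp_sketch_enter_recovery(struct srp_sketch *s, int is_ring_rep)
{
    s->state = SKETCH_STATE_RECOVERY;
    s->is_ring_rep = is_ring_rep;
    s->originated_orf_token = 0;
    s->orf_tokens_sent = 0;
}

/*
 * Handles a commit token. Returns 1 if the initial orf token was
 * originated, 0 if the commit token was ignored.
 */
int srp_sketch_on_commit_token(struct srp_sketch *s)
{
    if (s->state != SKETCH_STATE_RECOVERY || !s->is_ring_rep) {
        return 0;
    }
    if (s->originated_orf_token) {
        /* Duplicate from another link: leave retransmission state alone. */
        return 0;
    }
    s->originated_orf_token = 1;
    s->orf_tokens_sent++;
    return 1;
}
```

Because the flag is tied to the current membership rather than to stored token
contents, a second copy of the commit token arriving over another heartbeat
link is dropped without touching the retransmission state.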

Signed-off-by: Jason <huzhijiang@gmail.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
9 years agoLog auto-recovery of ring only once
Jan Friesse [Thu, 2 Oct 2014 12:09:42 +0000 (12:09 +0000)]
Log auto-recovery of ring only once

Make sure to log auto-recovery of ring only once. Every
MESSAGE_TYPE_RING_TEST_ACTIVATE receive is logged, but with lower
priority and more detailed information.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
9 years agoSet RR priority by default
Jan Friesse [Fri, 2 Jan 2015 11:39:09 +0000 (12:39 +0100)]
Set RR priority by default

Experience with larger production clusters showed that setting RR
priority for corosync is a viable way to prevent random fencing, ...

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
9 years agoautomake: Check minimum automake version
Jan Friesse [Fri, 2 Jan 2015 11:27:48 +0000 (12:27 +0100)]
automake: Check minimum automake version

Corosync needs automake version at least 1.11. Patch adds minimum
version check.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
9 years agoReset timer_problem_decrementer on fault
Jason [Mon, 8 Dec 2014 15:24:22 +0000 (16:24 +0100)]
Reset timer_problem_decrementer on fault

After a heartbeat link is marked FAULTY and then automatically re-enabled,
active_instance->timer_problem_decrementer was not reset to zero. So in
the next timer_function_active_token_expired() round,
active_timer_problem_decrementer_start() will not be called. As a
result, the active_instance->counter_problems of this link can no longer
be decreased, which causes rrp to lose the ability to tolerate
network fluctuation.

This problem can be reproduced by the following sequence:
1) Set RRP in active mode, configure at least 2 heartbeat links.
2) Unplug one link till corosync-cfgtool -s shows it is FAULTY.
3) Re-plug this link then corosync-cfgtool -s shows it is active with
no faults.
4) Unplug this link again but quickly re-plug it before it becomes
FAULTY.
5) Finally, you can see corosync-cfgtool -s shows it is in
"Incrementing problem counter" state even though it is currently
physically healthy.

It can be solved by resetting timer_problem_decrementer to
zero in active_timer_problem_decrementer_cancel().
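
The effect of the missing reset can be illustrated with a small model. This is
only a sketch under assumptions: the struct and function names are
hypothetical, the timer handle is reduced to an integer where 0 means "no
timer armed", and the guard that the text places in the caller is folded into
the start function.

```c
/* Hypothetical stand-in for the per-link state of an active rrp instance. */
struct link_sketch {
    int timer_problem_decrementer; /* 0 = no timer armed */
    unsigned int timer_starts;     /* how many times a timer was armed */
};

/* Arms the decrementer timer only if none is currently armed. */
void sketch_decrementer_start(struct link_sketch *link)
{
    if (link->timer_problem_decrementer == 0) {
        link->timer_problem_decrementer = 1;
        link->timer_starts++;
    }
}

/*
 * Cancels the decrementer timer. Real code would also delete the
 * underlying timer; the important part here is clearing the handle
 * so that a later start is not skipped.
 */
void sketch_decrementer_cancel(struct link_sketch *link)
{
    link->timer_problem_decrementer = 0;
}
```

Without the assignment in the cancel function, the handle would stay non-zero
after the first cancel and every later start would be skipped, which matches
the stuck "Incrementing problem counter" state from the reproduction steps.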

Signed-off-by: Jason <huzhijiang@gmail.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
9 years agoconfig: Ensure mcast address/port differs for rrp
Jan Friesse [Mon, 24 Nov 2014 10:54:20 +0000 (11:54 +0100)]
config: Ensure mcast address/port differs for rrp

When using multiple interfaces, it's necessary to use different
multicast address/port pair for each interface to make
rrp work correctly. This is now checked in parser.
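
A check of this kind could look roughly like the sketch below. It is not the
parser code from the patch: the struct and function names are hypothetical,
and addresses are compared as plain strings, which ignores the fact that the
same address can be written in more than one textual form.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical per-interface multicast settings. */
struct mcast_endpoint {
    const char *addr;
    unsigned short port;
};

/*
 * Returns the index of the first interface whose address/port pair
 * duplicates an earlier interface, or -1 if all pairs are distinct.
 */
int first_duplicate_mcast(const struct mcast_endpoint *ifaces, size_t count)
{
    size_t i;
    size_t j;

    for (i = 1; i < count; i++) {
        for (j = 0; j < i; j++) {
            if (ifaces[i].port == ifaces[j].port &&
                strcmp(ifaces[i].addr, ifaces[j].addr) == 0) {
                return (int)i;
            }
        }
    }
    return -1;
}
```

Two interfaces that share an address but use different ports pass this check,
as do two interfaces with different addresses on the same port; only an
identical pair is rejected.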

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
9 years agoconfig: Process broadcast option consistently
Jan Friesse [Mon, 24 Nov 2014 09:32:03 +0000 (10:32 +0100)]
config: Process broadcast option consistently

The broadcast option is global, but in the config it is set in the
interface section. When more interfaces were defined, only the broadcast
value from the last section was used.

Solution is to use broadcast whenever at least one interface uses
broadcast.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
9 years agoconfig: Make sure user doesn't mix IPv6 and IPv4
Jan Friesse [Mon, 24 Nov 2014 09:25:05 +0000 (10:25 +0100)]
config: Make sure user doesn't mix IPv6 and IPv4

The checking code was there but was not correct, so it was possible to
enter one bindnetaddr as IPv4 and a second as IPv6. The fix is trivial.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
9 years agoman page: Improve description of token timeout
Jan Friesse [Thu, 9 Oct 2014 14:19:39 +0000 (16:19 +0200)]
man page: Improve description of token timeout

With the introduction of token_coefficient, the token timeout defined in
the configuration file may no longer reflect the real token timeout, which
may be confusing.

The enhanced description hopefully fixes that.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
9 years agoStore configuration values used by totem to cmap
Jan Friesse [Mon, 13 Oct 2014 09:58:58 +0000 (11:58 +0200)]
Store configuration values used by totem to cmap

Some totem configuration values (like token, consensus, ...) are either
computed or a default value is used. It's hard to find out which
value is really used.

Solution is to store values in cmap.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
9 years agomanpage: Fix English
Christine Caulfield [Mon, 13 Oct 2014 08:28:27 +0000 (10:28 +0200)]
manpage: Fix English

While I was looking at the above man page changes I thought I'd review
the rest of it. So here are some more English fixes for the cmap_keys.8
man page

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
9 years agoinit: Don't wait for ipc if corosync doesn't start
Jan Friesse [Tue, 7 Oct 2014 15:49:10 +0000 (17:49 +0200)]
init: Don't wait for ipc if corosync doesn't start

Init script now checks return code of executing corosync command. If it
fails, ipc_wait section is skipped, resulting in much faster failure of
init script.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
9 years agoAdjust MTU for IPv6 correctly
Jan Friesse [Tue, 30 Sep 2014 15:06:36 +0000 (17:06 +0200)]
Adjust MTU for IPv6 correctly

The IP header for IPv6 is 20 bytes larger than for IPv4. This was not taken
into account, so IPv6 packets were larger than the MTU, resulting in
fragmentation.

Solution is to subtract the correct IP header size.
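
As a worked example of the arithmetic (a sketch, not the totem code), the
usable UDP payload is the MTU minus the IP and UDP headers. The constant and
function names are hypothetical; the sizes are the fixed header sizes without
IPv4 options or IPv6 extension headers.

```c
#include <stddef.h>

/* Fixed header sizes in bytes, without options or extension headers. */
enum {
    SKETCH_IPV4_HEADER_SIZE = 20,
    SKETCH_IPV6_HEADER_SIZE = 40,
    SKETCH_UDP_HEADER_SIZE = 8
};

/*
 * Returns the largest UDP payload that fits into a single packet of
 * the given MTU, or 0 if the MTU cannot even hold the headers.
 */
size_t max_udp_payload(size_t mtu, int is_ipv6)
{
    size_t overhead = SKETCH_UDP_HEADER_SIZE +
        (is_ipv6 ? SKETCH_IPV6_HEADER_SIZE : SKETCH_IPV4_HEADER_SIZE);

    return (mtu > overhead) ? mtu - overhead : 0;
}
```

For an MTU of 1500 this gives 1472 bytes over IPv4 but only 1452 bytes over
IPv6, so a payload sized with the IPv4 overhead exceeds the MTU by 20 bytes
when sent over IPv6.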

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
10 years ago[crypto] fix crypto block rounding/padding calculation
Fabio M. Di Nitto [Tue, 2 Sep 2014 11:03:43 +0000 (13:03 +0200)]
[crypto] fix crypto block rounding/padding calculation

libnss is "weird" in this respect as some block sizes are hardcoded,
others need to be determined dynamically.

For AES we need to use the values we know since GetBlockSize would
return errors, for 3des (that hopefully nobody is using) the value
returned by GetBlockSize is 8, but let's use the call into libnss
to avoid possible conflicts with distro patching or older versions.

Now, given the correct block size, the old calculation simply added
block size to the hdr_size. This is not sufficient.

We use _PAD encryption methods and we need to take that into account.

_PAD is calculated given the current input buf len and rounded up
to block size boundary, then block_size is added.

Ideally we would do that on a per packet basis but current transport
infrastructure doesn't allow it yet.

So round up the hdr_size to double the block_size reported by the
cipher.
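
The padding rule described above can be written down as a small calculation.
This sketch follows only the description in this message (round the input
length up to the block size boundary, then add one block); it is not taken
from libnss, and the function names are hypothetical.

```c
#include <stddef.h>

/*
 * Output length of a _PAD style encryption as described above:
 * input length rounded up to a block boundary, plus one block.
 * A block_size of 0 is treated as "no padding".
 */
size_t pad_output_len(size_t input_len, size_t block_size)
{
    size_t rounded;

    if (block_size == 0) {
        return input_len;
    }
    rounded = ((input_len + block_size - 1) / block_size) * block_size;
    return rounded + block_size;
}

/* Extra bytes added by padding for a given input length. */
size_t pad_overhead(size_t input_len, size_t block_size)
{
    return pad_output_len(input_len, block_size) - input_len;
}
```

Under this rule the rounding adds at most block_size - 1 bytes and the extra
block adds block_size, so the overhead always stays below 2 * block_size.
Reserving double the block size in the header is therefore enough for any
packet length, without computing the padding per packet.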

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
10 years agototemudpu: Send msgs to all members occasionally v2.3.4
Jan Friesse [Tue, 19 Aug 2014 14:05:34 +0000 (16:05 +0200)]
totemudpu: Send msgs to all members occasionally

To follow the spec, messages need to be sent to all nodes (not only
active members) from time to time to detect merge.

This is needed in situations when the totemsrp merge timer isn't running
(because there are enough messages sent by processors) to detect merge.

Example scenario:
- 3 nodes, all of them running cpgverify
- One node is isolated (iptables for example)
- Node is un-isolated

Without this commit, node will not merge as long as the cpgverify is
running.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
10 years agototemudpu: Implement member_set_active
Jan Friesse [Thu, 14 Aug 2014 14:04:57 +0000 (16:04 +0200)]
totemudpu: Implement member_set_active

Member active is used for sending "multicast" messages only to members
of ring. This reduces network load if some nodes are intentionally down.
Only regular multicast message load is reduced (messages sent by
totemudpu_mcast_noflush_send), because special messages (like hold
cancel, join message, ...) still have to be send to all members to
ensure correct behavior.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
10 years agototemrrp: Implement *_membership_changed
Jan Friesse [Wed, 13 Aug 2014 14:01:33 +0000 (16:01 +0200)]
totemrrp: Implement *_membership_changed

All *_membership_changed functions call totemnet_member_set_active, passing
1 as the active parameter for joined nodes and 0 for left nodes.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
10 years agototemnet: Add totemnet_member_set_active
Jan Friesse [Tue, 12 Aug 2014 13:56:08 +0000 (15:56 +0200)]
totemnet: Add totemnet_member_set_active

totemnet_member_set_active together with transport specific
member_set_active makes it possible for totemnet (and more interestingly
transport) to be informed about membership changes.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
10 years agototem: Inform RRP about membership changes
Jan Friesse [Thu, 7 Aug 2014 11:15:04 +0000 (13:15 +0200)]
totem: Inform RRP about membership changes

Services are informed about membership changes, but if same information
is needed inside totemrrp or totemnet, it's impossible to gather this
information.

The patch makes this possible, for now only for RRP and with empty callbacks.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
10 years agoMakefile: Do not install TODO file
Jan Friesse [Mon, 25 Aug 2014 17:40:19 +0000 (19:40 +0200)]
Makefile: Do not install TODO file

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
10 years agoTODO: Remove TODO file
Jan Friesse [Mon, 25 Aug 2014 13:30:28 +0000 (15:30 +0200)]
TODO: Remove TODO file

TODO file has many problems like it's not updated regularly, it's not
updated at all in already distributed tarballs, ...

All relevant RFEs were filed at github as issues with flag "TODO file
convert" so the file can finally be removed from git.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
10 years agocorosync-quorumtool: add sort options
Christine Caulfield [Fri, 22 Aug 2014 07:47:25 +0000 (08:47 +0100)]
corosync-quorumtool: add sort options

Adds a -o<a|i|n> option to corosync-quorumtool so that the nodes list
can be sorted by Address, node Id or Name. The default remains IP
address.

Signed-Off-By: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
10 years agoYKD: Fix loading of YKD quorum module
Christine Caulfield [Mon, 18 Aug 2014 08:33:59 +0000 (09:33 +0100)]
YKD: Fix loading of YKD quorum module

Although YKD is currently unsupported, untested and deprecated, it's
handy for testing things in the quorum module.

This patch allows YKD to actually load without an error. It does not fix
anything else in the service!

Also remove vsftype and its reference to YKD being the preferred and
default provider from the corosync.conf man page,
as that hasn't been true for a considerable time.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
10 years agoquorumtool: Sort output by nodeid
Christine Caulfield [Fri, 15 Aug 2014 07:18:07 +0000 (08:18 +0100)]
quorumtool: Sort output by nodeid

corosync-quorumtool prints the node listing by IP address
(as passed back to it from corosync) but this can be
counter-intuitive if the node IDs aren't in the same
order as the IP addresses. This patch sorts the nodes
by node ID so that the output is easier for humans to
parse.
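
Sorting a node list by node ID can be done with a qsort() comparator along
these lines. This is a sketch only: the struct layout and the function names
are hypothetical and are not taken from corosync-quorumtool.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical entry of the node listing. */
struct node_entry {
    unsigned int node_id;
    const char *address;
};

/* qsort() comparator ordering entries by ascending node ID. */
static int compare_by_node_id(const void *a, const void *b)
{
    const struct node_entry *left = a;
    const struct node_entry *right = b;

    /* Comparison instead of subtraction avoids unsigned wrap-around. */
    return (left->node_id > right->node_id) - (left->node_id < right->node_id);
}

/* Sorts the listing in place by node ID. */
void sort_nodes_by_id(struct node_entry *nodes, size_t count)
{
    qsort(nodes, count, sizeof(nodes[0]), compare_by_node_id);
}
```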

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-By: Jan Friesse <jfriesse@redhat.com>
10 years agovotequorum: Add cmap key to reset wait_for_all
Christine Caulfield [Tue, 12 Aug 2014 15:02:46 +0000 (16:02 +0100)]
votequorum: Add cmap key to reset wait_for_all

It's possible in a two_node cluster (and others but it's more likely
with just two) that a node could be booted up after downtime or failure
and the other node is not available for some reason. In this case it
would not be allowed to proceed because wait_for_all is enforced.

This patch provides a cmap key to clear this flag in the desperate
situation where that becomes necessary. It should only be used with
extreme caution and will be wrapped up in pcs which should also check
that fencing has been run.

Signed-Off-By: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
10 years agoCancel token holding while in retransmition
Jason HU [Fri, 8 Aug 2014 22:44:31 +0000 (06:44 +0800)]
Cancel token holding while in retransmition

When there is no other activity on the ring but only retransmission, and
the token is in hold mode, the retransmission becomes slow. Moreover,
if the retransmission always fails but token rotation works well, then
it takes quite a long time
(fail_to_recv_const * token_hold = 2500 * 180ms = 450sec) for the
retransmit requester to meet the "FAILED TO RECEIVE" condition and
re-construct a new ring.

This problem can be solved by checking if retransmits are present
before going into hold. If a node is the retransmit requester or
the resender, it sets my_token_held to 0 to speed up retransmission
and omit further unnecessary sending of the token_hold_cancel signal.

Signed-off-by: Jason HU <huzhijiang@gmail.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
10 years agovotequorum: Make qdev timeout in sync configurable
Jan Friesse [Tue, 5 Aug 2014 09:59:22 +0000 (11:59 +0200)]
votequorum: Make qdev timeout in sync configurable

Configuration option quorum.device.sync_timeout is available for setting
qdevice poll timeout for synchronization phase. Default value is 30
sec.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
10 years agotestvotequorum2: Opt for polling with old ringid
Jan Friesse [Mon, 4 Aug 2014 14:19:00 +0000 (16:19 +0200)]
testvotequorum2: Opt for polling with old ringid

Option -F is added to force sending the old ringid a given number of
times. The option is useful for testing failure scenarios during the
corosync synchronization phase.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
10 years agovotequorum: Block sync until qdevice poll
Jan Friesse [Mon, 4 Aug 2014 13:54:09 +0000 (15:54 +0200)]
votequorum: Block sync until qdevice poll

If qdevice is registered and alive, corosync waits in the sync phase until
the timeout expires or qdevice votes with the correct nodeid parameter.

This gives qdevice time to decide whether to vote, undisturbed and without
timing hazards.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
10 years agoipc: Process votequorum messages during sync
Jan Friesse [Thu, 31 Jul 2014 15:06:32 +0000 (17:06 +0200)]
ipc: Process votequorum messages during sync

This is needed for qdevice to be able to process messages during
synchronization phase.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
10 years agovotequorum: Add ring id to poll call
Jan Friesse [Wed, 30 Jul 2014 14:48:19 +0000 (16:48 +0200)]
votequorum: Add ring id to poll call

If votequorum service receives incorrect (not current) ringid, call is
ignored and CS_ERR_MESSAGE_ERROR is returned.

This and previous commits make incompatible changes in the votequorum
API/ABI, so the library version is increased.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
10 years agovotequorum: Return current ring id in callback
Jan Friesse [Tue, 29 Jul 2014 14:39:10 +0000 (16:39 +0200)]
votequorum: Return current ring id in callback

The returned ring id will be used in the poll function.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
10 years agototemconfig: Make sure join timeout is less than consensus
Christine Caulfield [Fri, 25 Jul 2014 07:24:02 +0000 (08:24 +0100)]
totemconfig: Make sure join timeout is less than consensus

The thesis contains this paragraph:

" The Join timeout is shorter than the Consensus timeout and is used to
  increase the probability that Join messages from all currently
  working processors are received during a single round of consensus."

Empirically I can confirm that making join not less than consensus can
cause havoc with a cluster, so I think we should enforce this.
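
The enforced relationship amounts to a strict comparison of the two timeouts.
The sketch below only illustrates that rule; the function name is
hypothetical and the code is not the check added to totemconfig.

```c
/*
 * Returns 1 if the join timeout is strictly shorter than the
 * consensus timeout (both in milliseconds), 0 otherwise.
 */
int join_timeout_is_valid(unsigned int join_ms, unsigned int consensus_ms)
{
    return join_ms < consensus_ms;
}
```

For example, a join timeout of 50 ms with a consensus timeout of 1200 ms
passes, while a join timeout equal to or larger than consensus is rejected.
These values are illustrative only.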

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>