The corosync message "A processor joined or left the membership" is
vague and unhelpful. People have to look for the following quorum
message and try to deduce which nodes have joined or left from that
and past membership messages, even though the routine printing the
message already has this information to hand.
This patch fixes that message so that it prints the nodeids of the nodes
that have joined/left the cluster.
Signed-Off-By: Christine Caulfield <ccaulfie@redhat.com> Reviewed-By: Jan Friesse <jfriesse@redhat.com>
Jan Friesse [Thu, 20 Jun 2013 09:59:50 +0000 (11:59 +0200)]
Log: Output parse errors to syslog
When corosync was started in daemon mode and a parse error occurred,
there was no way to find out what happened (a common situation on
systemd enabled systems). The solution is to output such errors to
syslog by default.
The redundant line setting up logsys is also removed. It is no longer
needed, because the FORK and THREADED mode options no longer have any
effect. FORK is handled by libqb by default, and THREADED mode is forced
by calling logsys_thread_start.
Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Jan Friesse [Wed, 12 Jun 2013 14:09:26 +0000 (16:09 +0200)]
quorumtool: Properly check nodeid cli param
The return value of strtol can be negative, but the result was assigned
to an unsigned integer. To make the check correct, the result is first
assigned to a signed variable, checked, and then assigned to the
unsigned variable.
Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
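The signed-then-unsigned check described in the quorumtool commit above can be sketched roughly as follows. This is a minimal illustration, not the actual quorumtool code: the helper name parse_nodeid and its return convention are assumptions made for the example.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper (not taken from corosync-quorumtool).
 * Parses a decimal node ID from a command-line argument.
 * Returns 0 and stores the value on success, -1 on invalid input. */
int parse_nodeid(const char *arg, uint32_t *nodeid_out)
{
    char *end = NULL;
    long value;   /* signed on purpose, so a negative input stays visible */

    errno = 0;
    value = strtol(arg, &end, 10);

    /* Reject overflow, empty input and trailing garbage. */
    if (errno != 0 || end == arg || *end != '\0') {
        return -1;
    }
    /* Range check is done on the signed value, before any conversion. */
    if (value < 0 || (unsigned long)value > UINT32_MAX) {
        return -1;
    }

    *nodeid_out = (uint32_t)value;   /* safe: value is known to fit */
    return 0;
}
```

Assigning the strtol result directly to an unsigned variable would turn an input such as "-1" into a very large positive number, which a later "less than zero" test could never catch.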
Jan Friesse [Mon, 20 May 2013 13:54:02 +0000 (15:54 +0200)]
Remove unnecessary mmap in cpg
The zero-copy code in cpg performs the following mmaps:
- Mmap anonymous, private memory to some address (-> malloc)
- Mmap shared memory of fd to the address returned by the first mmap
(effectively shadowing the first mapping)
This is not necessary; only one mapping is needed.
Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Steven Dake <sdake@redhat.com> Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Jan Friesse [Mon, 8 Apr 2013 07:57:25 +0000 (09:57 +0200)]
Detect big scheduling pauses
Add a poll timer that is scheduled to be called 3 times per token timeout.
If the poll timer was not called for more than 0.8 * token timeout, it
means the corosync process was not scheduled, and either token_timeout
should be increased or the load should be reduced (useful for VMs, where
the host is overcommitted so the VM is not scheduled as expected).
Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
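The pause check described in the commit above can be sketched as a small piece of state plus a tick function. This is only an illustration under stated assumptions: the struct and function names are hypothetical, times are passed in as millisecond values from a monotonic clock, and the real corosync code is organised differently.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical state for the check; names are illustrative only. */
struct pause_detector {
    uint64_t token_timeout_ms;   /* configured token timeout */
    uint64_t last_tick_ms;       /* monotonic time of the previous timer call */
};

/* Interval at which the timer is expected to fire:
 * 3 times per token timeout. */
uint64_t pause_detector_interval_ms(const struct pause_detector *pd)
{
    return pd->token_timeout_ms / 3;
}

/* Call from the timer callback with the current monotonic time.
 * Returns true when the gap since the previous call exceeded
 * 0.8 * token timeout, i.e. the process was probably not scheduled. */
bool pause_detector_tick(struct pause_detector *pd, uint64_t now_ms)
{
    uint64_t elapsed = now_ms - pd->last_tick_ms;
    uint64_t threshold = pd->token_timeout_ms * 8 / 10;

    pd->last_tick_ms = now_ms;
    return elapsed > threshold;
}
```

With a token timeout of 1000 ms the timer is expected roughly every 333 ms, so a gap of more than 800 ms means at least two expected calls were missed.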
Xia Li [Tue, 19 Mar 2013 07:08:13 +0000 (07:08 +0000)]
Convert the nodeid byte order to be aligned with network order
When using corosync with clear_node_high_bit set to yes,
the highest bit of the nodeid is cleared. When all the cluster nodes
are in one subnet, the IP addresses might be configured as follows:
node1: 147.2.207.64
node2: 147.2.207.192
If the byte order of the nodeid is little endian, wiping off the
highest bit will make the two nodes have the same nodeid!
This patch fixes this by converting the nodeid to network order.
Signed-off-by: Xia Li <xli@suse.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>
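The collision described in the commit above can be reproduced with a few lines of arithmetic. The sketch below is an illustration only: the function names are hypothetical and it does not show how corosync actually derives the nodeid, just what happens to the two example addresses when the top bit is cleared under each byte order.

```c
#include <stdint.h>

/* Address octets interpreted in network (big-endian) order:
 * the first octet becomes the most significant byte. */
uint32_t nodeid_network_order(const uint8_t ip[4])
{
    return ((uint32_t)ip[0] << 24) | ((uint32_t)ip[1] << 16) |
           ((uint32_t)ip[2] << 8)  |  (uint32_t)ip[3];
}

/* Same octets interpreted as a little-endian integer:
 * the LAST octet becomes the most significant byte. */
uint32_t nodeid_little_endian(const uint8_t ip[4])
{
    return ((uint32_t)ip[3] << 24) | ((uint32_t)ip[2] << 16) |
           ((uint32_t)ip[1] << 8)  |  (uint32_t)ip[0];
}

/* Effect of clearing the highest bit of a 32-bit nodeid. */
uint32_t clear_high_bit(uint32_t nodeid)
{
    return nodeid & 0x7FFFFFFFu;
}
```

For 147.2.207.64 and 147.2.207.192 the last octets are 0x40 and 0xC0, which differ only in their top bit. In the little-endian interpretation that octet is the most significant byte, so clearing the highest bit yields 0x40CF0293 for both nodes. In network order the cleared bit comes from the shared first octet (147), and the results 0x1302CF40 and 0x1302CFC0 stay distinct.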
Michael Chapman [Mon, 11 Feb 2013 03:47:27 +0000 (03:47 +0000)]
build: make --disable-testagents work
The --disable-testagents option sets enable_testagents to "no". This
variable should always be explicitly tested against "yes", not merely
checked for being non-empty.
Signed-off-by: Michael Chapman <mike@very.puzzling.org> Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Jan Friesse [Thu, 31 Jan 2013 13:56:18 +0000 (14:56 +0100)]
Handle unexpected closing brace in config file
If the configuration file contains a closing brace before an opening
brace at top level, configuration parsing stops and the file is not
completely parsed. The solution is to detect the extra closing brace and
display an error.
Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
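Detecting an extra closing brace, as described in the commit above, amounts to tracking the nesting depth. The following is a simplified sketch and not the corosync config parser: the function name is hypothetical, and it deliberately ignores comments and braces that appear inside values.

```c
/* Hypothetical helper (not the corosync parser).
 * Scans configuration text and returns the 1-based line number of the
 * first closing brace that has no matching opening brace, or 0 if no
 * such brace exists. */
int find_unexpected_closing_brace(const char *text)
{
    int depth = 0;   /* number of currently open sections */
    int line = 1;

    for (const char *p = text; *p != '\0'; p++) {
        if (*p == '\n') {
            line++;
        } else if (*p == '{') {
            depth++;
        } else if (*p == '}') {
            if (depth == 0) {
                return line;   /* closing brace at top level */
            }
            depth--;
        }
    }
    return 0;
}
```

Reporting the line number lets the caller print an error instead of silently stopping, which is the behaviour the commit message asks for.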
Jan Friesse [Wed, 30 Jan 2013 12:40:52 +0000 (13:40 +0100)]
Handle colon in configuration file
If a colon was entered at the end of a value, it was deleted.
This made it impossible to enter a (legal) IPv6 address ending with ::
(like fed0::).
Also, when a line contains both a brace and a colon, it is parsed twice
(first as key = value and second as the start of a section). This is
handled by a continue in the if section.
Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Jan Friesse [Wed, 12 Dec 2012 08:25:18 +0000 (09:25 +0100)]
Move qb_loop creation after daemonization
Creating the qb_loop before daemonization is not a problem for poll or
epoll type loops, but it is a problem for kqueue, because a kqueue is
not inherited by the child after fork.
Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Jan Friesse [Wed, 7 Nov 2012 16:51:15 +0000 (17:51 +0100)]
Add waiting_trans_ack also to fragmentation layer
The patch adding support for waiting_trans_ack may fail if
synchronization happens between the deliveries of the parts of a
fragmented message. In that situation, the fragmentation layer waits for
a message with the correct number, but it will never arrive.
The solution is to handle a change of waiting_trans_ack (via a callback)
and use a different queue.
Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Steven Dake [Wed, 7 Nov 2012 15:45:12 +0000 (16:45 +0100)]
Fix problem with sync operations under very rare circumstances
This patch creates a special message queue for synchronization messages.
This prevents a situation in which messages are queued in the
new_message_queue but have not yet been originated from corrupting the
synchronization process.
Signed-off-by: Steven Dake <sdake@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Jan Friesse [Wed, 24 Oct 2012 10:08:40 +0000 (10:08 +0000)]
If failed_to_recv is set, consensus can be empty
If failed_to_recv is set (the node detects that it is not able to
receive messages), we can end up with an assert, because my_failed_list
and my_member_list are the same list. This happens because we do not
follow the specification and allow a node to mark itself as failed.
Because a single node membership is created (ignoring both the fail list
and member_list) when failed_to_recv is set and consensus has been
reached across nodes, we can skip the assert.
Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Jacek Konieczny [Thu, 25 Oct 2012 07:44:57 +0000 (07:44 +0000)]
link libtotem_pg to libqb
The libtotem_pg library uses symbols from libqb, so it should be
explicitly linked with it. This doesn't cause problems for the corosync
binary itself, as it is linked to both libraries, but it can cause
problems if anything else links to libtotem_pg.so, and automated
checkers can report this as a library problem.
Signed-off-by: Jacek Konieczny <jajcus@jajcus.net> Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Jan Friesse [Wed, 17 Oct 2012 12:50:09 +0000 (14:50 +0200)]
Correctly check if service was unloaded
my_processing_idx is a pointer into the received service list, not a
global service number. When checking the state of a service, we should
use service_id instead of my_processing_idx.
Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>