Jan Friesse [Fri, 2 Oct 2015 06:54:05 +0000 (08:54 +0200)]
qnetd: Improvements
- Move adding client to cluster to init phase instead of preinit
- Implement missing ask for vote and vote info messages
- Fix cluster name memory leak
- Refactor unexpected message handler to one generic function
- Move qnetd_client_send_err to new file
- Add qnetd_client_send_vote_info
Jan Friesse [Thu, 1 Oct 2015 07:58:01 +0000 (09:58 +0200)]
Qnetd improvements
- Complete config and membership node list callbacks
- Add client disconnect callback
- Always send msg_seq_num in node list
- Store config and last membership node list
votequorum: Make display of qdevice more intuitive.
corosync-quorumtool displays the votes of the qdevice whether
or not it is active. This is confusing because if it is not active
then the display looks like there is a vote being contributed to
quorum when there is not.
This patch displays 0 for qdevice votes if the device is present
(but inactive) and adds the votes after the name. If the device is
contributing votes then they are displayed as normal.
In my previous logconfig patch, adding a subsys so the
logging stanzas could disable logging to a file, because
the subsys closed the file used by the main logging.
This patch only applies defaults to higher-level logging and
non-deprecated keys.
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Jan Friesse [Fri, 13 May 2016 15:06:09 +0000 (17:06 +0200)]
schedwrk: Cleanup and make it work on PPC BE
Schedwrk is passing hdb handle (64-bit) to
totempg_callback_token_create as a context. Context is defined to be
pointer, so there is conversion function which stores 64-bit hdb_handle
into pointer. Potentially, pointer can be 32-bit. This means, check
part of hdb is discarded (and have to get special no_check value in
schedwrk_do) later. This works quite well on 32-bit Little-Endian
system. Sadly on Big-Endian system, check partition of hdb is stored
instead of value. Result is error of hdb_handle_get call.
Proposed solution is to pass handle pointer to
totempg_callback_token_create as context. This means full hdb (check +
value) can be used in schedwrk_do (easier detection of memory
corruption).
Main reason for this patch is to remove usage of pointer as integer
value.
Small drawback of given solution is that handle pointer must be memory
allocated on heap or static memory, making API more bug-prone. Current
usage of schedwrk API across corosync always use memory in .text
section (safe), so it's not a problem.
Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Michael Jones [Fri, 29 Apr 2016 22:50:10 +0000 (18:50 -0400)]
Add clang-format configuration file
This .clang-format file is written for clang-format version 3.7.1
I've attempted to set the options for clang-format so that the
difference between the current code, and the result of the clang format
call is as small as possible.
Unfortunately, clang-format doesn't yet have the ability to handle every
single possible formatting option, so it's not perfect yet.
Signed-off-by: Michael Jones <jonesmz@jonesmz.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Valentin Vidic [Tue, 3 May 2016 07:05:54 +0000 (09:05 +0200)]
wd: make watchdog device configurable
Add configuration option resources.watchdog_device allowing runtime
selection of watchdog device. Useful for newer servers having more
than one watchdog available (IPMI and iTCO).
Special value "off" disables watchdog in configuration rather than
just using build options. Useful when watchdog device is needed
elsewhere (SBD cluster stonith service).
Signed-off-by: Valentin Vidic <Valentin.Vidic@CARNet.hr> Reviewed-by: Jan Friesse <jfriesse@redhat.com>
logsys: fix TOTEM logging when corosync built out of tree
If corosync is built out-of-tree (passing --srcdir to configure) then
TOTEM logging doesn't print anything.
This is caused by the source filenames (from __FILE__ at compilation
time) having the configured path in them - in this example
../corosync/exec/totemudp.c etc. The list of totem source filenames
passed to libqb logging facility only has the basenames so the filenames
never match up as libqb does an exact string match.
I looked into fixing this in libqb but it causes a regression. We can't
simply basename() __FILE__ at the point of calling log_printf as it's i
common also to use __FILE__ to generate the logging source, and
using basename() on both removes the distinction between similarly named
files from different directories which could be a requirement.
Jan Friesse [Wed, 6 Apr 2016 13:49:09 +0000 (15:49 +0200)]
totemconfig: Explicitly pass IP version
If resolver was set to prefer IPv6 (almost always) and interface section
was not defined (almost all config files created by pcs), IP version was
set to mcast_addr.family. Because mcast_addr.family was unset (reset to
zero), IPv6 address was returned causing failure in totemsrp.
Solution is to pass correct IP version stored in
totem_config->ip_version.
Patch also simplifies get_cluster_mcast_addr. It was using mix of
explicitly passed IP version and bindnet IP version.
Also return value of get_cluster_mcast_addr is now properly checked.
Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Jan Friesse [Wed, 24 Feb 2016 15:02:31 +0000 (16:02 +0100)]
cpg: Handle ipc error in cpg_zcb_alloc/free
- Error returned by coroipcc_msg_send_reply_receive is now correctly
handled.
- If munmap fails, error is set to proper value and handle is put back
into handle_db
Athira Rajeev [Wed, 24 Feb 2016 13:15:31 +0000 (18:45 +0530)]
cpg: Memory not unmapped in cpg_zcb_free
Function in cpg_zcb_alloc (from code lib/cpg.c) creates
/dev/shm/corosync_zerocopy-XXXXX and does mmap
The memory is allocated by corosync service (function zcb_alloc
in exec/cpg.c) also and both shares this memory via mmap
(uses MAP_SHARED in mmap call)
Corosync calls unlink which deletes the file from /dev/shm while
closing the file descriptor, but unmap is not happening correctly
while calling cpg_zcb_free.
So:
- still the deleted file holds the memory
- As munmap is not happening correctly, the number of mappings per
process gets exceeded and corosync dies with ENOMEM
From gdb, the size passed to munmap appears to be zero and address
looks wrong. Also in the code return code of munmap is not checked.
The patch adds check for:
- munmap return code and getting correct address for munmap
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Jan Friesse [Wed, 10 Feb 2016 11:36:52 +0000 (12:36 +0100)]
totempg: Fix memory leak
Previously there were two free lists. One for operational and one for
transitional state. Because every node starts in transitional state and
always ends in the operational state, assembly was always put to normal
state free list and never in transitional free list, so new assembly
structure was always allocated after new node connected.
Solution is to have only one free list.
Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com> Reviewed-by: Steven Dake <stdake@cisco.com>
- Changed paramater to parameter in exec/logcconfig.c
Change-Id: I8a24b0ef5c6621dc6c19d7decbdfe7a255afd10d Signed-off-by: Richard B Winters <rik@mmogp.com> Reviewed-by: Steven Dake <sdake@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>
- Changed reenable to re-enable in tools/corosync-cfgtool.c
Change-Id: I0457bf3040a454a44f0d8343dd2cd8bf8fad16e0 Signed-off-by: Richard B Winters <rik@mmogp.com> Reviewed-by: Steven Dake <sdake@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Change-Id: I2d7872b21e889202cd2b7752db4c76f18fffa95d Signed-off-by: Richard B Winters <rik@mmogp.com> Reviewed-by: Steven Dake <sdake@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>
- dont to don't
- overriden to overridden
- informations to information
Change-Id: If6644694d750c30ba9f5f43b4eb852485613d64a Signed-off-by: Richard B Winters <rik@mmogp.com> Reviewed-by: Steven Dake <sdake@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Change-Id: I9559e31c780e211b651744c6eaa056ce8d4c3db1 Signed-off-by: Richard B Winters <rik@mmogp.com> Reviewed-by: Steven Dake <sdake@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Change-Id: Ib80face38dce0345e649297d16cf8a63c5b0e8c1 Signed-off-by: Richard B Winters <rik@mmogp.com> Reviewed-by: Steven Dake <sdake@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Change-Id: I8c5d6af915203533c80e4eaa574e305a46d74815 Signed-off-by: Richard B Winters <rik@mmogp.com> Reviewed-by: Steven Dake <sdake@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Corrected the spelling of retrieve, where it was spelled as retreive.
- There were two cases of this mispelling; one
upper-case and one lower-case
Change-Id: Ic97fd210d8d3ae7e568e5a2e5d97c6220d2ff628 Signed-off-by: Richard B Winters <rik@mmogp.com> Reviewed-by: Steven Dake <sdake@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>
totemudp: Move udp bind() so that multicast works with IPv6
It seems that the IPv6 multicast parameters only take effect when bind()
is called, so I've moved the mcast recv socket bind() to the bottom of
totemudp_build_sockets_ip().
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>
votequorum: Don't send multiple callbacks when nodes join
This patch aligns the votequorum callbacks so that they are
the same as the quorum ones. Previously it was quite common
for votequorum to send one callback for every node in the cluster
when a single new node joined (because it sent one for every
nodeinfo message it received).
This new system makes much more sense in itself and being
consistent with the internal quorum is also an advantage!
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Ferenc Wágner [Fri, 28 Aug 2015 12:21:18 +0000 (14:21 +0200)]
man html index: Update index
- add link to cmap_keys(8)
- remove link to cpg_groups_get(3)
- add missing cpg_* and votequorum_qdevice_* functions
- corosync-fplay has already been removed by ab32894
Signed-off-by: Ferenc Wágner <wferi@niif.hu> Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Jason HU [Sun, 28 Jun 2015 16:16:06 +0000 (16:16 +0000)]
CFG: Prevent CFG orignating messages during SYNC
During SYNC, corosync-cfgtool -R/-H commands can pass through IPC then
send totem messages. This may corrupts
assembly_list_inuse/assembly_list_free if those messages are recedived
after SYNC is done.
The solution is marking related CFG APIs as
CS_LIB_FLOW_CONTROL_REQUIRED.
Signed-off-by: Jason HU <huzhijiang@gmail.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Jan Friesse [Mon, 22 Jun 2015 14:00:07 +0000 (16:00 +0200)]
Don't link with libz when not needed
Commit 8cc8e513633a1a8b12c416e32fb5362fcf4d65dd added check for libz
resulting in linking with lib z for all libraries. This is not expected
behavior. Patch solves it by making defining automake conditional so
cpghum is linked only if libz is available and LIBS variable is not
modified at all.
votequorum: Fix auto_tie_breaker behaviour in odd-sized clusters
auto_tie_breaker can behave incorrectly in the case of a cluster
with an odd number of nodes. It's possible for a partition to
have quorum while the other side has the ATB node, and both will
continue working. (Of course in a properly configured cluster one side
will be fenced but that becomes an indeterminate race .. just what ATB
is supposed to avoid).
This patch prevents ATB from running in a partition if the 'other'
partition might have quorum, and also mandates the use of wait_for_all
in clusters with an odd number of nodes so that a quorate partition
cannot start services or fence an existing partition with the tie
breaker node.
Signed-Off-By: Christine Caulfield <ccaulfie@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>
This patch from Hideo Yamauchi improves the logging of
whether nodes leave the cluster cleanly or uncleanly,
making it easier to determine if a node ws shut down
by the operator. There is also the possibility that a
LEAVE message could get missed (due to the node being
in flush state) so this can also make that clearer.
The modifications are as follows.
Change 1) I added the list which maintained LEAVE node to totemsrp.
Change 2) I added registration, a search, the handling of to clear LEAVE
node.
Change 3) I added the output to log.
Change 4) I changed an output level of the log.
totem: Log a message if JOIN or LEAVE message is ignored
As per recent email thread, this patch adds a log message if a JOIN or
LEAVE message is discarded while corosync is flushing the receive queue.
While ignoring a JOIN message is harmless (it will be resent), ignoring
a LEAVE message can cause a longer state transition as it is treated as
a node crashing rather than leaving gracefully, so the system admin
might be confused as to the cause.
Unfortunately, we can't (at the totemudp level) distinguish between JOIN
or LEAVE messages without a lot more protocol-specific code creeping in
the lower layer so the message is left ambiguous.
Having duplicate nodeids in corosync.conf can play havoc with a cluster,
so (as suggested by someone on this list) here is some code to check
that all nodeids are unique. Even if a nodeid is not specified it will
check to be sure that the ID generated from the IP address (ipv4 only)
does not clash with one that is provided.
It logs all non-unique nodeids to syslog, but only the last is reported
on the command-line to the user which should be enough to get them to
check further. At startup this will cause corosync to fail to start.
quorum: don't allow quorum_trackstart to be called twice
If quorum_trackstart() or votequorum_trackstart() are called twice with
CS_TRACK_CHANGES then the client gets added twice to the notifications
list effectively corrupting it. Users have reported segfaults in
corosync when they did this (by mistake!).
As there's already a tracking_enabled flag in the private-data, we check
that before adding to the list again and return an error if
the process is already registered.
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>
If a cpg client sends a message larger than 1Mb (actually slightly
less to allow for internal buffers) cpg will now fragment that into
several corosync messages before sending it around the ring.
cpg_mcast_joined() can now return CS_ERR_INTERRUPT which means that the
cpg membership was disrupted during the send operation and the message
needs to be resent.
The new API call cpg_max_atomic_msgsize_get() returns the maximum size
of a message that will not be fragmented internally.
New test program cpghum was written to stress test this functionality,
it checks message integrity and order of receipt.
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>
The default for auto_tie_breaker should be 'lowest' - which is what it
was before the extended ATB functionality of auto_tie_breaker_node was
added, and what the documentation states.
However this was broken so that if auto_tie_breaker_node was not
specified then auto_tie_breaker itself was ignored. This patch fixes
that.
It also fixes a typo in a comment.
Signed-Off-By: Christine Caulfield <ccaulfie@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Jan Friesse [Wed, 21 Jan 2015 12:30:48 +0000 (13:30 +0100)]
Handle adding and removing UDPU members atomically
When config file is reloaded with removed UDPU member, internal icmap
index of nodelist.node can change. This can result in removal and then
adding back node. This, with UDPU alive filtering (where member is by
default considered as not a member) makes corosync not sending messages
to such members resulting in new membership creation.
Solution is to properly test which members were really deleted and added
(instead of relying on internal and dynamic naming of icmap hash table
key name).
Also trully dynamic add and remove node (via cmap) is now handled by
same function so totem_config->interfaces is now updated properly.
Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>