git.proxmox.com Git - mirror_corosync.git/log

commit | commitdiff | tree

Angus Salkeld [Wed, 4 May 2011 23:06:28 +0000 (09:06 +1000)]

autobuild: improve messages

Signed-off-by: Angus Salkeld <asalkeld@redhat.com>

commit | commitdiff | tree

Angus Salkeld [Wed, 4 May 2011 04:41:18 +0000 (14:41 +1000)]

CTS: add -l to keygen (normal keygen struggles to run on VMs)

Signed-off-by: Angus Salkeld <asalkeld@redhat.com>

commit | commitdiff | tree

Angus Salkeld [Mon, 18 Apr 2011 02:46:53 +0000 (12:46 +1000)]

CTS: send with correct number of iovecs

Else payload won't be sent

Signed-off-by: Angus Salkeld <asalkeld@redhat.com>

commit | commitdiff | tree

Angus Salkeld [Mon, 18 Apr 2011 02:45:50 +0000 (12:45 +1000)]

CTS: timer should not be on the stack

Signed-off-by: Angus Salkeld <asalkeld@redhat.com>

commit | commitdiff | tree

Jan Friesse [Wed, 4 May 2011 13:00:31 +0000 (15:00 +0200)]

totemsrp: Enhance mcast failure detection

memb_state_gather_enter now increases stats.continuous_gather only if the
previous state was also gather. This should happen only if multicast is
not working properly (a local firewall in most cases), and not when many
nodes join at one time.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Angus Salkeld <asalkeld@redhat.com>
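
A minimal sketch of the counting rule described in the commit above, assuming a simplified state enum and stats struct. The names memb_state, gather_stats and gather_enter are hypothetical and are not taken from the corosync source; the real totemsrp state machine carries far more state.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified membership states. */
enum memb_state {
    MEMB_STATE_OPERATIONAL,
    MEMB_STATE_GATHER,
    MEMB_STATE_COMMIT,
    MEMB_STATE_RECOVERY
};

/* Hypothetical container for the current state and the counter. */
struct gather_stats {
    enum memb_state state;
    unsigned int continuous_gather;
};

/*
 * Enter the gather state. The counter grows only when the previous
 * state was already gather, so a burst of joins that passes through
 * other states does not inflate it.
 */
void gather_enter(struct gather_stats *s)
{
    assert(s != NULL);
    if (s->state == MEMB_STATE_GATHER) {
        s->continuous_gather++;
    }
    s->state = MEMB_STATE_GATHER;
}
```

Entering gather from any other state leaves the counter untouched, which is the distinction the commit message draws between a blocked multicast path and many nodes joining at once.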

commit | commitdiff | tree

Jan Friesse [Tue, 29 Mar 2011 13:51:42 +0000 (15:51 +0200)]

coroipcs: Deny connect to service without initfn

If a library connects to a service with no init function, coroipcs will try
to dereference a NULL pointer. Now we correctly return the error code
CS_ERR_NOT_EXIST.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
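
The guard described above can be sketched as follows. This is an illustration only: the service struct, service_connect and the SKETCH_* codes are hypothetical stand-ins (SKETCH_ERR_NOT_EXIST plays the role of CS_ERR_NOT_EXIST, with a placeholder value), not the coroipcs API.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder result codes; the values are not corosync's. */
enum sketch_error { SKETCH_OK = 0, SKETCH_ERR_NOT_EXIST = 1 };

typedef int (*init_fn_t)(void *conn);

/* Hypothetical service descriptor with an optional init function. */
struct service {
    const char *name;
    init_fn_t init_fn;
};

/* Trivial init function used to exercise the happy path. */
int example_init(void *conn)
{
    (void)conn;
    return SKETCH_OK;
}

/* Refuse the connection instead of calling through a NULL pointer. */
int service_connect(const struct service *svc, void *conn)
{
    assert(svc != NULL);
    if (svc->init_fn == NULL) {
        return SKETCH_ERR_NOT_EXIST;
    }
    return svc->init_fn(conn);
}
```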

commit | commitdiff | tree

Tim Serong [Fri, 15 Apr 2011 00:40:11 +0000 (10:40 +1000)]

Add ipc_refcnt to message_handler_req_{exec, lib}_cfg_ringreenable()

Without refcounting the conn pointer here, corosync will segfault
if one kills a running instance of "corosync-cfgtool -r" (rhbz#695191)

Signed-off-by: Tim Serong <tserong@novell.com>
Reviewed-by: Steven Dake <sdake@redhat.com>

commit | commitdiff | tree

Steven Dake [Mon, 3 Jan 2011 23:40:55 +0000 (16:40 -0700)]

Align ipc on 8 byte boundaries

Align all ipc messages on 8 byte boundaries. This alignment will remove bus
errors on systems that can't access unaligned data and should improve
performance.

Signed-off-by: Steven Dake <sdake@redhat.com>
Reviewed-by: Angus Salkeld <asalkeld@redhat.com>
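
One common way to keep message sizes on an 8 byte boundary is to round each length up to the next multiple of 8. The sketch below shows only that arithmetic; IPC_ALIGNMENT and ipc_align_up are hypothetical names, and the actual patch may well achieve alignment differently (for example through type attributes).

```c
#include <assert.h>
#include <stddef.h>

#define IPC_ALIGNMENT 8u /* hypothetical name for the boundary */

/* Round len up to the next multiple of IPC_ALIGNMENT. */
size_t ipc_align_up(size_t len)
{
    /* The bit trick below requires a power-of-two alignment. */
    assert((IPC_ALIGNMENT & (IPC_ALIGNMENT - 1u)) == 0u);
    return (len + (IPC_ALIGNMENT - 1u)) & ~(size_t)(IPC_ALIGNMENT - 1u);
}
```

With this helper a 13 byte payload occupies 16 bytes, so whatever follows it starts on an aligned address.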

commit | commitdiff | tree

Steven Dake [Mon, 3 Jan 2011 23:40:54 +0000 (16:40 -0700)]

Fix problem where unaligned totemip address access would result in a bus error on architectures that do not support unaligned access.

Signed-off-by: Steven Dake <sdake@redhat.com>
Reviewed-by: Angus Salkeld <asalkeld@redhat.com>

commit | commitdiff | tree

Greg Walton [Thu, 6 Jan 2011 16:15:24 +0000 (11:15 -0500)]

Clean up ENDIAN ifdef tests

Signed-off-by: Greg Walton <corosync@gwalton.net>
Reviewed-by: Steven Dake <sdake@redhat.com>

commit | commitdiff | tree

Tim Serong [Wed, 6 Apr 2011 11:30:46 +0000 (21:30 +1000)]

Fix tyop in RRP faulty error messages

Signed-off-by: Tim Serong <tserong@novell.com>
Reviewed-by: Russell Bryant <russell@russellbryant.net>

commit | commitdiff | tree

Angus Salkeld [Tue, 12 Apr 2011 22:15:59 +0000 (08:15 +1000)]

IPC: place calls to stats functions outside of mutexes

This is to prevent nasty deadlocks between IPC and objdb.

Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>

commit | commitdiff | tree

Zane Bitter [Sun, 10 Apr 2011 13:04:17 +0000 (09:04 -0400)]

Provide better checking of the message type

A negative value for the message type (on systems where char is signed)
would cause a crash. This is highly probable if the cluster is, for example,
misconfigured to have encryption enabled on some nodes but not others.

Signed-off-by: Zane Bitter <zane.bitter@gmail.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
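
A short sketch of why a signed char message type is dangerous as a table index, and one way to validate it. HANDLER_COUNT, handler_names and the two functions are hypothetical; the commit does not say which check was actually added.

```c
#include <assert.h>
#include <stddef.h>

#define HANDLER_COUNT 4 /* hypothetical number of known message types */

static const char *const handler_names[HANDLER_COUNT] = {
    "orf_token", "mcast", "memb_join", "memb_commit_token"
};

/*
 * Converting through unsigned char maps any negative value to the
 * range 128..255, so a single upper-bound test rejects it.
 */
int message_type_is_valid(char type)
{
    unsigned char t = (unsigned char)type;
    return t < HANDLER_COUNT;
}

/* Look up a handler name, or return NULL for an invalid type. */
const char *handler_name(char type)
{
    if (!message_type_is_valid(type)) {
        return NULL;
    }
    return handler_names[(unsigned char)type];
}
```

Garbage bytes from a node that encrypts its traffic are likely to have the high bit set, which is exactly the case a plain `type < HANDLER_COUNT` comparison on a signed char lets through.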

commit | commitdiff | tree

Zane Bitter [Fri, 8 Apr 2011 03:48:49 +0000 (23:48 -0400)]

Fix uninitialised memory errors found by valgrind

Signed-off-by: Zane Bitter <zane.bitter@gmail.com>
Reviewed-by: Steven Dake <sdake@redhat.com>

commit | commitdiff | tree

Angus Salkeld [Tue, 29 Mar 2011 02:25:04 +0000 (13:25 +1100)]

Fix shutdown when a confdb client is still connected

If a client is connected to corosync and registered for
object notifications when corosync is asked to shut down,
the IPC server will get stuck. This is because the pipe
is closed and the refcount is increased. This leaves ipcs
with a connection that it can't destroy.

Solution:
1) if a write to the pipe fails (pipe closed) decrement the refcounter.
2) fix the object_track_stop() - it was not working as the functions
   did not match up. (this caused the late callbacks).
3) in ipcs call exit_fn() then stats_destroy_connection() so that
   the service engine can have time to call object_track_stop()
   before the object gets destroyed.

Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>

commit | commitdiff | tree

Angus Salkeld [Mon, 28 Mar 2011 22:41:04 +0000 (09:41 +1100)]

STATS: add the service name to the connection name.

This helps to quickly identify what service the application
is connected to.

The object will now look like:
runtime.connections.corosync-objctl:CONFDB:19654:13.service_id=11
runtime.connections.corosync-objctl:CONFDB:19654:13.client_pid=19654
etc...

This also makes it clearer to receivers of the dbus/snmp events
what is going on.

Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>

commit | commitdiff | tree

Angus Salkeld [Sat, 26 Mar 2011 11:09:29 +0000 (22:09 +1100)]

NOTIFYD: prevent duplicate quorate events.

Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>

commit | commitdiff | tree

Angus Salkeld [Sat, 26 Mar 2011 11:08:55 +0000 (22:08 +1100)]

NOTIFYD: fix retrieving the application's parent name.

Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>

commit | commitdiff | tree

Jan Friesse [Tue, 22 Mar 2011 16:33:03 +0000 (17:33 +0100)]

cfgtool: print list of IP with space between items

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>

commit | commitdiff | tree

Jan Friesse [Tue, 22 Mar 2011 16:32:45 +0000 (17:32 +0100)]

cpgtool: print list of IP with space between items

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>

commit | commitdiff | tree

Jan Friesse [Tue, 22 Mar 2011 16:20:12 +0000 (17:20 +0100)]

cfg_get_node_addrs: Return correct addresses

Zero element array behavior is very different from that of a normal array
or pointer. This behavior is the root of the problem of not returning a
correctly filled array of addresses. This appeared only in rrp mode, where
more than one address is returned.

All memcpy's are now correctly converted to copy pointer to char.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>

commit | commitdiff | tree

Steven Dake [Thu, 24 Mar 2011 15:53:40 +0000 (08:53 -0700)]

totemsrp: free messages originated in recovery rather than rely on messages_free

Relying on messages_free may seem like it should work, but it leads to a
situation where every node has released the messages, yet some nodes think
messages are missing. The output then looks like "Retransmit: #" in
repetition. This patch frees those messages immediately during the transition
to the OPERATIONAL state and sets the internal variables totemsrp depends
upon to the proper values.

Signed-off-by: Steven Dake <sdake@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>

commit | commitdiff | tree

Steven Dake [Thu, 24 Mar 2011 15:46:24 +0000 (08:46 -0700)]

totemsrp: Only restore old ring id information one time

The current code stores the current ring information every time a commit
token is generated. This causes the old ring id used for comparison purposes
to increase if a token is lost in commit or recovery, resulting in failure of
totem. This patch changes the behavior to only store the old ring id one
time when the commit token is received, and then further commit token ring
id saves are not done until OPERATIONAL is reached.

Signed-off-by: Steven Dake <sdake@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>

commit | commitdiff | tree

Steven Dake [Thu, 24 Mar 2011 15:30:53 +0000 (08:30 -0700)]

totemsrp: Remove recv_flush code

The recv_flush code is no longer necessary because of the miss_count_const
addition. It can in some cases lead to register corruption because of
interactions with -fstack-protector, the recursive nature of how this code
works, and interactions with the optimizer in some versions of gcc.

Signed-off-by: Steven Dake <sdake@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>

commit | commitdiff | tree

Angus Salkeld [Wed, 23 Mar 2011 20:54:42 +0000 (07:54 +1100)]

confdb: send notifications from the main thread not IPC thread

corosync-notifyd has exposed an issue with confdb notifications.

The normal state of affairs is:
IPC thread > lock > objdb > lock

objdb notifications, whilst really useful, turn things around:
<middle of big call chain>
objdb > lock > confdb > ipc > lock

This reverse ordering of locks causes a horrible deadlock.

I see this patch as a workaround until corosync-2.0
when most of the threads and locking disappear.

This patch adds a pipe to the confdb service. When we get an
objdb notification a struct gets written to the pipe.
The poll loop then runs the dispatch in the main thread.
In the dispatch we call the real ipc_dispatch_send().

Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
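
The pipe hand-off described above can be sketched like this. It is a minimal illustration under assumptions: notify_msg, notify_enqueue and notify_dispatch_one are hypothetical names, and the real service would call its IPC send function where the comment indicates.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical fixed-size notification record. */
struct notify_msg {
    int event;
    unsigned int object_id;
};

/*
 * Called where the objdb notification arrives: queue the record on
 * the pipe instead of sending over IPC while locks are held.
 */
int notify_enqueue(int write_fd, const struct notify_msg *msg)
{
    ssize_t n;

    assert(msg != NULL);
    n = write(write_fd, msg, sizeof(*msg));
    return n == (ssize_t)sizeof(*msg) ? 0 : -1;
}

/*
 * Called from the main poll loop when the read end is readable.
 * Reads exactly one record; the caller would then perform the real
 * dispatch from the main thread.
 */
int notify_dispatch_one(int read_fd, struct notify_msg *out)
{
    ssize_t n;

    assert(out != NULL);
    n = read(read_fd, out, sizeof(*out));
    return n == (ssize_t)sizeof(*out) ? 0 : -1;
}
```

Because the record is small and fixed in size, each write to the pipe is a single short operation, and the lock-sensitive work happens later in the main thread.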

commit | commitdiff | tree

Steven Dake [Sat, 19 Mar 2011 01:47:10 +0000 (18:47 -0700)]

Resolve abort during simultaneous stopping of at least 4 nodes

Consider 5 nodes.

Nodes 3 and 4 are stopped (by random stopping). Nodes 1, 2 and 5 form a new
configuration, and during recovery nodes 1 and 2 are stopped (via service
corosync stop).  This causes node 5 never to finish recovery within the timeout
period, triggering a token loss in recovery.  Bug #623176 resolved an assert
which happens because the full ring id was being restored.  The resolution
to Bug #623176 was to not restore the full ring id, and instead operate
(according to specifications) on the new ring id.  Unfortunately this exposes
a problem whereby the restarting of nodes 1-4 generates the same ring id.
This ring id gets to node 5, which failed recovery and is now in gather,
and triggers a condition not accounted for in the original totem specification.

It appears later work from Dr. Agarwal's PhD dissertation considers this
scenario.  That solution entails rejecting the regular token in the above
condition.  Since the ring id is also used to make decisions for commit token
acceptance, we must also take care to reject the regular token in all cases
after transitioning from OPERATIONAL.

Signed-off-by: Steven Dake <sdake@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>

commit | commitdiff | tree

Angus Salkeld [Mon, 21 Mar 2011 02:37:18 +0000 (13:37 +1100)]

notifyd: dispatch only one message at a time.

This is to avoid getting stuck in the dispatch processing
messages when the user is trying to shutdown the service.

Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>

commit | commitdiff | tree

Angus Salkeld [Mon, 14 Mar 2011 23:38:44 +0000 (10:38 +1100)]

Fix some "set but not used" warnings [-Wunused-but-set-variable]

Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>

commit | commitdiff | tree

Angus Salkeld [Mon, 14 Mar 2011 22:50:56 +0000 (09:50 +1100)]

Remove the ttl option from udpu and rely on the kernel ttl setting.

Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>

commit | commitdiff | tree

Angus Salkeld [Mon, 14 Mar 2011 22:44:05 +0000 (09:44 +1100)]

Fix the ttl defaults and range

1) both IPv4 and IPv6 mcast should default to ttl=1
2) the range should be 0..255
0 is valid meaning localhost only (cluster of one)

Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
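
A sketch of the range and default rules listed above (default 1, valid range 0..255). The function name ttl_parse and the TTL_* macros are hypothetical, and treating a missing value as "use the default" is an assumption about how a config parser would call it.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

#define TTL_DEFAULT 1
#define TTL_MIN 0   /* 0 is valid: localhost only, a cluster of one */
#define TTL_MAX 255

/*
 * Parse a ttl value from text. A NULL text means the option was not
 * given, so the default applies. Returns 0 on success, -1 if the
 * text is not a number in 0..255.
 */
int ttl_parse(const char *text, int *ttl_out)
{
    char *end = NULL;
    long value;

    assert(ttl_out != NULL);
    if (text == NULL) {
        *ttl_out = TTL_DEFAULT;
        return 0;
    }
    errno = 0;
    value = strtol(text, &end, 10);
    if (errno != 0 || end == text || *end != '\0' ||
        value < TTL_MIN || value > TTL_MAX) {
        return -1;
    }
    *ttl_out = (int)value;
    return 0;
}
```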

commit | commitdiff | tree

Russell Bryant [Sat, 12 Mar 2011 12:37:53 +0000 (06:37 -0600)]

Add Doxyfile to .gitignore

Signed-off-by: Russell Bryant <russell@russellbryant.net>
Reviewed-by: Angus Salkeld <asalkeld@redhat.com>

commit | commitdiff | tree

Angus Salkeld [Sat, 12 Mar 2011 08:39:04 +0000 (19:39 +1100)]

docs: auto-generate the version

Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>

commit | commitdiff | tree

Russell Bryant [Sat, 12 Mar 2011 02:50:41 +0000 (20:50 -0600)]

Convert existing documentation to doxygen format.

This patch modifies most of the existing comments in header files to be
in a format that doxygen can interpret. This provides another
significant improvement to the web/pdf/etc generated documentation
without having to add new content.

Signed-off-by: Russell Bryant <russell@russellbryant.net>
Reviewed-by: Angus Salkeld <asalkeld@redhat.com>

commit | commitdiff | tree

Zane Bitter [Fri, 11 Mar 2011 03:30:35 +0000 (22:30 -0500)]

Allocate packet buffers in the transport drivers

This change paves the way for eliminating a copy within the Infiniband
driver in the future by transferring responsibility for allocating and
freeing message buffers to the transport driver layer.

Tested under valgrind on a single-node cluster.

Signed-off-by: Zane Bitter <zane.bitter@gmail.com>
Reviewed-by: Steven Dake <sdake@redhat.com>

commit | commitdiff | tree

Zane Bitter [Thu, 10 Mar 2011 04:55:06 +0000 (23:55 -0500)]

Fix minor errors in man page documentation for corosync.conf

* Correct 'See Also' reference to corosync.conf(5) in corosync(8) man page
* Update path to default config (now /etc/corosync/corosync.conf)

Signed-off-by: Zane Bitter <zane.bitter@gmail.com>
Reviewed-by: Steven Dake <sdake@redhat.com>

commit | commitdiff | tree

Steven Dake [Fri, 4 Mar 2011 19:55:54 +0000 (12:55 -0700)]

Fix abort when token is lost in RECOVERY state

A commit token should be rejected when a token is lost in the recovery
state.  This occurs naturally because the ring id increases by 4 for
every new ring.  Prior to this patch, if the token was lost, the old
ring id information was restored, causing a commit token to be accepted
when it should be rejected.  This erroneously accepted commit token would
lead to an assertion which is fixed by this patch.

Signed-off-by: Steven Dake <sdake@redhat.com>
Reviewed-by: Angus Salkeld <asalkeld@redhat.com>

commit | commitdiff | tree

Russell Bryant [Mon, 7 Mar 2011 14:42:01 +0000 (08:42 -0600)]

Add content for the doxygen main page.

This creates some content on the main page of the documentation
generated by doxygen. The main page includes the license and a link
to the project web site.

Signed-off-by: Russell Bryant <russell@russellbryant.net>
Reviewed-by: Steven Dake <sdake@redhat.com>

commit | commitdiff | tree

Russell Bryant [Mon, 7 Mar 2011 14:39:58 +0000 (08:39 -0600)]

Resolve a couple of doxygen warnings.

This resolves a couple of doxygen warnings. First, the group needed a
name. Second, all of the functions in the file were added to the group
but doxygen complained about the lack of an end to the grouping.

Signed-off-by: Russell Bryant <russell@russellbryant.net>
Reviewed-by: Steven Dake <sdake@redhat.com>

commit | commitdiff | tree

Russell Bryant [Mon, 7 Mar 2011 14:38:53 +0000 (08:38 -0600)]

Update doxygen configuration file.

The included doxygen configuration file was a bit stale.  It included
some options that were obsolete and caused doxygen to generate some
warnings when running it.  Most of the changes here were simply done by
running "doxygen -u" to automatically update the file.  It added its
documentation for the options and removed the obsolete options.

This also includes one configuration change, which is to set EXTRACT_ALL
to yes.  This instructs doxygen to generate documentation pages for all
files, public functions, and public data structures even if they are not
currently documented using doxygen syntax.  Doxygen is capable of
generating some useful documentation on its own, such as dependency
graphs.

Signed-off-by: Russell Bryant <russell@russellbryant.net>
Reviewed-by: Steven Dake <sdake@redhat.com>

commit | commitdiff | tree

Russell Bryant [Mon, 7 Mar 2011 14:36:53 +0000 (08:36 -0600)]

Minor build system updates for doxygen.

The configure script has been updated to check for the doxygen and dot
applications (from doxygen and graphviz). The results from these checks
are now used in the Makefile to ensure that the tools are installed when
you run "make doxygen". If they are not, it will generate a helpful
error message.

Signed-off-by: Russell Bryant <russell@russellbryant.net>
Reviewed-by: Steven Dake <sdake@redhat.com>

commit | commitdiff | tree

Russell Bryant [Mon, 7 Mar 2011 14:30:03 +0000 (08:30 -0600)]

Ensure that strings are null terminated after strncpy().

From the strcpy(3) man page, the following warning is given:
  The strncpy() function is similar, except that at most n bytes of src
  are  copied.  Warning: If there is no null byte among the first n bytes
  of src, the string placed in dest will not be null-terminated.

The current corosync code base does not take this warning into account
when using strncpy, potentially resulting in non-null terminated strings.

Signed-off-by: Russell Bryant <russell@russellbryant.net>
Reviewed-by: Steven Dake <sdake@redhat.com>
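
The usual remedy for the strncpy() warning quoted above is to copy at most size - 1 bytes and then write the terminator explicitly. The helper name copy_string is hypothetical; the commit does not say whether a helper or inline fixes were used.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy src into dest, truncating if needed, and always leave dest
 * null terminated. dest_size is the full size of the dest buffer.
 */
void copy_string(char *dest, size_t dest_size, const char *src)
{
    assert(dest != NULL && src != NULL && dest_size > 0);
    strncpy(dest, src, dest_size - 1);
    dest[dest_size - 1] = '\0';
}
```

For a 4 byte buffer, copying "corosync" leaves "cor" followed by the terminator rather than four unterminated bytes.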

commit | commitdiff | tree

Russell Bryant [Sat, 5 Mar 2011 16:02:25 +0000 (10:02 -0600)]

Add -l option to corosync-keygen.

This option (-l or --less-secure) causes corosync-keygen to read from
/dev/urandom instead of /dev/random to ensure that no input is required
from the user. It may be useful when this command is used from a
script.

Signed-off-by: Russell Bryant <russell@russellbryant.net>
Reviewed-by: Steven Dake <sdake@redhat.com>

commit | commitdiff | tree

Steven Dake [Tue, 22 Feb 2011 19:48:32 +0000 (12:48 -0700)]

Don't assert when ring id file is less than 8 bytes

If the ring id file for the processor is less than 8 bytes, totemsrp would
assert. Our speculation is that this condition happens during a fencing
operation or local filesystem corruption.

With this patch, Corosync will create fresh ring id file data when the
incorrect number of bytes is read from the ring id file.

Amended to use sizeof for the strerror string length and PATH_MAX for the path length.

Signed-off-by: Steven Dake <sdake@redhat.com>
Reviewed-by: Angus Salkeld <asalkeld@redhat.com>
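
A sketch of handling a short read instead of asserting. Everything here is illustrative: ring_id_load is a hypothetical name, and resetting the sequence to zero is an assumption standing in for whatever "fresh ring id file data" means in the real code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define RING_ID_SIZE 8 /* bytes expected in the ring id file */

/*
 * Read the stored ring sequence from fd. Returns 0 when all 8 bytes
 * were read, or 1 when the data was short or unreadable, in which
 * case *seq_out is set to a fresh value (assumed to be 0 here) so
 * the caller can rewrite the file.
 */
int ring_id_load(int fd, uint64_t *seq_out)
{
    unsigned char buf[RING_ID_SIZE];
    ssize_t n;

    assert(seq_out != NULL);
    n = read(fd, buf, sizeof(buf));
    if (n != (ssize_t)sizeof(buf)) {
        *seq_out = 0;
        return 1;
    }
    memcpy(seq_out, buf, sizeof(buf));
    return 0;
}
```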

commit | commitdiff | tree

Steven Dake [Thu, 24 Feb 2011 21:12:10 +0000 (14:12 -0700)]

snmp: Allow building of corosync on already existing older install of corosync

When building corosync against older libraries already installed on the system,
the corosync-notifyd application uses the wrong Makefile.am commands. This
results in the SNMPLIBS (which includes -L/usr/lib64) coming before the proper
LDADD flags. The result is an inability to compile on an already existing
installation.

Signed-off-by: Steven Dake <sdake@redhat.com>
Reviewed-by: Angus Salkeld <asalkeld@redhat.com>

commit | commitdiff | tree

Jan Friesse [Wed, 23 Feb 2011 14:15:49 +0000 (15:15 +0100)]

objdb: destroy all handles in _clear_object

Patch replaces free for object_instance with handle_destroy to remove
leaks in handles (and also memory leak).

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>

commit | commitdiff | tree

Jan Friesse [Tue, 22 Feb 2011 11:19:48 +0000 (12:19 +0100)]

Iterate all items in object_reload_notification

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>

commit | commitdiff | tree

Jan Friesse [Tue, 22 Feb 2011 09:31:59 +0000 (10:31 +0100)]

corosync-fplay: use uint32_t and remove bit-shift

The flight recorder records all data in 32 bit words. Use the uint32_t type
rather than unsigned int. Also replace the bit-shift with a multiplication
by sizeof uint32_t.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>

commit | commitdiff | tree

Jan Friesse [Tue, 22 Feb 2011 09:30:07 +0000 (10:30 +0100)]

corosync-fplay: Use size_t length mod in printf

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>

commit | commitdiff | tree

Jan Friesse [Mon, 21 Feb 2011 17:24:53 +0000 (18:24 +0100)]

corosync-fplay: handle too large rec_size

Corrupted files may contain items with rec_size larger than the g_record
buffer and/or flt_data_size.

Also g_record array size is now defined as constant.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
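
The kind of check this commit and the earlier uint32_t change suggest can be sketched as below. G_RECORD_WORDS is a hypothetical capacity, both function names are made up for illustration, and sizes are assumed to be counted in 32 bit words.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define G_RECORD_WORDS 10000u /* hypothetical g_record capacity in words */

/* A record is usable only if it fits the buffer and the data read. */
int rec_size_is_valid(uint32_t rec_size, uint32_t flt_data_size)
{
    return rec_size <= G_RECORD_WORDS && rec_size <= flt_data_size;
}

/* Convert a word count to bytes by multiplying, not by shifting. */
size_t rec_size_bytes(uint32_t rec_size)
{
    assert(rec_size <= G_RECORD_WORDS);
    return (size_t)rec_size * sizeof(uint32_t);
}
```

Validating rec_size before using it as a copy length keeps a corrupted file from driving reads past either buffer.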

commit | commitdiff | tree

Jan Friesse [Mon, 21 Feb 2011 12:23:46 +0000 (13:23 +0100)]

logsys: Properly lock flt data before dump

Data needs to be locked, otherwise resulting fdata file may be
incorrect.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>

commit | commitdiff | tree

Jan Friesse [Mon, 21 Feb 2011 12:14:21 +0000 (13:14 +0100)]

logsys: Don't leak fd on successful fdata dump

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>

commit | commitdiff | tree

Russell Bryant [Tue, 15 Feb 2011 02:51:29 +0000 (20:51 -0600)]

Add calls to pthread_attr_destroy().

This patch adds a couple of missing calls to pthread_attr_destroy().

There were a couple of instances where pthread_attr_init() was being
used without a corresponding call to pthread_attr_destroy(). This also
localizes the pthread_attr_t to the function where it is needed instead
of having it persist (the man page specifically states that destroying
the attributes structure has no effect on threads created using the
attributes).

Signed-off-by: Russell Bryant <russell@russellbryant.net>
Reviewed-by: Steven Dake <sdake@redhat.com>

commit | commitdiff | tree

Angus Salkeld [Mon, 14 Feb 2011 02:40:17 +0000 (13:40 +1100)]

CTS: wait (consistently) for 15 minutes for events

Signed-off-by: Angus Salkeld <asalkeld@redhat.com>

commit | commitdiff | tree

Angus Salkeld [Sun, 13 Feb 2011 21:13:36 +0000 (08:13 +1100)]

autobuild: clean the build dir first.

This deletes files like .version that cause problems.

Signed-off-by: Angus Salkeld <asalkeld@redhat.com>

commit | commitdiff | tree

Angus Salkeld [Fri, 11 Feb 2011 05:57:49 +0000 (16:57 +1100)]

CTS: temp remove troublesome tests.

Right, I know - not so good to comment out tests.
BUT they are passing, and there is some weirdness
in ssh reconnecting to these nodes that causes CTS false
negatives.
So the nodes are watchdogged (as expected) but when they come
back up cts gets stuck in a loop re-trying to ssh into
them. It is odd, as a manual ssh works fine.

Basically I think it's more important that we get reliable
testing than have these tests in there.

Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>

commit | commitdiff | tree

Angus Salkeld [Sat, 5 Feb 2011 08:38:04 +0000 (19:38 +1100)]

Make node state a string (not an integer)

Ryan noticed this inconsistency: all other statuses
are strings so this should be too.

Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
Reviewed-by: Ryan O'Hara <rohara@redhat.com>

commit | commitdiff | tree

Angus Salkeld [Wed, 12 Jan 2011 02:28:12 +0000 (13:28 +1100)]

CONFDB: fix parent_get response id

Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>

commit | commitdiff | tree

Angus Salkeld [Thu, 3 Feb 2011 23:46:00 +0000 (10:46 +1100)]

MIB: expand the descriptions of the notifications

Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>

commit | commitdiff | tree

Lon Hohberger [Fri, 28 Jan 2011 23:56:13 +0000 (18:56 -0500)]

Match up MIB to notifyd & add SNMP quorum events

Signed-off-by: Lon Hohberger <lhh@redhat.com>
Reviewed-by: Angus Salkeld <asalkeld@redhat.com>

commit | commitdiff | tree

Lon Hohberger [Fri, 28 Jan 2011 23:55:34 +0000 (18:55 -0500)]

Make SNMP MIB match what is being sent over DBUS

Signed-off-by: Lon Hohberger <lhh@redhat.com>
Reviewed-by: Angus Salkeld <asalkeld@redhat.com>

commit | commitdiff | tree

Angus Salkeld [Wed, 12 Jan 2011 09:40:00 +0000 (20:40 +1100)]

Add dbus and snmp notifier

This is to send dbus events on major cluster events:
- membership changes
- application connect/disconnect from corosync
- quorum changes

dbus events can then be converted into snmp traps by foghorn or
corosync-notifyd can be run to directly send snmp traps.

Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
Signed-off-by: Lon Hohberger <lhh@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
Reviewed-by: Russell Bryant <russell@russellbryant.net>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>

commit | commitdiff | tree

Angus Salkeld [Wed, 12 Jan 2011 02:27:35 +0000 (13:27 +1100)]

CONFDB: add confdb_object_name_get()

This is useful when tracking object changes.

Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>

commit | commitdiff | tree

Angus Salkeld [Wed, 12 Jan 2011 02:26:05 +0000 (13:26 +1100)]

STATS: fix key name length on "join_count"

Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>

commit | commitdiff | tree

Angus Salkeld [Wed, 12 Jan 2011 02:25:31 +0000 (13:25 +1100)]

STATS: increase the space for application names

Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>

commit | commitdiff | tree

Jan Friesse [Fri, 28 Jan 2011 10:00:20 +0000 (11:00 +0100)]

Handle "nocluster" kernel parameter in init script

The init script checks kernel parameters and refuses to start corosync if
the nocluster parameter exists at boot time. The init script will
continue to work as expected from console/tty after boot.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>

commit | commitdiff | tree

Jan Friesse [Mon, 10 Jan 2011 13:40:27 +0000 (14:40 +0100)]

Add objdb firewall_enabled_or_nic_failure

New objdb var runtime.totem.pg.mrp.srp.firewall_enabled_or_nic_failure
is set to 1 if continuous_gather is larger than MAX_NO_CONT_GATHER.
Under normal conditions, the value of the variable is 0.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>

commit | commitdiff | tree

Angus Salkeld [Mon, 10 Jan 2011 23:59:02 +0000 (10:59 +1100)]

Add missing entries into .gitignore

Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>

commit | commitdiff | tree

Angus Salkeld [Mon, 10 Jan 2011 23:56:24 +0000 (10:56 +1100)]

remove unused function declaration

Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>

commit | commitdiff | tree

Angus Salkeld [Mon, 10 Jan 2011 23:55:56 +0000 (10:55 +1100)]

fix timersub warning on freebsd

Make them all protected by #ifndef timersub

Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>

commit | commitdiff | tree

Steven Dake [Mon, 10 Jan 2011 17:33:34 +0000 (10:33 -0700)]

Handle delayed multicast packets that occur with switches

Some switches delay multicast packets vs the unicast token. This patch works
around that problem by providing a new tuneable called miss_count_const. This
tuneable works by counting the number of times a message is found missing
and once reaching the const value, marks it as missing in the retransmit list.

This improves performance and doesn't display warning messages about missed
multicast messages when operating in these switching environments.

Signed-off-by: Steven Dake <sdake@redhat.com>
Reviewed-by: Angus Salkeld <asalkeld@redhat.com>
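
The counting behaviour described above can be sketched as follows. The struct and function names are hypothetical, and the threshold is passed in as a parameter rather than assuming any particular default for miss_count_const.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bookkeeping for one message sequence number. */
struct miss_entry {
    unsigned int seq;
    unsigned int miss_count;
};

/*
 * Record that the message was found missing once more. Returns 1
 * once the count reaches miss_count_const, meaning the message
 * should now be added to the retransmit list, and 0 before that.
 */
int miss_observed(struct miss_entry *entry, unsigned int miss_count_const)
{
    assert(entry != NULL);
    entry->miss_count++;
    return entry->miss_count >= miss_count_const;
}
```

A multicast packet that is merely delayed by the switch arrives before the count reaches the threshold, so no retransmit request is generated for it.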

commit | commitdiff | tree

Angus Salkeld [Wed, 22 Dec 2010 23:30:11 +0000 (10:30 +1100)]

CPG: make sure coroipcc_service_disconnect() is always called.

This prevents a shared mem leak if corosync dies while clients
are connected.

Calling cpg_finalize() did not release the shared mem as
coroipcc_msg_send_reply_receive() returned an error and
thus coroipcc_service_disconnect() did not get called.

Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>

commit | commitdiff | tree

Angus Salkeld [Wed, 22 Dec 2010 03:02:40 +0000 (14:02 +1100)]

IPC: send failure message to client if memory maps fail

Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>

commit | commitdiff | tree

Jan Friesse [Thu, 2 Dec 2010 13:35:00 +0000 (14:35 +0100)]

Display warning when not possible to form cluster

This may typically happen if a local firewall is enabled. The patch adds a new
item to statistics called continuous_gather, which holds the number of
times the gather state was entered continuously. If this number is bigger than
MAX_NO_CONT_GATHER, a warning message is displayed. This is also used on
exiting, so stopping corosync is now possible even with a firewall enabled.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>

commit | commitdiff | tree

Fabio M. Di Nitto [Wed, 1 Dec 2010 18:31:44 +0000 (19:31 +0100)]

build: fix make srpm from release tarball

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>

commit | commitdiff | tree

Fabio M. Di Nitto [Wed, 1 Dec 2010 18:31:45 +0000 (19:31 +0100)]

build: fix rpm build to include corosync-blackbox

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>

commit | commitdiff | tree

Steven Dake [Wed, 1 Dec 2010 18:18:19 +0000 (11:18 -0700)]

Revert "Always autogen the tree when building an RPM"

This reverts commit d145838a21fb636461a2bceeada34db439f4a9ec.

commit | commitdiff | tree

Steven Dake [Wed, 1 Dec 2010 17:27:06 +0000 (10:27 -0700)]

Always autogen the tree when building an RPM

Since the source tarball never includes the autogen'ed tree in the new source
repo methodology, always autogen the tree.

Signed-off-by: Steven Dake <sdake@redhat.com>

commit | commitdiff | tree

Steven Dake [Wed, 1 Dec 2010 16:27:14 +0000 (09:27 -0700)]

Set the max buffer size for sockets

Set the recv buffer to a large size and the send buffer to a large size to
allow the kernel to store more messages before dropping messages.

Amended to change optlen type to socklen_t

Signed-off-by: Steven Dake <sdake@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
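
A minimal sketch of raising both socket buffers with setsockopt(), using socklen_t for the option length as the amendment notes. The function names and the idea of passing one size for both directions are assumptions for illustration; the kernel may clamp or adjust the requested value.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Request the same size for the receive and send buffers of fd. */
int socket_set_buffers(int fd, int size)
{
    socklen_t optlen = sizeof(size);

    assert(size > 0);
    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &size, optlen) != 0) {
        return -1;
    }
    if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &size, optlen) != 0) {
        return -1;
    }
    return 0;
}

/* Read back the receive buffer size the kernel actually applied. */
int socket_get_recv_buffer(int fd)
{
    int size = 0;
    socklen_t optlen = sizeof(size);

    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &size, &optlen) != 0) {
        return -1;
    }
    return size;
}
```

Reading the value back is worthwhile because the effective size is decided by the kernel and its limits, not by the request alone.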

commit | commitdiff | tree

Steven Dake [Sun, 28 Nov 2010 08:45:08 +0000 (01:45 -0700)]

The flushing code was introducing data corruption because of recursion errors
that occur as a result of the design of udpu. Totem no longer requires
the flushing technique because we don't mark a packet as missing until it has
not been seen by a certain number of token rotations per a previous patch. This
mechanism was introduced to work around a problem in switches where multicast
messages may be delayed by long periods compared to the unicast token.

This patch removes the flushing logic from udpu since it is no longer necessary.

Signed-off-by: Steven Dake <sdake@redhat.com>
Reviewed-by: Angus Salkeld <asalkeld@redhat.com>

commit | commitdiff | tree

Angus Salkeld [Wed, 24 Nov 2010 03:35:56 +0000 (14:35 +1100)]

Add totem/interface/ttl config option.

This adds a per-interface config option to
adjust the TTL.

Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>

commit | commitdiff | tree

Fabio M. Di Nitto [Fri, 19 Nov 2010 08:21:47 +0000 (09:21 +0100)]

build: fix makefile to ship corosync.conf.example.udpu

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Angus Salkeld <asalkeld@redhat.com>

commit | commitdiff | tree

Steven Dake [Thu, 18 Nov 2010 22:03:19 +0000 (15:03 -0700)]

Merge branch 'topic-udpu'

Conflicts:
Makefile.am

Signed-off-by: Steven Dake <sdake@redhat.com>

commit | commitdiff | tree

Steven Dake [Thu, 18 Nov 2010 21:51:17 +0000 (14:51 -0700)]

Remove dead soreuseaddr code

Signed-off-by: Steven Dake <sdake@redhat.com>
Reviewed-by: Angus Salkeld <asalkeld@redhat.com>

commit | commitdiff | tree

Steven Dake [Thu, 18 Nov 2010 16:31:49 +0000 (09:31 -0700)]

Add the UDPU transport

The UDPU transport is useful for those deployments which can't use multicast.
UDPU works by using UDP unicast, which is fully supported by every switch
manufacturer by default and doesn't rely on a functional IGMP implementation.

An example of the UDPU transport is contained in the corosync.conf.example.udpu
file which shows a 16 node cluster. This file should be copied to each node
in the cluster and IP addresses changed as appropriate.

Amended to remove dead udpu REUSEADDR socket option.

Signed-off-by: Steven Dake <sdake@redhat.com>
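
The core idea of replacing one multicast send with a unicast send per member can be sketched as follows. This is an illustrative sketch only, not the transport's actual code: udpu_send_all and the flat member array are hypothetical names, and only the standard sendto() call is assumed.

```c
#include <netinet/in.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Send the same datagram to every member address over plain UDP unicast.
 * `members` is a hypothetical list of IPv4 endpoints, one per cluster node.
 * Returns how many members the full message was handed to the kernel for.
 */
size_t udpu_send_all(int fd, const struct sockaddr_in *members, size_t count,
                     const void *msg, size_t len)
{
    size_t sent = 0;

    for (size_t i = 0; i < count; i++) {
        ssize_t n = sendto(fd, msg, len, 0,
                           (const struct sockaddr *)&members[i],
                           sizeof(members[i]));
        if (n == (ssize_t)len)
            sent++;             /* this member's copy was queued */
    }
    return sent;
}
```

The trade-off is visible in the loop: the sender does work proportional to the number of members, in exchange for not depending on multicast or IGMP support in the network.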

commit | commitdiff | tree

Fabio M. Di Nitto [Wed, 10 Nov 2010 16:31:36 +0000 (17:31 +0100)]

build: fix spec file and srpm/rpm generation

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>

commit | commitdiff | tree

Fabio M. Di Nitto [Wed, 10 Nov 2010 14:36:31 +0000 (15:36 +0100)]

add release script and git based versioning

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>

commit | commitdiff | tree

Steven Dake [Wed, 10 Nov 2010 14:08:54 +0000 (07:08 -0700)]

Merge branch 'master', remote branch 'origin/master'

commit | commitdiff | tree

Steven Dake [Wed, 10 Nov 2010 04:49:58 +0000 (21:49 -0700)]

Add license information to LICENSE file about build process files

A few files licensed under GPLv3+ produce text output but are not used as
part of the runtime or libraries provided by Corosync. Note this in the
LICENSE file.

Signed-off-by: Steven Dake <sdake@redhat.com>
Reviewed-by: Fabio Di Nitto <fdinitto@redhat.com>

commit | commitdiff | tree

Angus Salkeld [Thu, 21 Oct 2010 23:29:31 +0000 (10:29 +1100)]

Add -i <num-iterations> to cpgverify

Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>

commit | commitdiff | tree

Steven Dake [Thu, 21 Oct 2010 22:44:00 +0000 (15:44 -0700)]

New topic descriptions based upon work community wants to do

This file describes the topics of interest for development, their start and
finish date, their main developer, and a description of the topic.

Signed-off-by: Steven Dake <sdake@redhat.com>
Reviewed-by: Angus Salkeld <asalkeld@redhat.com>

commit | commitdiff | tree

Angus Salkeld [Thu, 21 Oct 2010 09:31:17 +0000 (20:31 +1100)]

Add .gitignore files.

Otherwise "git status" is a pain.

Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>

commit | commitdiff | tree

Steven Dake [Wed, 20 Oct 2010 21:16:56 +0000 (14:16 -0700)]

Add -n option to corosync-objctl to create a new object/key combo

Find an existing parent object and add the last object/key name of the command
to the object database. This allows the runtime addition of IP addresses to
the list of IPs corosync knows about for the purpose of the UDPU transport mode.

Signed-off-by: Steven Dake <sdake@redhat.com>
Reviewed-by: Angus Salkeld <asalkeld@redhat.com>

commit | commitdiff | tree

Jan Friesse [Tue, 12 Oct 2010 13:03:37 +0000 (13:03 +0000)]

Remove delay in library on corosync shutdown

This patch removes the 2-second delay in the library on normal corosync
shutdown. The delay is still present on abnormal shutdown.

git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@3059 fd59a12c-fef9-0310-b244-a6a79926bd2f

commit | commitdiff | tree

Angus Salkeld [Tue, 28 Sep 2010 23:42:57 +0000 (23:42 +0000)]

autobuild: fix the continuous build

git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@3058 fd59a12c-fef9-0310-b244-a6a79926bd2f

commit | commitdiff | tree

Angus Salkeld [Mon, 27 Sep 2010 22:41:26 +0000 (22:41 +0000)]

Check for a properly configured multicast address.

git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@3057 fd59a12c-fef9-0310-b244-a6a79926bd2f
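
One way such a check can be written, shown here as a sketch rather than the code this commit added: is_multicast_address is a hypothetical helper built on the standard inet_pton(), IN_MULTICAST and IN6_IS_ADDR_MULTICAST facilities.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/*
 * Classify a textual address.
 * Returns 1 if it is an IPv4 or IPv6 multicast address,
 *         0 if it parses but is not multicast,
 *        -1 if it is not a valid IPv4/IPv6 address at all.
 */
int is_multicast_address(const char *text)
{
    struct in_addr v4;
    struct in6_addr v6;

    if (inet_pton(AF_INET, text, &v4) == 1)
        return IN_MULTICAST(ntohl(v4.s_addr)) ? 1 : 0;  /* 224.0.0.0/4 */
    if (inet_pton(AF_INET6, text, &v6) == 1)
        return IN6_IS_ADDR_MULTICAST(&v6) ? 1 : 0;      /* ff00::/8 */
    return -1;
}
```

Running this on the configured address at startup lets a misconfiguration such as a unicast address in the multicast field be reported early instead of failing silently at runtime.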

commit | commitdiff | tree

Angus Salkeld [Mon, 27 Sep 2010 21:14:59 +0000 (21:14 +0000)]

CTS: add sam/wd integration tests.

- fix send_dynamic() exception
- fix basic sam integration test
- fixup calls to sam tests
- fix startup when using testquorum (currently only handles votequorum)
- improve SAM test case with better checking.

git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@3056 fd59a12c-fef9-0310-b244-a6a79926bd2f

commit | commitdiff | tree

Angus Salkeld [Mon, 27 Sep 2010 21:14:06 +0000 (21:14 +0000)]

AUG: add support for resources section & quorum/quorate

git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@3055 fd59a12c-fef9-0310-b244-a6a79926bd2f

commit | commitdiff | tree

Angus Salkeld [Mon, 27 Sep 2010 21:13:15 +0000 (21:13 +0000)]

WD/SAM integration.

- timestamps -> uint64_t and in nanosecs
- use clock_gettime
- common object naming
- common state names
- timeouts in milliseconds

git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@3054 fd59a12c-fef9-0310-b244-a6a79926bd2f
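
The timestamp and timeout conventions listed above can be sketched like this. The function names are hypothetical, and the choice of CLOCK_MONOTONIC is an assumption; the commit only says that clock_gettime is used.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/*
 * Current time as a uint64_t count of nanoseconds.
 * CLOCK_MONOTONIC is assumed here so the value never jumps backwards.
 * Returns 0 if clock_gettime() fails.
 */
uint64_t now_nanosec(void)
{
    struct timespec ts;

    if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0)
        return 0;
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/*
 * Compare nanosecond timestamps against a timeout given in milliseconds.
 * Returns 1 if more than `timeout_ms` elapsed between the two stamps.
 */
int timed_out(uint64_t last_ns, uint64_t now_ns, uint32_t timeout_ms)
{
    return (now_ns - last_ns) > (uint64_t)timeout_ms * 1000000ULL;
}
```

Keeping timestamps in nanoseconds and timeouts in milliseconds means the unit conversion happens in exactly one place, the comparison.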

commit | commitdiff | tree

Angus Salkeld [Mon, 27 Sep 2010 21:12:03 +0000 (21:12 +0000)]

Add monitoring and watchdog services.

git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@3053 fd59a12c-fef9-0310-b244-a6a79926bd2f

commit | commitdiff | tree

Angus Salkeld [Mon, 27 Sep 2010 21:11:04 +0000 (21:11 +0000)]

Add a Finite State Machine (fsm.h).

git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@3052 fd59a12c-fef9-0310-b244-a6a79926bd2f
