git.proxmox.com Git - mirror

votequorum: make master_wins check stricter

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>

votequorum: add ENTER/LEAVE for consistency

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>

votequorum: fix library checks on qdevice name and readd qdevice_update

for some odd reasons qdevice_update was just gone.. totally...

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>

votequorum: delegate qdevice_master_wins setting to qdevice

votequorum has no business to device if master_wins setting is correct or not.
only the qdevice can decide and should set the value for votequorum.

Logic is:

- user requests master_wins from config
- corosync starts
- qdevice starts
- qdevice reads cmap values / register with votequorum
- qdevice decides if the node can support master_wins or not and tells votequorum
- at this point votequorum can check if an unquorate node is part of the master_wins
partition

it is the qdevice responsibility to keep that value up to date in votequorum and the
value can be changed at runtime.

this commit also exchange per node master_wins information to lay down the infrastructure
to verify discrepancies in node config for master_wins (coming next on this channel).

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>

votequorum: drop votequorum_qdevice_getinfo and collapse data into getinfo

it's really pointless to have basically a duplicated API call
to transfer one value and one name.

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>

votequorum: external defines should all be prefixed with VOTEQUORUM_

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>

votequorum: drop _FLAG_ from defines

those are all info flags.. it's redudant and inconsistent

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>

votequorum: fix define name to match reality

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>

qdevice: implement master_wins partition

in previous incarnation of qdisk + cman, master_wins was restricted
to 2 node only.

In this new version it is possible to use master_wins for any cluster
size.

Let's assume a 4 node cluster. Each node votes 1, qdevice votes 3.

node 1 becomes qdevice master
node 2/3/4 no

In case of a split (let's assume 2/2):

partition 1: {4, 1}
partition 2: {1, 1}

node 2 in partition 1 would normally be unquorate, leaving effectively
only node 1 active.

master_wins allows node 2 to recognize to be part of a quorate partition
(since node1 is broadcasting that qdevice is voting) and retain
quorum.

node1 has never lost quorate status since qdevice is voting there.

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>

votequorum: fix flag check for qdevice votes propagation

and cleanup similar code to make it more readable

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>

votequorum: remove last instance of state and rename it to cast_vote

also align naming of vote to cast_vote for info calls

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>

votequorum: several major bug fixes and code cleanup

- add a protection check to avoid spurious messages on membership
  change
- greately simplify processing of nodeinfo, since the only
  data that we send for qdevice over nodeinfo is the number of votes
- fix a flag check to trigger quorum calculation that would
  leave a cluster unquorate under certain conditions

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>

votequorum: move to the new flag structure

simplify different code path as checks are simpler, separate
ALIVE and CAST_VOTE

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>

votequorum: simplify getinfo data and protect against call against quorum node

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>

votequorum: use REGISTERED flag consistently

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>

votequorum: simply internal qdevice_getinfo function

as data are moving around we can drop lots of special cases

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>

votequorum: add qdevice CAST_VOTE status/flag

this is a preparation commit for the next changes. right now it is
no more than an alias to ALIVE.

CAST_VOTE is required to support master/slave feature from qdevice.

Effectively a quorum device can be:

Not registered / registered (connected to API but nothing else is happening)

if registered:

Not alive / alive (quorum device is petting the API via poll and timer is running)

if alive:

Not voting (slave) / voting (master)

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>

votequorum: rename NODE_FLAGS_QDEVICE_STATE to NODE_FLAGS_QDEVICE_ALIVE

STATE is confusing and overloaded term in votequorum as it's used for nodes
and other bits.

make the name unique and ALIVE means that the qdevice is heartbeating
to votequorum.

improve display of the status in tools and tests.

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>

votequorum: rename NODE_FLAGS_QDEVICE to NODE_FLAGS_QDEVICE_REGISTERED

make the flag name explicit

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>

votequorum: re-enable qdevice api

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>

quorum devices: add support to build system

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>

Add groff as a BuildRequires to spec file

According to Fedora packaging guidelines, groff is not on the list
of package exceptions for BuildRequires. A recent change in the Fedora
build system has triggered breakage in building rpm packages and it
is likely this package won't build for Fedora 18.

Reference:
http://fedoraproject.org/wiki/Packaging:Guidelines#Exceptions_2

Signed-off-by: Steven Dake <sdake@redhat.com>
Reviewed-by: Fabio Di Nitto <fdinitto@redhat.com>

Don't call sync_* funcs for unloaded services

When service is unloaded, sync shouldn't call sync_init|process|activate
and abort functions. It happens very rare, but in process of unloading
all services, totem can recreate membership and bad things can happen
(service is unloaded, so there may be access to already freed memory,
...)

Solution is to fetch services sync handlers in every time when we are
building service list instead of using precreated one.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>

Introduce SERVICES_COUNT_MAX macro

Sync/service was using maximal number of services in ehter numberic form
(magic constant) or inconsistently, this means using
SERVICE_HANDLER_MAXIMUM_COUNT which means maximal number of handlers.

New macro solves this.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>

cmap_keys: Document few more runtime statistics

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>

cts: Delete shm blacbox after corosync kill

This makes SHM Audit pass in test CpgCfgChgOnExecCrash.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>

cpg: Be more verbose for procjoin message

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>

cts: Change DC_IDLE pattern

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>

cts: Make shm_leak_audit run

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>

Correctly free state string in wd

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>

cts: Change local_start[ing|ed] pattern in CTS

Previous pattern is no longer send to syslog. Use first pattern which
is.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>

Support for crypto_ and nodelist in lenses

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>

Revert "Free state variable allocated in wd_resource_state_is_ok"

This reverts commit 01c63ca17ca5b5a780cff0dd96d7d432b3e980a3.

cpg: Enhance downlist selection algorithm

Let's say we have 2 nodes:
- node 2 is paused
- node 1 create membership (one node)
- node 2 is unpaused

Result is that node 1 downlist is selected, so it means that from node 2
point of view, node 1 was never down.

Patch solves situation by adding additional check for largest previous
membership.

So current tests are:
1) largest (previous #nodes - #nodes know to have left)
2) (then) largest previous membership
3) (and last as a tie-breaker) node with smallest nodeid

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>

cpg: Print cpg name to debug informations

In downlist and joinlist debug output group was printed in nonsense
format of integer to pointer to array.

Now it's printed by full name.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>

cpg: Process join list after downlists

let's say following situation will happen:
- we have 3 nodes
- on wire messages looks like D1,J1,D2,J2,D3,J3 (D is downlist, J is
joinlist)
- let's say, D1 and D3 contains node 2
- it means that J2 is applied, but right after that, D1 (or D3) is
applied what means, node 2 is again considered down

It's solved by collecting joinlists and apply them after downlist, so
order is:
- apply best matching downlist
- apply all joinlists

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>

cpg: Never choose downlist with localnode

Test scenario is follows:
- node 1, node 2
- node 1 is paused
- node 2 sees node 1 dead
- node 1 unpaused
- node 1 and 2 both choose same dowlist message which includes node 2 ->
node 2 is efectivelly disconnected

Patch includes additional test if left_node is localnode. If so, such
downlist is ignored.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>

When flushing, discard only memb_join messages

Patch solves problem when 1 ring out of 2 went up/down quite often.

The simplest setup to reproduce bug is following:
- 2 VMs, connected by 2 network interfaces
- OS: Linux
- On one of the VMs, a test program sending some CPG messages (see the
  script "test_corosync.sh" joined to this mail for example)

Here are the Corosync logs we get when we do this setup:

Jun 06 16:23:40 corosync [TOTEM ] A processor joined or left the
membership and a new membership was formed.
Jun 06 16:23:40 corosync [CPG   ] chosen downlist: sender r(0)
ip(192.168.56.104) r(1) ip(192.168.57.104) ; members(old:1 left:0)
Jun 06 16:23:40 corosync [MAIN  ] Completed service synchronization,
ready to provide service.
Jun 06 16:24:37 corosync [TOTEM ] Marking ringid 1 interface
192.168.57.105 FAULTY
Jun 06 16:24:38 corosync [TOTEM ] Automatically recovered ring 1
Jun 06 16:25:33 corosync [TOTEM ] Marking ringid 1 interface
192.168.57.105 FAULTY
Jun 06 16:25:34 corosync [TOTEM ] Automatically recovered ring 1
Jun 06 16:26:35 corosync [TOTEM ] Marking ringid 1 interface
192.168.57.105 FAULTY
Jun 06 16:26:36 corosync [TOTEM ] Automatically recovered ring 1
(...)

The second ring goes down about every 2 minutes and automatically back
up right after.

We spent some times looking for the commit that introduced this bug, and
it appears it's due the following one:
Corosync 1.3.3 -> 1.3.4: e27a58d93d0d3795beb550f87b660c9c04f11386
Corosync 1.4.1 -> 1.4.2: be608c050247e5f9c8266b8a0f9803cc0a3dc881
Commit message: Ignore memb_join messages during flush operations

I had a look at this commit, and it seems to me it's dropping too many
packets:
Because of this commit, while totemrrp_recv_flush() is called, Corosync
drops memb_join packets, but also ORF tokens. In the end, it seems that
sometimes, we drop so many of them that Corosync marks the ring as
faulty.

To fix that, only memb_join messages are dropped now.

Signed-off-by: Jerome FLESCH <jerome.flesch@netasq.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>

Store fdata with timestamp and pid in name

This should allow easier handling of various blackbox dumps. Original
fdata name is now symlink to latest created dump.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>

Remove corosync-fplay

Libqb now ships with qb-blackbox command doing same job as
corosync-fplay. It doesn't make sense to maintain two versions of same
utility so corosync-fplay can go. corosync-blackbox command now calls
directly qb-blackbox.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>

notifyd: handle addition of a members key to CMAP

When new key (totem.pg.mrp.srp.members) was added to CMAP,
we would like to receive the trap of this time.

Signed-off-by: Kazunori INOUE <inouekazu@intellilink.co.jp>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>

testcpg: fix build warning

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>

totemudpu: Bind sending sockets to bindto address

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>

snmp MIB: Remove unnecessary comma

Thank Hideo Yamauch for pointing this bug.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>

notifyd snmp: fix a function name

This fixes the bug to which snmp trap of rrp_faulty_event is not sent.

Signed-off-by: Kazunori INOUE <inouekazu@intellilink.co.jp>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>

rename mainconfig to logconfig

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>

mainconfig: allow mainconfig logic to be used both internally and externally

corosync logging configuration logic is rather complex and in order
to make it simpler to reuse (at least within corosync/ tree)
we need to be able to use both icmap and cmap.

the patch might seem controversial, but it reduces heaps of code around
from qdevices (coming next).

It might be useful to consider moving this to a common shared library
but there aren't enough users yet and a shared lib would force
corosync to link with cmap (that we do not want at all costs)

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>

LOG: make sure the log target is enabled.

Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>

LOG: handle closing unused logfiles better

This fixes a bug where having a second log file will close
the previous one.

Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>

LOG: be more explict about the qb file names

else we can get messages been put in the wrong subsys.

Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>

LOG: drop the number of logging subsystems from 64 to 32

Currently 14 are used, 64 seems like a waste of memory.

Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>

Correct the description of bindnetaddr config parameter in manpage

bindnetaddr has been wrongly described in the past, and did not
document that fact that it will also accept exact address matches.

Signed-off-by: Barney Desmond <barney.desmond@anchor.net.au>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>

totemip: Support bind to exact address

Logic for binding now works in following way:
- Try to find exact match
- If not exact match is found, use first found network address

This allows set concrete IP even if network settings contains two IPs on
same network.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>

totemip: insert items in correct order

list_add_tail is used instead of list_add so ip addresses are inserted
in same order as returned by getifaddrs.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>

init: major cleanup

- rename generic.in and notifyd.in to corosync.in and corosync-notifyd.in
  (makes build simpler)
- fix sysvinit corosync.in sleep time to include a check for when IPC
  are ready and drop cman bits (there is no cman with corosync 2.0)
- corosync-notifyd.service should always start after corosync.service
- corosync.service should always start after network
- corosync.service uses init script wrapper
- install/ship sysvinit as wrappers for systemd in /usr/share/corosync
  when necessary
- change the build system to deal with all of the above

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>

Include ringid in processor joined log message

This should help correlate syslog entires with their blackbox
counterparts.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Andrew Beekhof <andrew@beekhof.net>

Update TODO file

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>

Improve testcpg to handle change of node identity

Signed-off-by: Dan Clark <2clarkd@gmail.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>

icmap: don't leak memory when changing ro/rw status on a key

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>

icmap: fix a valgrind errors (pass 1)

clean up a lot of allocated blocks at exit.
those changes has no runtime effects, but it makes valgrind
output a bit more useful by dropping over 700 errors/warnings to skip
over every single run.

there are still a few icmap related valgrind errors but those need
some more complex and timeconsuming investigation.

pre patch:

==21844== HEAP SUMMARY:
==21844==     in use at exit: 1,229,321 bytes in 1,516 blocks
==21844==   total heap usage: 7,191 allocs, 5,675 frees, 3,819,853 bytes allocated

==21844== LEAK SUMMARY:
==21844==    definitely lost: 3,617 bytes in 11 blocks
==21844==    indirectly lost: 21,960 bytes in 11 blocks
==21844==      possibly lost: 1,080,101 bytes in 131 blocks
==21844==    still reachable: 123,643 bytes in 1,363 blocks
==21844==         suppressed: 0 bytes in 0 blocks

==21844== ERROR SUMMARY: 136 errors from 136 contexts (suppressed: 0 from 0)

post patch:

==25793== HEAP SUMMARY:
==25793==     in use at exit: 1,185,870 bytes in 808 blocks
==25793==   total heap usage: 9,427 allocs, 8,619 frees, 4,156,841 bytes allocated

==25793== LEAK SUMMARY:
==25793==    definitely lost: 3,697 bytes in 12 blocks
==25793==    indirectly lost: 22,248 bytes in 13 blocks
==25793==      possibly lost: 1,079,655 bytes in 113 blocks
==25793==    still reachable: 80,270 bytes in 670 blocks
==25793==         suppressed: 0 bytes in 0 blocks

==25793== ERROR SUMMARY: 119 errors from 119 contexts (suppressed: 0 from 0)

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>

crypto init: release *_slot resource after init

Those are only used at init phase and we can free some memory for the system.

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Angus Salkeld <asalkeld@redhat.com>

ipcs: allow connections only after all services are ready

this fixes a rather annoying race condition at startup where a client
connects to corosync "too fast" before the service is ready to operate
and client gets some random data during initialization phase.

With this fix, we allow connections to ipc only after the main engine
is operational and configured (and after the first totem transition).

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Angus Salkeld <asalkeld@redhat.com>

Always allocate totemrrp stats array

This prevents segfault when rrp mode is set with only one ring.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>

Properly parse uidgid files

Full path to key is now tested rather then key name only.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>

build: improve systemd service file handling

this solves the issue of having to special case before and after usrmove

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>

Remove info pages see also from cmapctl man

Corosync doesn't have documentation in info format, so information is
corosync-cmapctl was not true.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>

Add man page with CMAP keys created by corosync

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>

Check before making a reference to __start___verbose

Signed-off-by: Angus Salkeld <asalkeld@redhat.com>

conf: add quorum section to example config

document only the provider option since all the others
(votes/expected_votes/etc) are provider specific.

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>

Only call qb_ipcc_disconnect when the instance is fully dereferenced.

Sometimes calling xyz_finilize() within a dispatch would
cause a crash because the qb_ipcc_disconnect actually
disconnects immediatly and frees it't memory. whereas
the corosync structure is reference counted. So this
makes use of the reference counting to only call
qb_ipcc_disconnect when it is fully dereferenced.

Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Signed-off-by: Angus Salkeld <asalkeld@redhat.com>

Convert udpu example to use nodelist

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>

totemcrypt: fix build warning (unused variable)

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>

totemcrypto: major code cleanup (no functional or onwire changes)

- cleanup include list
- reorder code and functions (crypto then hash)
- split crypt/decrypt/hash functions
- some micro optimizations by dropping a few memcpy
- make the code more readable (better var names and buffers mapping)
- improve exit paths on error (return codes and free)
- store crypto header size instead of recalculating it per packet

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>

Make ifaces_get work with dynamic no_rings

Commit which added number of addresses to srp_address structure didn't
count with totemsrp_ifaces_get where whole structure was copied instead
of addresses only. This is now fixed.

Also to make API totempg forward compatible, size of interfaces array
must be passed to ifaces_get like functions to prevent memory overwrite.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>

Add no_addrs field in srp_addr structure

This should allow us future change to dynamic number of rings without
breaking wire compatibility.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>

Reflect config changes for crypto in examples

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>

Mark few more icmap keys as read only

Also most of the key settings are now centralized in one function, so
it's easier to audit.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>

Make common_lib version independand on totem_pg

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>

crypto: Remove sha224 and add md5 hash

SHA224 is not supported on RHEL6 and also it's kind of weird. Instead of
that, md5 can now be configured.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>

Update crypto_set API

Also few leftovers from cfg is removed and version of totempg is
increased to 5 to reflect all changes we made

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>

crypto: allocate padding in crypto_header

while it might seem a waste of space by using 2 extra bytes in
the crypto_config_header, it actually gives us the option
to grow "unknown at this time" features without hopefully
breaking onwire compat

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>

crypto: add new hashing methods and fix config defaults

add support for sha224/256/384/512

change config defaults to match coroparse and totemconfig

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>

Document crypto_hash and crypto_cipher options

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>

crypto: change network packets and add dynamic crypto header/data

The new network packet will look:

struct crypto_config_header * that provides info on crypto/hashing
hash_block[size based on hashing function] (if hash is selected)
salt[SALT_SIZE] (if crypto is selected)
...data...

and we kill the concept of crypto_security_header completely since
values are now dynamic for hash_block_size.

the reason why hash_block needs to be there, is because we do
hash salt in case both hashing and crypto are selected.

the crypto_config_header is totally transparent to totem
and to any underlaying crypto functions.

as we go cleaning, also use HASH_BLOCK_SIZE to generate hash_block.
the input buffer and output buffer size are dependent on the algo
used to hash.

we can now determine the real header size and adjust net_mtu properly
at startup. This will allow in future to use any algorithm since
size is dynamic.

some part of the code still needs some polishing to make it more
readable (specially the mapping of pointers into the packet
is still a bit obscure).

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>

totem: don't send garbage onwire if we fail to crypt

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>

crypto: add crypto config to network data

this add 2 bytes at the end of the each packet to propagate
config info.

in case there is a config mismatch packet must be rejected.

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>

crypto: drop secauth and make crypto none work again

keep totem.secauth config key for compatibility

if the key is NOT set, crypto will default to aes256/sha1
if the key is set to "off", crypto is disabled.
this reflects pretty much old behavior

keywords totem.crypto_cipher and totem.crypto_hash can
override secauth individually.

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>

Parse and use hash and crypto from config file

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>

Rename totemcrypto

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>

crypto: mask the crypto operations from totem packet size management

totem doesn't need to understand what crypto does.

totem needs to be able to tell crypto: "those are data, play with them"
and crypto needs to return: "here are your scrambled data and the new size"

similar to decrypt/verify.

this way we add enough dynamic within crypto to change header size and all
at any given time (for different hash algorithm for example) without
affecting on wire compat.

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>

onecrypt: move encryption code to crypto.c

This will remove duplicity of code.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>

cfg: remove crypto_set

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>

corosync-cfgtool: Remove set of cryptography

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>

Remove libtomcrypt

Tomcrypt in corosync is for long time not updated. Because we have
support for libnss, libtomcrypt can be removed.

Also few leftovers (AES is 256 bits, not 128, ...) are removed.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>

drop evs service

there are several reasons for this:

1) evs is only partially implemented with no plans to complete it

typedef enum {
       EVS_TYPE_UNORDERED, /* not implemented */
       EVS_TYPE_FIFO,          /* same as agreed */
       EVS_TYPE_AGREED,
       EVS_TYPE_SAFE           /* not implemented */
} evs_guarantee_t;

2) evs has no users in any upstream distribution and no search
   engine can find any other upstream using it.

3) the only reason (I was told) to carry around evs was that evs
   receives the full ring_id struct from totem. This is only
   partially correct because while the structures are prepared
   to carry around those data, they are never transmitted from
   corosync engine down the IPC line to the user.
   CPG ring_id contains the exact same information and it's
   actually less buggy (due to prototying of the info).

worst case scenario where a user really absolutely need libevs,
it can be easily reimplemented as libcpg wrapper and avoid
lots of code duplication.

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>

build: drop another leftover from the past

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>

build: drop obsoleted SOCKETDIR option

yet another leftover from the past that can go away

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>

build: drop last LCRSO references

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>

pload: make it a test service and not a public one

pload is a performance benchmark that measures the onwire
speed of corosync.

problem is that once pload has been executed, the cluster
is basically dead.

turn pload into a test tool, by removing corosync-pload tool
and user library.

cleanup pload code to make it more readable and drop lots
of unnecessary stuff.

add test/ploadstart tool that can configure and start pload
via cmap calls.

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>

totem: drop crypt_accept: concept/option

this was another old onwire compat mode that is not useful anylonger.

we can safely move the new model by default.

According to Honza (real hardware 1 node testing) there are no
performance impact.

My tests (8 nodes VM cluster), there is up to 10/12% performance
improvements up to 1M packet size where old and new models are equal.

As a side note, nss still shows to be a performance loss on both
real and virtual hw (without any kind of nss hw acceleration).

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>