Jan Friesse [Fri, 10 Jun 2016 16:02:53 +0000 (18:02 +0200)]
Qnetd: ffsplit: Enhance ffsplit
The 50:50 split algorithm now works in the following way:
- On client configuration change, membership change or disconnect, wait
until membership is stable (i.e. all clients' configuration node lists are
equal and all partitions have equal information).
- Choose the best partition containing >= 50% of the nodes
- If no such partition exists, send NACK to all clients
- Send NACK to all clients who should receive NACK
- After all clients who should receive NACK confirm vote reception, send
ACK to all clients who should get ACK
This ensures that two partitions never hold ACK at the same time. It also
behaves much better than the previous version: if the tie-breaker
partition is not connected, the other partition gets ACK.
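The ffsplit steps above can be sketched in C as two small decisions: which connected partition (if any) gets ACK, and when the ACKs may go out. This is a minimal sketch, not the qnetd code: `struct partition`, `struct client_vote`, `ffsplit_choose()` and `ffsplit_can_send_acks()` are hypothetical names, the "best" partition is assumed to be the largest one with the tie-breaker deciding an exact tie, and membership is assumed to be stable already when it runs.

```c
#include <stddef.h>

/* Hypothetical, simplified view of one partition as seen by qnetd.
 * Only partitions whose clients are connected to qnetd are listed. */
struct partition {
    size_t node_count;   /* nodes of this partition connected to qnetd */
    int has_tie_breaker; /* 1 if the tie-breaker node is in this partition */
};

/* Hypothetical per-client state for the two-phase NACK-then-ACK send. */
struct client_vote {
    int should_ack;     /* 1 if the client is in the chosen partition */
    int nack_confirmed; /* 1 once a NACK recipient confirmed vote reception */
};

/*
 * Assumed "best partition" rule: among partitions holding at least 50%
 * of cluster_size nodes, prefer more nodes; on an exact tie prefer the
 * partition with the tie-breaker. Returns the index of the partition
 * that gets ACK, or -1 if every client should get NACK.
 */
static int ffsplit_choose(const struct partition *parts, size_t n_parts,
                          size_t cluster_size)
{
    int best = -1;

    for (size_t i = 0; i < n_parts; i++) {
        if (2 * parts[i].node_count < cluster_size)
            continue; /* below 50%: never a candidate */
        if (best == -1 ||
            parts[i].node_count > parts[best].node_count ||
            (parts[i].node_count == parts[best].node_count &&
             parts[i].has_tie_breaker)) {
            best = (int)i;
        }
    }
    return best;
}

/* ACKs may be sent only after every NACK recipient has confirmed. */
static int ffsplit_can_send_acks(const struct client_vote *clients, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (!clients[i].should_ack && !clients[i].nack_confirmed)
            return 0;
    return 1;
}
```

Because a disconnected partition simply does not appear in the array, a lone 50% partition without the tie-breaker still wins, which matches the behaviour described above.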
Jan Friesse [Thu, 9 Jun 2016 15:42:55 +0000 (17:42 +0200)]
Qdevice: Send ring id in more messages
To prevent accepting a vote from an old membership, the ring id is sent to
the server during init and echoed back to the client in every node list,
ask for vote reply and vote info message.
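A minimal client-side sketch of why echoing the ring id helps: a reply whose ring id doesn't match the client's current one is treated as stale and its vote is ignored. The `struct ring_id` layout (node id plus sequence number) and the function names are assumptions for illustration only.

```c
#include <stdint.h>

/* Hypothetical ring id layout: node id of the ring creator plus a sequence. */
struct ring_id {
    uint32_t node_id;
    uint64_t seq;
};

static int ring_id_eq(const struct ring_id *a, const struct ring_id *b)
{
    return a->node_id == b->node_id && a->seq == b->seq;
}

/*
 * Hypothetical client-side check: apply the vote carried in a reply only
 * when the ring id echoed by the server matches the client's current ring
 * id. Returns 1 if the vote was applied, 0 if it was discarded as stale.
 */
static int apply_vote_if_current(const struct ring_id *current,
                                 const struct ring_id *echoed,
                                 int vote, int *cast_vote)
{
    if (!ring_id_eq(current, echoed))
        return 0; /* reply belongs to an older membership */
    *cast_vote = vote;
    return 1;
}
```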
Jan Friesse [Wed, 1 Jun 2016 13:07:34 +0000 (15:07 +0200)]
Qdevice: Correct API comments
Also, after the votequorum node list is received and qnetd is connected,
the default vote is changed to WAIT_FOR_REPLY. This makes much more sense
because it ensures qdevice doesn't vote with the new ring id until qnetd
sends a reply.
Jan Friesse [Tue, 31 May 2016 07:44:28 +0000 (09:44 +0200)]
qdevice: Ensure to exit if ipc socket is closed
When the IPC socket was closed before poll and a new connection got the
same fd as the original IPC socket, shutdown didn't work. The solution is to
check whether the IPC socket is still active when creating the poll array.
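The fix can be illustrated with a poll-array builder that consults an explicit "active" flag instead of trusting the fd number alone. `struct ipc_state`, `build_poll_array()` and the use of -1 as the shutdown signal are assumptions for this sketch, not the qdevice implementation.

```c
#include <poll.h>
#include <stddef.h>

/* Hypothetical IPC state: the fd plus an explicit "still open" flag. */
struct ipc_state {
    int socket_fd;
    int is_active; /* cleared as soon as the IPC socket is closed */
};

/*
 * Hypothetical builder: pfds must have room for n_clients + 1 entries.
 * The IPC socket is added only while is_active is set, so a closed socket
 * whose fd number was reused by a new connection is never mistaken for the
 * IPC socket. Returns the number of entries filled, or -1 when the IPC
 * socket is gone and the daemon should shut down.
 */
static int build_poll_array(const struct ipc_state *ipc,
                            const int *client_fds, size_t n_clients,
                            struct pollfd *pfds)
{
    size_t n = 0;

    if (!ipc->is_active)
        return -1;

    pfds[n].fd = ipc->socket_fd;
    pfds[n].events = POLLIN;
    pfds[n].revents = 0;
    n++;

    for (size_t i = 0; i < n_clients; i++) {
        pfds[n].fd = client_fds[i];
        pfds[n].events = POLLIN;
        pfds[n].revents = 0;
        n++;
    }
    return (int)n;
}
```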
Jan Friesse [Fri, 18 Mar 2016 14:58:40 +0000 (15:58 +0100)]
qnetd: Validate tie-breaker, algo and node dup
If a new client requests a tie-breaker or algorithm that differs from the
rest of the cluster, an error message is sent back. The node is also
checked not to be a duplicate, by comparing node ids.
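A sketch of this validation, assuming the server keeps the already-connected clients of one cluster in an array and each client announces a node id, algorithm and tie-breaker in its init message. All names and the error enum are hypothetical, and the tie-breaker setting is simplified to a single integer.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical error codes for the reply sent to a rejected client. */
enum init_error {
    INIT_OK = 0,
    INIT_ERR_ALGORITHM_DIFFERS,
    INIT_ERR_TIE_BREAKER_DIFFERS,
    INIT_ERR_DUPLICATE_NODE_ID,
};

/* Hypothetical subset of what a client announces when it connects. */
struct client_info {
    uint32_t node_id;
    int algorithm;        /* e.g. test, ffsplit, lms */
    uint32_t tie_breaker; /* simplified to a single value */
};

/* existing[] is assumed to hold only clients of the same cluster. */
static enum init_error validate_new_client(const struct client_info *existing,
                                           size_t n_existing,
                                           const struct client_info *new_client)
{
    for (size_t i = 0; i < n_existing; i++) {
        if (existing[i].algorithm != new_client->algorithm)
            return INIT_ERR_ALGORITHM_DIFFERS;
        if (existing[i].tie_breaker != new_client->tie_breaker)
            return INIT_ERR_TIE_BREAKER_DIFFERS;
        if (existing[i].node_id == new_client->node_id)
            return INIT_ERR_DUPLICATE_NODE_ID;
    }
    return INIT_OK;
}
```

The first client of a cluster is always accepted, since there is nothing to compare it with yet.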
Use the new timers to get a better response from LMS when the network
splits. This also closes a race where the remote side could go inquorate
before we confirmed the vote.
Add client-side (qdevice-net) code to cope with a detached qnetd if we
are quorate and have wait_for_all enabled. In that situation quorum is
now kept.
Jan Friesse [Fri, 18 Mar 2016 09:40:58 +0000 (10:40 +0100)]
qdevice: Force send of heartbeat
Previously the client was not forced to use heartbeats. Because qnetd must
be able to detect a dead client, the heartbeat setting is now mandatory.
Instead of being set by a set_option message, the heartbeat is required in
the init message. This also means that
QDEVICE_NET_INSTANCE_STATE_WAITING_SET_OPTION_REPLY can be removed and the
client is considered connected once init_reply is received. So
currently set_option is not used (but the implementation of these two
messages still exists).
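The new handshake can be sketched as follows, assuming a simplified init message with an optional heartbeat field. The names are hypothetical; the sketch only shows that the server refuses an init without a heartbeat interval and that the client goes straight from waiting for init_reply to connected.

```c
#include <stdint.h>

/* Hypothetical client states after the change: no set_option step. */
enum client_state {
    STATE_WAITING_INIT_REPLY,
    STATE_CONNECTED,
};

/* Hypothetical, simplified init message. */
struct init_msg {
    int has_heartbeat_interval;
    uint32_t heartbeat_interval_ms;
};

/* Server side: an init message without a usable heartbeat is rejected. */
static int server_accepts_init(const struct init_msg *msg)
{
    return msg->has_heartbeat_interval && msg->heartbeat_interval_ms > 0;
}

/* Client side: receiving init_reply alone is enough to become connected. */
static enum client_state on_init_reply(enum client_state state)
{
    return state == STATE_WAITING_INIT_REPLY ? STATE_CONNECTED : state;
}
```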
Jan Friesse [Thu, 17 Mar 2016 13:00:26 +0000 (14:00 +0100)]
qnetd: Add support for qnetd algo timer
The algo timer is a simplified timer designed for qnetd algorithms. Unlike
a full timer, only one can exist per client. The workflow is:
- In one of algorithm callbacks qnetd_client_algo_timer_schedule is
called
- On timeout, .timer_callback is called (for example
qnetd_algo_test_timer_callback). send_vote and result_vote can be set
to send vote info to the client
- The timer can be discarded by calling
qnetd_client_algo_timer_abort
The timer is automatically deleted on client disconnect.
To make all this possible, the qnetd main loop now supports a
timer-list (main_timer_list). To handle errors and disconnect a client
from a timer callback, the client has schedule_disconnect. If it is set
to 1, the client is disconnected during the current poll loop iteration.
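The algo timer workflow can be sketched as a single timer slot per client, driven from the main loop. This is a minimal sketch under stated assumptions: the field names, the millisecond clock, `algo_timer_process()` and the integer vote values are hypothetical, and recording `last_vote_sent` stands in for actually sending vote info. The schedule and abort functions only mirror the roles of qnetd_client_algo_timer_schedule and qnetd_client_algo_timer_abort.

```c
#include <stddef.h>

struct client;

/* Hypothetical algo timer callback: may request that a vote be sent. */
typedef int (*algo_timer_cb)(struct client *client, int *send_vote,
                             int *result_vote);

struct client {
    int timer_scheduled;
    unsigned long timer_deadline_ms;
    algo_timer_cb timer_callback;
    int schedule_disconnect; /* 1 = drop client in this poll iteration */
    int last_vote_sent;      /* -1 if nothing was sent yet */
};

/* At most one algo timer per client: scheduling again replaces it. */
static void algo_timer_schedule(struct client *c, unsigned long now_ms,
                                unsigned long timeout_ms, algo_timer_cb cb)
{
    c->timer_scheduled = 1;
    c->timer_deadline_ms = now_ms + timeout_ms;
    c->timer_callback = cb;
}

static void algo_timer_abort(struct client *c)
{
    c->timer_scheduled = 0;
}

/* Called from the main loop; returns 1 if the client must be disconnected. */
static int algo_timer_process(struct client *c, unsigned long now_ms)
{
    int send_vote = 0, result_vote = 0;

    if (c->timer_scheduled && now_ms >= c->timer_deadline_ms) {
        c->timer_scheduled = 0;
        if (c->timer_callback(c, &send_vote, &result_vote) != 0)
            c->schedule_disconnect = 1; /* error inside the callback */
        else if (send_vote)
            c->last_vote_sent = result_vote; /* stands in for sending */
    }
    return c->schedule_disconnect;
}

/* Example callbacks used to exercise the sketch. */
static int example_ack_callback(struct client *c, int *send_vote,
                                int *result_vote)
{
    (void)c;
    *send_vote = 1;
    *result_vote = 1; /* 1 stands in for ACK */
    return 0;
}

static int example_fail_callback(struct client *c, int *send_vote,
                                 int *result_vote)
{
    (void)c;
    (void)send_vote;
    (void)result_vote;
    return -1; /* simulate an error inside the callback */
}
```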
Jan Friesse [Mon, 22 Feb 2016 15:46:45 +0000 (16:46 +0100)]
Refactor qdevice-net
- The corosync-device-net binary is gone. Its replacement is
corosync-qdevice
- corosync-qdevice supports multiple models (only net is
currently implemented)
- Completely redesign the qdevice-net main loop.
- Connect is non-blocking
- Cmap and Votequorum events are handled even before connecting to
qnetd. The algorithm gets send_node_list and vote set, so it doesn't need
to check the connection status. vote_timer also keeps running and voting
until something changes (configuration or votequorum node list)
- If connect fails, algorithm_disconnected is called with the new reason
CANT_CONNECT_TO_THE_SERVER
- Logging for qdevice is based on libqb logging functions. The logging
configuration from corosync.conf is now also used, and dynamic
configuration changes are handled.
- Added qdevice_net_algorithm_config_node_list_changed
- Changed qdevice_net_algorithm_votequorum_node_list_notify to take
send_node_list, so it's similar to the other functions.
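The non-blocking connect can be illustrated with plain POSIX sockets: start the connect, let the main loop keep handling cmap and votequorum events, and check the result once the socket becomes writable. This is a generic sketch, not qdevice-net's code (which may use a different socket library); `nb_connect_start()` and `nb_connect_finish()` are hypothetical names, and the single-fd poll in `nb_connect_finish()` only keeps the example self-contained.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Start a non-blocking TCP connect. Returns the socket fd (the connection
 * may still be in progress) or -1 if it failed immediately.
 */
static int nb_connect_start(const struct sockaddr_in *addr)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) {
        close(fd);
        return -1;
    }

    /* EINPROGRESS means "still connecting", which is expected here. */
    if (connect(fd, (const struct sockaddr *)addr, sizeof(*addr)) < 0 &&
        errno != EINPROGRESS) {
        close(fd);
        return -1;
    }
    return fd;
}

/*
 * Finish the connect once the fd is writable. Returns 0 on success, -1 on
 * failure (the point where a caller would report CANT_CONNECT_TO_THE_SERVER
 * to the algorithm). A real main loop would poll all its fds instead.
 */
static int nb_connect_finish(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLOUT, .revents = 0 };
    int err = 0;
    socklen_t len = sizeof(err);

    if (poll(&pfd, 1, timeout_ms) != 1)
        return -1;
    /* SO_ERROR holds the result of the asynchronous connect. */
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0 || err != 0)
        return -1;
    return 0;
}
```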
qnetd: Use ring_id, not client->last_ring_id in algorithms
ring_id should only be copied into the client structure after the
algorithm has run (so the previous one is still available), so fix the
algorithms to use the passed-in ring_id where available.
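A minimal sketch of that ordering, with hypothetical names: the algorithm compares the passed-in ring_id with `client->last_ring_id`, and only afterwards is the new ring id stored in the client structure.

```c
#include <stdint.h>

/* Hypothetical ring id and client layouts. */
struct ring_id {
    uint32_t node_id;
    uint64_t seq;
};

struct client {
    struct ring_id last_ring_id;
};

/* Hypothetical algorithm step: uses the passed-in ring_id and can still
 * see the previous one in client->last_ring_id. */
static int algo_ring_changed(const struct client *c,
                             const struct ring_id *ring_id)
{
    return c->last_ring_id.node_id != ring_id->node_id ||
           c->last_ring_id.seq != ring_id->seq;
}

static int handle_membership_msg(struct client *c,
                                 const struct ring_id *ring_id)
{
    int changed = algo_ring_changed(c, ring_id);

    c->last_ring_id = *ring_id; /* copied only after the algorithm ran */
    return changed;
}
```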
qnetd-algo: Fix list traversal corruption when freeing partitions.
TAILQ_* doesn't have a safe iterator for use when freeing entries, so the
only safe way of doing it (without assuming implementation) is to
restart the iterator after freeing the structure.
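The restart pattern looks like this with the BSD `<sys/queue.h>` TAILQ macros (also shipped by glibc). The partition structure and the "remove empty partitions" predicate are hypothetical; the point is that the loop breaks out right after `free()` and starts again from the head instead of following the freed element's next pointer.

```c
#include <stdlib.h>
#include <sys/queue.h>

/* Hypothetical partition entry kept in a TAILQ. */
struct partition {
    int node_count;
    TAILQ_ENTRY(partition) entries;
};

TAILQ_HEAD(partition_list, partition);

/*
 * Free every empty partition. After TAILQ_REMOVE + free() the current
 * element is gone, so TAILQ_FOREACH must not advance from it; the scan
 * breaks out and restarts from the head instead (O(n^2), but safe
 * without relying on a _SAFE iterator).
 */
static void free_empty_partitions(struct partition_list *list)
{
    struct partition *p;
    int restart;

    do {
        restart = 0;
        TAILQ_FOREACH(p, list, entries) {
            if (p->node_count == 0) {
                TAILQ_REMOVE(list, p, entries);
                free(p);
                restart = 1;
                break; /* leave before the loop touches freed memory */
            }
        }
    } while (restart);
}

/* Helpers used to exercise the sketch. */
static int push_partition(struct partition_list *list, int node_count)
{
    struct partition *p = malloc(sizeof(*p));

    if (p == NULL)
        return -1;
    p->node_count = node_count;
    TAILQ_INSERT_TAIL(list, p, entries);
    return 0;
}

static int count_partitions(struct partition_list *list)
{
    struct partition *p;
    int n = 0;

    TAILQ_FOREACH(p, list, entries)
        n++;
    return n;
}
```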
Jan Friesse [Mon, 1 Feb 2016 11:26:08 +0000 (12:26 +0100)]
Improve qdevice
- Add support for cmap node list configuration change
- Add client side algorithms
- Check if the currently received ring id in a membership message
equals the last sent ring id
- Send the config node list only if it really changes, not
after every reload
- Add tlv_ring_id_eq (replacing qnetd_algo_rings_eq) so it's usable in
the client
- Move debug logs from algo-test into qnetd-log-debug.c and call them in
the proper places (so logs are now algorithm-independent)
- Fix memory leak in msg
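Detecting whether the config node list "really changes" can be sketched as a comparison against the last list sent. This assumes the node list is reduced to an array of node ids and that a pure reordering is not a change; both are simplifications, and `node_list_changed()` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

static int cmp_u32(const void *a, const void *b)
{
    uint32_t x = *(const uint32_t *)a;
    uint32_t y = *(const uint32_t *)b;

    return (x > y) - (x < y);
}

/*
 * Hypothetical check: returns 1 if the new config node list differs from
 * the last one sent. Both arrays are sorted in place so that reordering
 * alone does not count as a change.
 */
static int node_list_changed(uint32_t *last, size_t n_last,
                             uint32_t *current, size_t n_current)
{
    if (n_last != n_current)
        return 1;
    qsort(last, n_last, sizeof(*last), cmp_u32);
    qsort(current, n_current, sizeof(*current), cmp_u32);
    return memcmp(last, current, n_last * sizeof(*last)) != 0;
}
```

A reload that leaves the node list untouched then produces no message to qnetd.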
Move several commonly used routines into their own
qnetd-algo-utils.[ch] files and change over to using
the ring_id held in the client structure rather than
managing it ourselves.
Jan Friesse [Fri, 6 Nov 2015 08:58:57 +0000 (09:58 +0100)]
qnet: Add TLV_VOTE_NO_CHANGE
State used for informative-only callbacks (quorum node list) and
possibly informative-only callbacks (configuration node list). The client
doesn't change the cast vote timer state.
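How vote values might drive the client's cast vote timer, including the informative-only TLV_VOTE_NO_CHANGE and the WAIT_FOR_REPLY vote: a sketch with a hypothetical enum and timer structure, in which ACK and NACK keep the timer running (casting or withholding the vote), WAIT_FOR_REPLY stops it, and NO_CHANGE leaves it untouched. The exact ACK/NACK handling in qdevice-net is an assumption here.

```c
/* Hypothetical vote values modelled on the vote names used in this log. */
enum vote {
    VOTE_ACK,
    VOTE_NACK,
    VOTE_WAIT_FOR_REPLY,
    VOTE_NO_CHANGE,
};

/* Hypothetical state of the client's cast vote timer. */
struct cast_vote_timer {
    int running; /* 1 while the timer periodically reports the vote */
    int vote;    /* 1 = cast the qdevice vote, 0 = withhold it */
};

static void apply_vote(struct cast_vote_timer *t, enum vote v)
{
    switch (v) {
    case VOTE_ACK:
        t->running = 1;
        t->vote = 1;
        break;
    case VOTE_NACK:
        t->running = 1;
        t->vote = 0;
        break;
    case VOTE_WAIT_FOR_REPLY:
        t->running = 0; /* stop voting until qnetd replies */
        break;
    case VOTE_NO_CHANGE:
        break; /* informative only: leave the timer untouched */
    }
}
```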
This patch tidies the two state change callbacks and explains them
in the man page:
The difference between votequorum_nodelist_notification_t and
votequorum_quorum_notification_t is subtle but important.
The 'nodelist' callback is sent at the start of a cluster state
transition and contains the new ring_id and only the list of
nodes that are included in the sync state, i.e. only active nodes. No
quorum information is included in this callback because it is not
available at that time.
The 'quorum' callback is sent after the cluster state transition has
completed and does contain quorum information.
In addition, the node list in the quorum callback contains all nodes
known to votequorum (whether up or down) and their state, as well
as information about the attached quorum device (if any). Quorum
callbacks will not be sent for qdevice up and down
events unless they affect quorum.
votequorum: split callbacks into nodelist and quorum
This split is needed for qdevice, so that it gets the ring_id and
nodelist as part of the sync process and not afterwards - when quorum
has been calculated.
As this is an unsupported API, I'm not too worried about breaking
existing code; all the clients I know of are using the quorum API
anyway, as they should be.
Jan Friesse [Tue, 20 Oct 2015 14:10:12 +0000 (16:10 +0200)]
Improve qdevice-net
- Add cast vote timer (qdevice-net now really votes)
- During the sync phase it's impossible to retrieve the cmap config
version, so it's no longer sent in the membership node list
- Refactor qdevice-net