Thomas Lamprecht [Mon, 12 Sep 2016 15:50:54 +0000 (17:50 +0200)]
pmxcfs: increase max filesize from 128k to 512k
This fixes bug 1014 and a few other problems where users ran into
the file size limitation. I did not find the bug entries for them,
but they covered:
1) our HA manager could only manage fewer than about 1500 services,
as beyond that the manager_status file got too big
2) firewall rules may also reach this limit on a bigger setup
I tested this with concurrently started reads/writes of random data
files from and into RAM (tmpfs mounts). As long as we do not flush
often and read everything at once (i.e. write/read with a big block
size) the performance stays good.
The limiting factor in speed is not corosync's CPG but sqlite, which
can be seen when comparing worst case scenarios between local pmxcfs
and clustered pmxcfs instances, and with simple debug logging.
We optimize our sqlite usage quite heavily; relevant additional speed
gains cannot be made without losing reliability, as far as I've
seen.
So I only ran into problems if I read/wrote small blocks
with a few hundred big writes started at once, e.g.:
    for i in {1..100}
    do
        dd if=/tmp/random512k.data of="/etc/pve/data$i" bs=1k &
    done
With the above worst case each block gets written as a single
transaction to the database, and each transaction has to be locked
and synced to disk for reliability.
So when all changes (i.e. the whole file) are packed into one DB
transaction, 512k files do not produce much overhead compared to
128k files.
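As a rough illustration of the arithmetic behind the worst case, the following sketch counts how many writes (and therefore, under the one-transaction-per-block behaviour described above, how many DB transactions) a file of a given size needs. The helper name transactions_per_file is hypothetical and not part of pmxcfs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of pmxcfs: number of write calls needed to
 * write file_size bytes in chunks of block_size bytes. In the worst case
 * described above, each of these writes becomes its own DB transaction. */
static size_t transactions_per_file(size_t file_size, size_t block_size)
{
    assert(block_size > 0);
    return (file_size + block_size - 1) / block_size;  /* round up */
}
```

With bs=1k a 512k file needs 512 transactions, so the 100 parallel dd calls above add up to 51200 locked and synced transactions. Written with one block per file, the same data needs only 100.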
As data written through the PVE framework is written and read in
such a way, we can increase the limit without seeing much of a
performance impact.
It should also be noted that just because files can now get bigger,
not many will. Rather, there may be just one to three files
bigger than 128k on some setups.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
use g_return_if_fail as cfs_loop_stop_worker returns void
Do not use g_return_val_if_fail because the cfs_loop_stop_worker
function does not return anything and newer versions of GCC complain
about that (I used gcc version 5.4.0 20160609 (Debian 5.4.0-6) from
stretch).
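A minimal sketch of the difference, using simplified stand-ins for the two GLib macros so that it builds without GLib (the real g_return_if_fail and g_return_val_if_fail also log a warning). The struct and function names are hypothetical and only mirror the situation in cfs_loop_stop_worker.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for g_return_if_fail / g_return_val_if_fail. */
#define return_if_fail(expr) \
    do { if (!(expr)) return; } while (0)
#define return_val_if_fail(expr, val) \
    do { if (!(expr)) return (val); } while (0)

/* Hypothetical type for this sketch, not the real cfs_loop_t. */
struct loop {
    int running;
};

/* Returns void, so the guard must bail out with a bare return. Using
 * return_val_if_fail(l != NULL, 0) here would expand to "return (0);"
 * inside a void function, which is what GCC complains about. */
static void stop_worker(struct loop *l)
{
    return_if_fail(l != NULL);
    l->running = 0;
}

/* Returns int, so here the value-returning variant is the right one. */
static int is_running(const struct loop *l)
{
    return_val_if_fail(l != NULL, 0);
    return l->running;
}

/* Small self-check of both guards. */
static void example(void)
{
    struct loop l = { 1 };

    stop_worker(NULL);              /* guard triggers, nothing is touched */
    assert(is_running(&l) == 1);
    stop_worker(&l);
    assert(is_running(&l) == 0);
    assert(is_running(NULL) == 0);  /* guard returns the fallback value */
}
```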
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Thomas Lamprecht [Thu, 30 Jun 2016 14:35:36 +0000 (16:35 +0200)]
ensure quorum is set false when corosync fails
If corosync directly fails (e.g. `killall corosync`) the local node
acted like it still had quorum, which is not ideal.
Ensure that we set quorate to false before we finalize the quorum.
Do this in:
* service_quorum_dispatch, if it fails it is important that we set
it to false, as there is a good possibility that the
quorum_notification_fn won't get called anymore, reproducible with
$ killall corosync && sleep 0.1 && ls -l /etc/pve/ \
&& systemctl start corosync
Expected behavior: corosync is dead, the ls should show that
everything in /etc/pve is read only
Shown behavior: /etc/pve still has read/write access and
PVE::Cluster::check_cfs_quorum() still returns true
* service_quorum_initialize: just to be sure as we successfully
registered the quorum notification function already
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Thomas Lamprecht [Tue, 24 May 2016 13:55:53 +0000 (15:55 +0200)]
cleanup format strings for cfs_* messages
This does not change semantics on our current target platform
(x86_64) but is needed for porting it to other platforms.
The GCC on ARM, for example, complains about them.
For all:
* size_t use "%z*"
* off_t use "%j*"
* uint64_t use "PRI*64"
where * may be one of (X,d,u).
Also cast guint64 to uint64_t to allow use of a general, portable
format which also supports hex output as the GUINT64_FORMAT allows
decimal output only.
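A small sketch of the conversions listed above, in standard C. The helper format_sizes and its message text are made up for illustration and do not appear in pmxcfs. Note that "%jd" formally expects an intmax_t, so the off_t value is cast in this sketch.

```c
#include <assert.h>
#include <inttypes.h>   /* PRIu64, PRIX64 */
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>  /* off_t */

/* Hypothetical helper, not part of pmxcfs: formats one value of each type
 * with a portable conversion and returns the number of characters that
 * snprintf produced. */
static int format_sizes(char *buf, size_t buflen,
                        size_t len, off_t offset, uint64_t version)
{
    int n = snprintf(buf, buflen,
                     "len %zu offset %jd version %" PRIu64 " (0x%" PRIX64 ")",
                     len,
                     (intmax_t) offset,  /* %jd expects intmax_t */
                     version, version);
    assert(n >= 0);  /* snprintf only returns < 0 on an encoding error */
    return n;
}
```

For a guint64 the same PRI*64 macros can be used after a cast to uint64_t, which is what makes hex output possible.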
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Cc: mir@datanom.net
Add warning for pvecm commands if not part of cluster
If the cluster config file is missing, pvecm status, nodes
and expected will probably not work. Add a helpful warning
because the corosync-quorumtool error message is not very
descriptive here.
Add a helper sub in Cluster.pm to actually do the check.
This should prevent issues when generating certificates on
first boot of a node with the RTC wrongly set to the local
timezone instead of UTC. Since we cannot require the node to
be synchronized with an NTP server, we pretend it's
yesterday when calling openssl.
Thomas Lamprecht [Mon, 12 Oct 2015 10:14:17 +0000 (12:14 +0200)]
improve RRP support and use 'name' subkey as default
This patch allows configuring RRP (= redundant ring protocol)
at cluster creation time. It also allows setting the ring 0 and 1
addresses when adding a new node. This helps and fixes some bugs when
corosync runs completely separated on its own network.
Changing RRP configs or the bindnet addresses automatically on a
running cluster isn't supported and not planned, as it needs a
complete cluster reboot and has too many possible failure points.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
prefer 'name' subkey over 'ring0_addr' for nodename
Use the name subkey from the cmap keys by default; if it is not set,
fall back to the ring0_addr.
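The selection rule can be sketched as follows. This is only an illustration of the fallback order, not the actual pmxcfs code: pick_nodename is a hypothetical helper, the cmap lookup itself is left out, and treating an empty name like an unset one is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: prefer the 'name' subkey, fall back to ring0_addr.
 * Assumes ring0_addr is always present for a node entry. */
static const char *pick_nodename(const char *name, const char *ring0_addr)
{
    assert(ring0_addr != NULL);

    if (name != NULL && name[0] != '\0')
        return name;        /* 'name' subkey is set: use it */

    return ring0_addr;      /* not set: fall back to the ring 0 address */
}
```

With this order, changing ring0_addr (e.g. to an address on a separate corosync network) leaves the node name untouched as long as 'name' is set.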
This fixes some issues when we move the corosync communication to
a different network and use a specific address or a new hostname
for that. Without this patch the nodename in the .members special
file changes together with ring0_addr, which can result in quite a
few problems (e.g. in the ha-manager).
This also allows separating the web interface traffic from corosync.
IP addresses can now also be used directly as ring0 addresses
without causing problems.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Thomas Lamprecht [Fri, 25 Sep 2015 15:50:05 +0000 (17:50 +0200)]
add function to lock a domain
This can be used to execute code on an 'action domain' basis.
E.g.: if there are actions that cannot be run simultaneously even if
they, for example, don't access a common file, and that are maybe also
spread across different packages, we can now ensure the consistency of
said actions on an 'action domain' basis.
The need to use a dirty hack like cfs_lock_storage with some
arbitrary storage name becomes obsolete. Also the code behaviour
and meaning become clearer.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
pvecm create: put brackets around hostnames for rsync
ssh and friends differ with respect to ipv6 notations:
* ssh: 'user@host'
* ssh-copy-id: 'user@host' as it uses ssh
* scp: 'user@[host]:file'
* rsync --rsh=ssh: '[user@host]'
rsync accepts brackets for all of ipv4, ipv6 or named hosts, so simply
defaulting to always using them works.
pvecm create: add corosync.conf parameters for ipv6
pvecm create now adds the following additional corosync.conf parameters:
* totem.ip_version
* totem.interface.ringnumber
* totem.interface.bindnetaddr
For ipv6, corosync needs a 'totem.interface' list with at least one entry
containing a bindnetaddr setting. Additionally 'totem.ip_version' needs to
be explicitly set to ipv6 (or an 'mcastaddr' needs to be set; corosync can
choose that one automatically, though, so we let it do just that).
remote_node_ip: option to include the packet family
If an array is requested, the function now returns ($ip, $family),
otherwise just the IP alone.
Several ipv6 related changes in other packages need to pass the packet
family to functions and will make use of this functionality.