Commit 926f961f62f5 used a new temporary pointer variable
for type correctness, but the return value was still using
the previous variable which had not been moved forward
anymore.
Thomas Lamprecht [Thu, 29 Aug 2019 14:59:55 +0000 (16:59 +0200)]
pmxcfs server: fix off-by-one error when ensuring string NUL termination
done once, then copied over by copy-is-my-hobby, once by me too :)
While this is in the relative big SHM we get from the libqb backed
IPC mechanisms, and thus there's a really really low chance to hit a
corruption of another following data element here, it's still a
possibility.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Thomas Lamprecht [Thu, 29 Aug 2019 12:45:08 +0000 (14:45 +0200)]
pmxcfs: get config properties: ensure we do not read after the config
pmxcfs files need to be treated as blobs, while we can have some
assumptions on certain files, like the $vmid.conf ones, we should
still cope with problematic files.
Especially, the files may not end with \0, so always ensure that we
read at most file-size bytes.
Replace strtok_r, which assumes that the data is NUL terminated, and
use memchr, with logic ensuring that we never read over the size
returned by memdb_read.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com> Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Thomas Lamprecht [Fri, 30 Aug 2019 05:45:28 +0000 (07:45 +0200)]
pmxcfs: fixup dcdb pointer void* aritmethic fix
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
(cherry picked from commit be072d67c81373a59913a5df729788eaea53619e) Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Thomas Lamprecht [Thu, 29 Aug 2019 17:45:18 +0000 (19:45 +0200)]
check_memdb: free data to allow building with memory leak sanitizer
while this "memory leak" was irrelevant (short running anyway, so the
OS could clean up after us just fine) let's free the malloced stuff
nonetheless - this allows to build with -fsanitize=address and
-fsanitize=undefined
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Thomas Lamprecht [Thu, 29 Aug 2019 14:27:39 +0000 (16:27 +0200)]
pmxcfs: fix more void pointer arithmetic
To be able to finally enable -Wpedantic during compile in a future
patch. This ensures that the arithmetic actually happens on byte
granularity, while void* is undefined.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
PVE::Cluster::cfs_lock_file sets $@ and returns undef for all errors,
including when $code dies. PVE::Tools::lock_file runs $code inside an
eval as well, so just setting $@ is not enough when nesting these two
types of locks.
re-die with the inner error to actually propagate error messages and
fail instead of proceeding. this triggered (probably among other cases)
when attempting to join an existing cluster without specifying all
needed links.
Stefan Reiter [Mon, 1 Jul 2019 15:22:14 +0000 (17:22 +0200)]
Add functions to resolve hostnames and iterate corosync nodes
The sub 'for_all_corosync_addresses' iterates through all nodes in a
passed corosync config and calls a specified function for every ringX_addr
on every node it finds (provided the IP-version matches the specified
one or undef was specified).
All ringX_addr entries that cannot be parsed as an IP address will be
best-effort resolved as hostnames. This has to happen in the exact same
way as corosync does internally, to ensure consistency with firewall
rules.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
Thomas Lamprecht [Thu, 27 Jun 2019 09:15:56 +0000 (11:15 +0200)]
Revert "pvecm: remove mtunnel"
This reverts commit 7a415f9657e68114c29b0bd1cad52283c203950a.
For now we have to many bad users of that, they all never should have
used this in the first place, but it slipped in so here we are..
Thomas Lamprecht [Mon, 24 Jun 2019 10:44:51 +0000 (12:44 +0200)]
pmxcfs: workaround dumb g_string_free behaviour
While GLib mentions that this method is nullable[0][1] (i.e., can be passed
and can return null) its use of the, a bit misleading,
g_return_val_if_fail[2] voids that, as passing NULL emits an
warning[2] which looks pretty grave (assertion failure), albeit is
just noise..
Thomas Lamprecht [Mon, 24 Jun 2019 10:42:20 +0000 (12:42 +0200)]
pmxcfs: get guest cfg properties: use g_string_sized_new
While with NULL as first argument g_string_new_len effectively
becomes a g_string_sized_new it can be confusing as the docs do not
mention that. Also this may lead to an error if one changes the call
with out to much research, so fix it to the one function we should
used to begin with here.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Thomas Lamprecht [Tue, 28 May 2019 16:14:18 +0000 (18:14 +0200)]
node join: use new corosync link parameters
Similar to the change to cluster creation use now also the
corosync-link definition for the rest of the cluster join/add calls.
As link0, former ring0, is not special anymore allow that it's not
passed and only default back to nodename if it's configured in the
totem section of the configuration.
As the 'join' and 'addnode' api paths are quite connected, do all in
one patch.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Thomas Lamprecht [Tue, 28 May 2019 16:13:22 +0000 (18:13 +0200)]
corosync: allow to set link priorities
For now in passive mode, a link with a higher value has a lower
priority. If the current active link fails the one with the next
higher priority will take over. Use 255 as maximum, as internally
kronosnet uses an uin8_t variable for this, and while there can be
"only" 8 links currently it may be still nice to use different values
that ]0..1[ for them, e.g., when re-shuffling link priorities it's
useful to have space between them.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Thomas Lamprecht [Tue, 28 May 2019 16:07:05 +0000 (18:07 +0200)]
cluster create: use new corosync-link format for totem interfaces
Preparation for enhanced compatibility with new corosync 3/knet
transport. Pretty straight forward switch from ringX_addr to links,
*but*, for configuration backward compatibility corosync still uses
"ringX_addr" as "link address", this will surely add confusion...
We drop all the "all IP versions must match" checking code, as
1. it could not cope with unresolved hostname's anyway
2. links can be on different IP versions with kronosnet
This makes it a bit easier and shorter, we can re-add some (saner)
checking always later on, if people misconfigure this often..
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Thomas Lamprecht [Mon, 27 May 2019 16:08:47 +0000 (18:08 +0200)]
corosync config: support 'linknumber' property
Corosync has moved its rings a layer up, i.e., abstracted away from
the network layer below. That what early were called rings are now
links, knet can have up to 8 all others 1, for now.
Let our parser understand this change in the totem section of the
config, but keep backwards compatibility and accept 'ringnumber'
also.
While we are at it, try to write out the two map operations used in a
bit more readable way.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Thomas Lamprecht [Thu, 13 Jun 2019 07:01:59 +0000 (09:01 +0200)]
pmxcfs: get config property: escape double quote and backslash
This are the most relevant from as the bite JSON really and can in
theory be in our configuration property values.
While technically a literal \t, \b, \f, \r (but not \n) can be an
issue too, this values normally really do not get written into the
config by our stack, if it has been manually added, but that's off
limits. If we really need it we can add it always in the future
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Thomas Lamprecht [Thu, 13 Jun 2019 07:01:58 +0000 (09:01 +0200)]
add get_guest_config_property IPCC method
This adds a new method to our IPCC interface.
It's a helper to get a property of a single or all guests.
It is restricted to only look at the current config state only, i.e.,
no PENDING changes and no snapshots, this is by design and wanted.
It uses the strict config format to be quick and return/continue
early, those restrictions are imposed by
PVE::QemuServer::parse_vm_config, and the container counterpart, it
mostly boils down to the following regex:
/^([a-z][a-z_]*\d*):\s*(.+?)\s*$/
and the fact that this is applied directly on the config lines (no
whitespace trimming, in any way, happens before)
Motivation for this work is to get the lock state of all guests
_quick_, allowing us to pass this info in our resource call, and
enhance the tree view of our Web User Interface with that
information. It was kept a bit more general, without bending the code
to much. The call returns a serialized JSON object with the format:
VMID => { PROPERTY => PROPERTY_VALUE }
if the property was found in VMID configuration, a empty object will
be returned if it was not found.
If one passes 0 to the request all VM configurations will be
searched, only those with a match will be returned, in the same
manner as above.
So why a IPCC call and not perl handling of this? Well performance.
Dominik's proposed a perl + cfs_read_file approach[0], while this is
relatively short and in the (mostly) safer perl land it's pretty
slow, especially on first connections. The idea for this existed
since quite a bit in my head, but Dominik's patch put it in front and
a prototype of this was born soon after that, initial evaluation and
performance comparison showed a speedup of >100 times at the initial
gathering, as [0] really benefits from the ccache afterwards, and
that _is_ a cache which gets often used and hit more "serial runs"
(i.e., in same worker) make his approach more reasonable, though.
But after a bit of work this came in not to ugly shape, and here the
numbers, with 10005 VM configs, all "real" ones with about 502 bytes
space usage, and all with a lock in the worst place, at the end.
Legend:
C0 : how many "serial runs" took place, i.e., how many runs in the
same worker
C1: IPCC (this) approach total runtime in ms
C2: IPCC (this) approach per-run runtime in ms, i.e., C1 / C0
C3: Perl + cfs_read_file ([0]) approach total runtime
C4: Perl + cfs_read_file ([0]) approach per-run runtime, i.e., C3 / C0
So, initially C beats Perl (not really correct wording, but hey) by
>200 times. But, on the second run we immediately see that while IPCC
scales almost linear the perl one doesn't, it really benefits from
the cache now, while initial call needed 3.5s this "only" needs ~200
ms more. But 200ms is still quite a bit of addition to an API call,
10k VMs are not seen to often in the wild, I guess, albeit it's
easily possible with setups using a lot of containers, here we know
about users having ~1500 CTs on only two nodes, so not completely
unrealistic either.
it was a mistake to move it here in the first place, and now we can
finally drop it again, as no PVE version can exist in 6.0, or latest
5.4 which still tries to use this one.
The one we will keep using is the one in qm from qemu-server, as
there we have full access to all it's perl modules without adding
cyclic dependencies.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Dominik Csapak [Mon, 27 May 2019 07:21:39 +0000 (09:21 +0200)]
get_node_kv: unpack result from pmxcfs
in 'broadcast_node_kv' we send the status and make sure its always
a scalar so it gets send as a null terminated string via pack(Z*)
when we get it back from pmxcfs we have to unpack(Z*) again, or
we get a string with a trailing '\0'
Thomas Lamprecht [Tue, 21 May 2019 20:18:26 +0000 (22:18 +0200)]
remove workarounds for avoiding HA fencing on upgrade
those got introduced by commit ec826d72c06e6f649b2b19c3341c39abb29b19f9 and can now be safely
removed, as current pve-cluster and pve-ha-manager do show this
problems, and upgrades need to go through latest PVE 5.X which then
ensures this is fixed.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Thomas Lamprecht [Thu, 11 Apr 2019 05:46:09 +0000 (07:46 +0200)]
pmxcfs: allow read access for www-data to /run dir
There's no real sensible information here, and we naturally only
allow read, but no write/exec.
This makes our IPCC restart connection re-cachin heuristic also work
for processes run as www-data, e.g., pveproxy, and thus guarantee a
more seamless pmxcfs restart - e.g., for package updates.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Thomas Lamprecht [Thu, 11 Apr 2019 05:42:21 +0000 (07:42 +0200)]
ipcc: increase restart grace period
with bad timing and general restart overhead 5 secs were sometimes a
too small timeout, even if it happened really seldom. Increase it a
bit, as it's desired to have the connection stay a live in a lot of
cases, e.g., to not get logged out on pve-cluster update as
pveproxy's verify_ticked couldn't do IPCC.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Stoiko Ivanov [Tue, 12 Mar 2019 15:07:40 +0000 (16:07 +0100)]
mac_prefix: do not allow multicast prefixes
MAC-addresses having the LSB of the first octet set, are considered
multicast-addresses (see [0,1]). LXC (the kernel) does not allow
such a mac-address to be set for a device, thus preventing containers from
starting if a multicast prefix is set (reported in [2] by Alexandre)
This patch introduces 'mac-prefix' (permitting only unicast prefixes) via
register_format and uses it instead of the pattern.
Oguz Bektas [Tue, 5 Mar 2019 12:42:07 +0000 (13:42 +0100)]
allow to setup and remove qdevice for cluster
makes it possible to setup and remove qdevice through pvecm
requirements:
* all hosts need corosync-qdevice installed
* host serving as qdevice needs corosync-qnetd installed
* root ssh access from pve host to qdevice host
pve-cluster: dont pretend to be a time-sync provider
time-sync.target is a special passive unit, consumers (i.e., units that
intend to say "I want to start after synchronized time has been
established") should only order themselves after it. only providers
(i.e., units that intend to say "I am responsible for synchronizing the
clock") should pull it in via a dependency.