so that API paths that raise an exception while holding a CFS lock
properly propagate that exception to the client, instead of a
stringified version with lock-related noise prepended.
Stefan Reiter [Thu, 9 Jan 2020 15:31:36 +0000 (16:31 +0100)]
Add cluster join API version check
Adds API call GET /cluster/config/apiversion to retrieve a remote cluster's
join API version (0 is assumed for versions without this endpoint). Also
available via CLI as 'pvecm apiver'.
Introduce API_AGE, similar to the storage plugin API, but with two ages for
the cluster and joinee roles. Currently, all versions are compatible with each other.
For future use, a new 'addnode' parameter 'apiversion' is introduced,
so that API breakages can be introduced for joining nodes as well.
As a first compatibility check, use the new fallback method only if
it is available. This ensures full compatibility between nodes/clusters with
and without the new fallback behaviour.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
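The two-age scheme above can be illustrated with a minimal C sketch, assuming that a peer is accepted when its version is at least the local version minus the age for the relevant role. The constant and function names and their values are hypothetical; the actual implementation is Perl and may differ.

```c
#include <stdbool.h>

/* Hypothetical constants, not taken from pve-cluster. */
enum {
    JOIN_API_VERSION = 1,
    JOIN_API_AGE_AS_CLUSTER = 1, /* oldest joining node accepted: VERSION - AGE */
    JOIN_API_AGE_AS_JOINEE = 1,  /* oldest cluster we can join: VERSION - AGE */
};

/* Cluster side: may a node reporting `node_version` join us? */
bool node_can_join_our_version(int node_version)
{
    return node_version >= JOIN_API_VERSION - JOIN_API_AGE_AS_CLUSTER;
}

/* Joining side: may we join a cluster reporting `cluster_version`?
 * Clusters without the apiversion endpoint are treated as version 0. */
bool we_can_join_cluster_version(int cluster_version)
{
    return cluster_version >= JOIN_API_VERSION - JOIN_API_AGE_AS_JOINEE;
}
```

With version 1 and both ages at 1, versions 0 and 1 are accepted in both directions, matching "currently, all versions are compatible".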
Aaron Lauterer [Tue, 24 Mar 2020 16:16:42 +0000 (17:16 +0100)]
pvecm: qdevice setup: fix check for odd node count
With Perl 5.26 the behavior of `scalar(%hash)` changed [0], causing the
check for an odd node count to never evaluate to true. This allowed
clusters with an odd number of nodes to set up a QDevice, and if the
QDevice creation was forced anyway, the algorithm was not changed to LMS.
Instead of the bucket info of the referenced hash, the check operated on
the hash reference itself. Dereferencing it again returns the number of
items present in the hash.
Stefan Reiter [Thu, 9 Jan 2020 15:31:35 +0000 (16:31 +0100)]
Add verification and fallback to cluster join/addnode
Verify that the config of the new node is valid and compatible with the
cluster (i.e. that the links for the new node match the currently
configured nodes).
Additionally, fallback is provided via a new parameter to addnode,
'new_node_ip'. Previously, fallback was handled on the joining node by
setting its local IP as 'link0'. However, a cluster with only one link
numbered anywhere from 1 to 7 is still valid and allows a fallback, but
the old code would fail in that case.
Instead, pass the locally resolved IP via a separate parameter
(resolving the IP on the cluster side is impractical, as IP resolution
could fail or provide a wrong IP for Corosync).
For compatibility reasons, allow fallback to occur via the old
method as well, but mark it with a FIXME for future removal.
Fallback fails if the cluster has more than one link; in that case,
only the user can know which NIC/IP corresponds to which cluster link.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
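The fallback rule described above (use 'new_node_ip' only when the cluster has exactly one link, whatever its number) could look like this minimal C sketch. The function name, the array representation and the return codes are assumptions for illustration only.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_LINK_COUNT 8

/* Hypothetical helper. links[i] is the address passed for link i (or NULL),
 * cluster_has_link[i] tells whether link i is configured in the cluster.
 * Returns the link number that received new_node_ip, -1 if explicit links
 * were passed (no fallback needed), or -2 if fallback is impossible. */
int fallback_link(const char *links[MAX_LINK_COUNT],
                  const bool cluster_has_link[MAX_LINK_COUNT],
                  const char *new_node_ip)
{
    int configured = 0, only = -1;

    for (int i = 0; i < MAX_LINK_COUNT; i++) {
        if (links[i] != NULL)
            return -1; /* the user specified links explicitly */
    }
    for (int i = 0; i < MAX_LINK_COUNT; i++) {
        if (cluster_has_link[i]) {
            configured++;
            only = i;
        }
    }
    /* With several links, only the user knows which NIC maps to which link. */
    if (configured != 1 || new_node_ip == NULL)
        return -2;
    links[only] = new_node_ip; /* may be link 1..7, not necessarily link0 */
    return only;
}
```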
Stefan Reiter [Thu, 9 Jan 2020 15:31:34 +0000 (16:31 +0100)]
Enable support for up to 8 corosync links
add_corosync_link_properties/extract_corosync_link_args are introduced
as helpers to avoid hardcoding links in parameters=>properties on
several occasions, while still providing autocompletion with pvecm by
keeping them as separate parameters instead of an array.
The maximum number of links is defined as the constant MAX_LINK_COUNT,
in case it changes in the future.
All necessary functions have been updated to
use the new $links array format instead of separate $link0/$link1
parameters, and call sites changed accordingly.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
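The idea of collecting separate "linkN" parameters into one array, bounded by MAX_LINK_COUNT, can be sketched in C as follows. The helper is hypothetical and only mirrors the concept of extract_corosync_link_args; it is not its actual implementation.

```c
#include <stdlib.h>
#include <string.h>

#define MAX_LINK_COUNT 8

/* Hypothetical helper: if `key` is "link<N>" with 0 <= N < MAX_LINK_COUNT,
 * store `value` at links[N] and return N; otherwise return -1. */
int extract_link_arg(const char *key, const char *value,
                     const char *links[MAX_LINK_COUNT])
{
    char *end;
    long n;

    /* Require a digit right after "link" (strtol would skip spaces/signs). */
    if (strncmp(key, "link", 4) != 0 || key[4] < '0' || key[4] > '9')
        return -1;
    n = strtol(key + 4, &end, 10);
    if (*end != '\0' || n >= MAX_LINK_COUNT)
        return -1;
    links[n] = value;
    return (int)n;
}
```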
Stefan Reiter [Thu, 9 Jan 2020 15:31:33 +0000 (16:31 +0100)]
corosync: add verify_conf
It does some basic sanity checking, warns the user about encryption
settings and unresolved hostnames, and finally makes sure that all nodes
have the same links configured (as well as comparing the configured
links to specified interfaces, if there are any).
A corosync.conf that has been created and modified strictly through our
API should *always* be valid.
verify_conf is called in 'addnode'; warnings and errors are returned via
the API to be displayed in the task log of the node asking to join. If a
verification error occurs, it is handled specially via a "raise" outside
of any lock code, since the lock code strips extra information from an
Exception instance. This ensures that multi-line formatted errors can be returned.
Warnings are always returned as an array, to be printed by the caller.
Includes testing.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
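The "all nodes have the same links" check can be pictured with a minimal C sketch that represents each node's configured links as a bitmask (bit N set means linkN/ringN_addr is present). The representation and function names are assumptions for illustration, not how verify_conf is written.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: eight links fit exactly into a uint8_t bitmask.
 * Returns the index of the first node whose links differ from node 0,
 * or -1 if all nodes agree (or there is at most one node). */
int first_mismatching_node(const uint8_t *node_links, size_t n_nodes)
{
    for (size_t i = 1; i < n_nodes; i++) {
        if (node_links[i] != node_links[0])
            return (int)i;
    }
    return -1;
}

/* If the totem section specifies interfaces, they must match the node links. */
bool links_match_interfaces(uint8_t node_links, uint8_t totem_interfaces)
{
    return totem_interfaces == 0 || totem_interfaces == node_links;
}
```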
Add a new IPC request which takes a token string and matches it against
the priv/token.cfg shadow file. This allows non-root processes with
the privilege of doing IPC requests to verify tokens without being
able to read the full token list themselves.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
[ Thomas: solved merge conflict in observer files struct ]
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
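The matching step can be sketched in C, assuming the shadow file content is already in memory and that a valid token corresponds to one complete line of it. The line format and the function name are assumptions; the real IPC handler and file layout may differ.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: return true if `token` equals one full line of the
 * `size`-byte shadow content, never reading past shadow + size. */
bool token_is_valid(const char *shadow, size_t size, const char *token)
{
    size_t tlen = strlen(token);
    const char *p = shadow, *end = shadow + size;

    if (tlen == 0)
        return false;
    while (p < end) {
        const char *nl = memchr(p, '\n', (size_t)(end - p));
        const char *eol = nl ? nl : end;

        if ((size_t)(eol - p) == tlen && memcmp(p, token, tlen) == 0)
            return true;
        if (nl == NULL)
            break; /* last line without trailing newline */
        p = nl + 1;
    }
    return false;
}
```

The caller only learns whether the token matched, never the other entries, which is the point of the IPC request.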
Kevin Greßlehner [Tue, 14 Jan 2020 11:48:28 +0000 (11:48 +0000)]
Fix #2553: Prevent the deadlock by aligning the lock order
Overview:
Every once in a while the /etc/pve directory freezes ("ls" and "df"
do not work). As a result, most PVE components stop working
(the web interface does not answer, shell commands do not work). This
mostly happens during snapshots, which happen frequently in my case.
The workaround / temporary solution is to restart the pve-cluster
service.
Steps to reproduce:
Make frequent snapshots/snapshot-deletes on a default installation.
The /etc/pve directory will freeze at some point.
Cause:
When a snapshot is made, it eventually invokes memdb_rename
(memdb.c:1103), which first locks memdb->mutex at memdb.c:1122
and then invokes the methods vmlist_different_vm_exists
(memdb.c:1147) or vmlist_register_vm (memdb.c:1233). These methods
are defined in status.c and want to lock the status.c mutex
(status.c:689 and status.c:669).
The deadlock occurs when cfs_create_guest_conf_property_msg acquires
the status.c mutex while memdb_rename acquires the memdb.c mutex
at the same time. Then cfs_create_guest_conf_property_msg wants
to lock the memdb.c mutex in memdb_read (which is held by
memdb_rename), and vmlist_different_vm_exists or vmlist_register_vm
wants to lock the status.c mutex (which is held by
cfs_create_guest_conf_property_msg). Both are waiting for
each other to release their locks -> deadlock.
Fix:
Fix by aligning the lock order of the memdb and status mutex lock
calls.
Lock &memdb->mutex in memdb_read and delegate to a new method
"memdb_read_nolock" in memdb.c, which does not handle locks by itself.
This method contains the logic that was originally in
memdb_read. Therefore everything except
cfs_create_guest_conf_property_msg uses memdb_read (which handles the
locking itself), and cfs_create_guest_conf_property_msg prelocks
&memdb->mutex and invokes memdb_read_nolock.
Signed-off-by: Kevin Greßlehner <kevin_gressi@live.at>
[ added more info from bug report & fixed indentation/line endings ]
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
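The locked/unlocked split and the aligned lock order can be sketched with plain pthread mutexes. This is a simplified model: the globals and the function read_guest_conf_property are hypothetical stand-ins, not the real pmxcfs structures (which use GLib mutexes inside memdb_t and status.c).

```c
#include <pthread.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-ins for memdb->mutex and the status.c mutex. */
static pthread_mutex_t memdb_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t status_mutex = PTHREAD_MUTEX_INITIALIZER;
static char memdb_data[64] = "boot: order=1\n";

/* Caller must already hold memdb_mutex. */
static size_t memdb_read_nolock(char *buf, size_t len)
{
    size_t n = strlen(memdb_data);
    if (n > len)
        n = len;
    memcpy(buf, memdb_data, n);
    return n;
}

/* Regular entry point: handles the locking itself. */
size_t memdb_read(char *buf, size_t len)
{
    pthread_mutex_lock(&memdb_mutex);
    size_t n = memdb_read_nolock(buf, len);
    pthread_mutex_unlock(&memdb_mutex);
    return n;
}

/* Modelled on the fixed cfs_create_guest_conf_property_msg: take the memdb
 * lock first, then the status lock, i.e. the same order as memdb_rename ->
 * vmlist_*, so the two paths can no longer wait on each other. */
size_t read_guest_conf_property(char *buf, size_t len)
{
    pthread_mutex_lock(&memdb_mutex);
    pthread_mutex_lock(&status_mutex);
    size_t n = memdb_read_nolock(buf, len);
    pthread_mutex_unlock(&status_mutex);
    pthread_mutex_unlock(&memdb_mutex);
    return n;
}
```

Calling memdb_read from inside read_guest_conf_property would self-deadlock on the non-recursive memdb mutex, which is why the _nolock variant is needed.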
Thomas Lamprecht [Tue, 26 Nov 2019 12:50:32 +0000 (13:50 +0100)]
mtunnel: allow multiple IPs if they are the same
This allows a routed full-mesh setup, where the same IP is used on multiple
adapters. For the migration IP this is OK, as we just want a single
unique IP; whether that one is configured more than once does not matter
here.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
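The relaxed check can be sketched in C: several matches are acceptable as long as they are all the same address. The function name and the string-array input are hypothetical; the actual change is in the Perl mtunnel code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical: return the single migration IP if all `n` found addresses
 * are identical (e.g. routed full-mesh), or NULL if none or several
 * different ones were found. */
const char *unique_migration_ip(const char *const *ips, size_t n)
{
    if (n == 0)
        return NULL;
    for (size_t i = 1; i < n; i++) {
        if (strcmp(ips[i], ips[0]) != 0)
            return NULL; /* genuinely ambiguous */
    }
    return ips[0];
}
```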
Dominik Csapak [Tue, 26 Nov 2019 10:01:23 +0000 (11:01 +0100)]
change certificate lifetime to two years
instead of 10 years, to avoid issues with browsers/OSes that reject
certificates with a longer lifetime
(e.g. macOS Catalina only accepts a maximum of 825 days for certificates issued after July 2019)
also limit the lifetime by the expiry date of the CA, since
a certificate cannot be valid longer than its CA
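The lifetime rule amounts to taking the minimum of two years and the CA's remaining validity. A minimal C sketch, assuming "two years" means 730 days (well below the 825-day limit) and that the CA expiry is available as a time_t; the names are hypothetical.

```c
#include <time.h>

#define CERT_LIFETIME_DAYS (2 * 365) /* assumption: two years as 730 days */

/* Days the new certificate should be valid: two years, but never past
 * the expiry (`ca_not_after`) of the issuing CA. */
long cert_validity_days(time_t now, time_t ca_not_after)
{
    long ca_days = (long)((ca_not_after - now) / 86400);

    if (ca_days < 0)
        ca_days = 0; /* CA already expired */
    return ca_days < CERT_LIFETIME_DAYS ? ca_days : CERT_LIFETIME_DAYS;
}
```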
Stefan Reiter [Tue, 19 Nov 2019 09:28:29 +0000 (10:28 +0100)]
corosync: die in check_conf_exists if !$noerr
...and change $silent to $noerr for consistency.
Commit 3df092f9 (fix #1380: pvecm status: add general cluster
information) broke "pvecm status" on non-cluster nodes (well, it made
the error look worse; of course it didn't "work" before either) because it
tries to access a totem that cannot exist without a corosync.conf.
pvecm status/nodes/expected already fail without a cluster, so it makes
more sense to fail early. But instead of copying the way the qdevice API
handles it, move the die to check_conf_exists directly, which makes
more sense than a warn anyway IMHO.
check_conf_exists is never called without $noerr = 1 outside of
pvecm.pm, so this change does not require any versioned depends/breaks.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
Thomas Lamprecht [Mon, 18 Nov 2019 10:46:35 +0000 (11:46 +0100)]
d/control: make api lib depend on the same version as cluster lib
They need to be the same version to work; otherwise, partial upgrades or
downgrades are possible, which may break things badly. So tell
apt/dpkg about the relationship by adding a hard version dependency
on ${binary:Version}, which is our currently built package version.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
split package into pve-cluster/pmxcfs and perl modules
with the former containing:
- pmxcfs binary + service
- IPCC perl bindings
- PVE::Cluster
and the latter being further split into
libpve-cluster-perl:
- PVE::DataCenterConfig
- various other perl modules not directly related to pmxcfs
and libpve-cluster-api-perl:
- ClusterConfig API
- pvecm CLI
- PVE::Corosync
- PVE::Cluster::Setup helper module
this second split is needed to avoid a (pre-existing) circular
dependency between libpve-access-control and libpve-cluster-perl:
- the cluster API code uses PVE::RPCEnvironment
- the access-control API code uses PVE::DataCenterConfig
Commit 926f961f62f5 used a new temporary pointer variable
for type correctness, but the return value was still computed from
the previous variable, which was no longer being moved forward.
Thomas Lamprecht [Thu, 29 Aug 2019 14:59:55 +0000 (16:59 +0200)]
pmxcfs server: fix off-by-one error when ensuring string NUL termination
done once, then copied over by copy-is-my-hobby, once by me too :)
While this is in the relatively big SHM we get from the libqb-backed
IPC mechanisms, and thus there's a really low chance of corrupting
another data element following it here, it's still a
possibility.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
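As a generic illustration of the bug class (not the actual pmxcfs server code), a correct bounded copy reserves one byte for the terminator and writes it at index n <= size - 1; writing it at dst[size] would touch one byte past the buffer. The function name is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Copy at most size - 1 bytes of `src` into `dst` and always
 * NUL-terminate within the `size`-byte buffer. */
void copy_nul_terminated(char *dst, size_t size, const char *src, size_t src_len)
{
    if (size == 0)
        return; /* no room even for the terminator */
    size_t n = src_len < size - 1 ? src_len : size - 1;
    memcpy(dst, src, n);
    dst[n] = '\0'; /* last valid index is size - 1 */
}
```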
Thomas Lamprecht [Thu, 29 Aug 2019 12:45:08 +0000 (14:45 +0200)]
pmxcfs: get config properties: ensure we do not read after the config
pmxcfs files need to be treated as blobs; while we can make some
assumptions about certain files, like the $vmid.conf ones, we should
still cope with problematic files.
In particular, the files may not end with \0, so always ensure that we
read at most file-size bytes.
Replace strtok_r, which assumes that the data is NUL-terminated, with
memchr, plus logic ensuring that we never read past the size
returned by memdb_read.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
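A bounded, memchr-based property lookup in the spirit of this change can be sketched as follows. It assumes simple "key: value" lines and is a hypothetical helper, not the actual pmxcfs parser; every read stays within data + size, so a missing trailing \0 or \n is harmless.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical: find `key` in a "key: value\n" blob of `size` bytes that
 * need not be NUL-terminated. Returns a pointer to the value and stores
 * its length in *vlen, or returns NULL if the key is absent. */
const char *find_property(const char *data, size_t size, const char *key, size_t *vlen)
{
    size_t klen = strlen(key);
    const char *p = data, *end = data + size;

    while (p < end) {
        const char *nl = memchr(p, '\n', (size_t)(end - p));
        const char *eol = nl ? nl : end; /* last line may lack '\n' */
        size_t len = (size_t)(eol - p);

        if (len > klen && memcmp(p, key, klen) == 0 && p[klen] == ':') {
            const char *v = p + klen + 1;
            while (v < eol && (*v == ' ' || *v == '\t'))
                v++; /* skip whitespace after the colon */
            *vlen = (size_t)(eol - v);
            return v;
        }
        if (nl == NULL)
            break;
        p = nl + 1;
    }
    return NULL;
}
```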
Thomas Lamprecht [Fri, 30 Aug 2019 05:45:28 +0000 (07:45 +0200)]
pmxcfs: fixup dcdb pointer void* arithmetic fix
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
(cherry picked from commit be072d67c81373a59913a5df729788eaea53619e)
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Thomas Lamprecht [Thu, 29 Aug 2019 17:45:18 +0000 (19:45 +0200)]
check_memdb: free data to allow building with memory leak sanitizer
while this "memory leak" was irrelevant (short-running anyway, so the
OS could clean up after us just fine) let's free the malloced data
nonetheless - this allows building with -fsanitize=address and
-fsanitize=undefined
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Thomas Lamprecht [Thu, 29 Aug 2019 14:27:39 +0000 (16:27 +0200)]
pmxcfs: fix more void pointer arithmetic
This is needed to finally enable -Wpedantic during compilation in a
future patch. It ensures that the arithmetic actually happens at byte
granularity, as arithmetic on void* is undefined in standard C.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
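The pattern behind these fixes: GCC accepts arithmetic on void * as an extension (treating sizeof(void) as 1), but ISO C does not, and -Wpedantic warns about it. Casting to a byte pointer first makes the byte granularity explicit. The helper name below is hypothetical.

```c
#include <stddef.h>

/* Advance `base` by `offset` bytes without relying on void * arithmetic. */
static inline void *ptr_add_bytes(void *base, size_t offset)
{
    return (char *)base + offset;
}
```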
PVE::Cluster::cfs_lock_file sets $@ and returns undef for all errors,
including when $code dies. PVE::Tools::lock_file runs $code inside an
eval as well, so just setting $@ is not enough when nesting these two
types of locks.
Re-die with the inner error to actually propagate error messages and
fail instead of proceeding. This was triggered (probably among other cases)
when attempting to join an existing cluster without specifying all
needed links.
Stefan Reiter [Mon, 1 Jul 2019 15:22:14 +0000 (17:22 +0200)]
Add functions to resolve hostnames and iterate corosync nodes
The sub 'for_all_corosync_addresses' iterates through all nodes in a
passed corosync config and calls a specified function for every ringX_addr
on every node it finds (provided the IP version matches the specified
one, or undef was specified).
All ringX_addr entries that cannot be parsed as an IP address will be
resolved as hostnames on a best-effort basis. This has to happen in exactly
the same way as corosync does internally, to ensure consistency with firewall
rules.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
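The "IP literal or hostname" decision and the IP-version filter can be sketched in C with inet_pton. The helper names are hypothetical, and this sketch does not perform the hostname resolution itself, which per the text above must match corosync's own behaviour.

```c
#include <arpa/inet.h>
#include <netinet/in.h>

/* Returns 4 or 6 if `addr` is an IPv4/IPv6 literal, or 0 if it is not and
 * would have to be resolved as a hostname first. */
int ip_literal_version(const char *addr)
{
    struct in_addr a4;
    struct in6_addr a6;

    if (inet_pton(AF_INET, addr, &a4) == 1)
        return 4;
    if (inet_pton(AF_INET6, addr, &a6) == 1)
        return 6;
    return 0;
}

/* Filter for an (already resolved) address; wanted == 0 means "any version",
 * mirroring the "undef was specified" case. */
int matches_ip_version(const char *addr, int wanted)
{
    int v = ip_literal_version(addr);
    return v != 0 && (wanted == 0 || wanted == v);
}
```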
Thomas Lamprecht [Thu, 27 Jun 2019 09:15:56 +0000 (11:15 +0200)]
Revert "pvecm: remove mtunnel"
This reverts commit 7a415f9657e68114c29b0bd1cad52283c203950a.
For now we have too many bad users of that; they never should have
used this in the first place, but it slipped in, so here we are...
Thomas Lamprecht [Mon, 24 Jun 2019 10:44:51 +0000 (12:44 +0200)]
pmxcfs: workaround dumb g_string_free behaviour
While GLib documents this function as nullable [0][1] (i.e., it can be
passed and can return NULL), its use of the somewhat misleading
g_return_val_if_fail [2] voids that, as passing NULL emits a
warning [2] which looks pretty grave (an assertion failure), although
it is just noise.
Thomas Lamprecht [Mon, 24 Jun 2019 10:42:20 +0000 (12:42 +0200)]
pmxcfs: get guest cfg properties: use g_string_sized_new
While g_string_new_len with NULL as its first argument effectively
becomes a g_string_sized_new, this can be confusing, as the docs do not
mention it. It may also lead to an error if someone changes the call
without much research, so switch to the function we should have used
to begin with.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Thomas Lamprecht [Tue, 28 May 2019 16:14:18 +0000 (18:14 +0200)]
node join: use new corosync link parameters
Similar to the change to cluster creation, now also use the
corosync link definition for the rest of the cluster join/add calls.
As link0, formerly ring0, is not special anymore, allow it to be omitted
and only fall back to the node name if link0 is configured in the
totem section of the configuration.
As the 'join' and 'addnode' API paths are closely connected, do it all in
one patch.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Thomas Lamprecht [Tue, 28 May 2019 16:13:22 +0000 (18:13 +0200)]
corosync: allow to set link priorities
For now, in passive mode, a link with a higher value has a lower
priority. If the currently active link fails, the one with the next
higher priority takes over. Use 255 as maximum, as internally
kronosnet uses a uint8_t variable for this, and while there can be
"only" 8 links currently, it may still be nice to use values other
than ]0..1[ for them, e.g., when re-shuffling link priorities it's
useful to have space between them.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Thomas Lamprecht [Tue, 28 May 2019 16:07:05 +0000 (18:07 +0200)]
cluster create: use new corosync-link format for totem interfaces
Preparation for enhanced compatibility with the new corosync 3/knet
transport. Pretty straightforward switch from ringX_addr to links,
*but* for configuration backward compatibility corosync still uses
"ringX_addr" as "link address", which will surely add confusion...
We drop all the "all IP versions must match" checking code, as
1. it could not cope with unresolved hostnames anyway
2. links can be on different IP versions with kronosnet
This makes it a bit easier and shorter; we can always re-add some (saner)
checking later on if people misconfigure this often.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Thomas Lamprecht [Mon, 27 May 2019 16:08:47 +0000 (18:08 +0200)]
corosync config: support 'linknumber' property
Corosync has moved its rings a layer up, i.e., abstracted away from
the network layer below. What were earlier called rings are now
links; knet can have up to 8, all other transports 1, for now.
Let our parser understand this change in the totem section of the
config, but keep backwards compatibility and accept 'ringnumber'
as well.
While we are at it, write out the two map operations used in a
more readable way.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
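Accepting both keys can be sketched in C as a small parser for one totem interface entry. The function is hypothetical (the actual parser is Perl) and assumes at most 8 links, numbered 0 to 7, as stated above for knet.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical: return the interface number for a totem interface entry,
 * accepting the new 'linknumber' and the legacy 'ringnumber' key, or -1 if
 * the key is neither or the value is not a number in 0..7. */
int parse_interface_number(const char *key, const char *value)
{
    char *end;
    long n;

    if (strcmp(key, "linknumber") != 0 && strcmp(key, "ringnumber") != 0)
        return -1;
    if (value[0] < '0' || value[0] > '9')
        return -1; /* rejects signs, spaces and empty strings */
    n = strtol(value, &end, 10);
    if (*end != '\0' || n > 7)
        return -1;
    return (int)n;
}
```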