api: join info: return explicit error code for no-cluster
allows an API client to more easily differentiate between this expected
"error" and an actual exception.
Note that I'd rather just return undef or an empty object for the
no-cluster case now (I'm not too sure about the original reasons for
the die anymore), but that would be a breaking change, and in fact it
would break current pve-manager versions out there, so schedule that
for the next major release (if we still want it then).
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Similar to notes for nodes.
datacenter.cfg normally uses key-value pairs defined in the schema.
We bypass this to allow potentially long comments at the top.
We have some users running into issues in certain cases, like syncing
a huge user base through LDAP into users.cfg or having a few thousand
HA services, as the per-file limit is then exhausted.
Bumping that alone provides only half of the solution, as the total
limit of 30 MiB would allow only a few files to get that big, or
would reduce the amount left over for actual guest configurations
quite a bit.
So also bump the total filesystem limit from 30 MiB to 128 MiB, i.e.,
by a factor of ~4, and in the same spirit bump the maximal number of
inodes (i.e., different files) from 10k to 256k, which pmxcfs can
still handle rather easily (tested with touch) and which would allow
maxing out the full FS limit with 512-byte files; that fits small
guest configs, so it sounds like a proportionate limit.
That should give us quite some wiggle room again, and should be
relatively safe, as most of our access is rather small and touches
only a few files; only root has full access anyway, and that user can
break everything already, so not much is lost here.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Aaron Lauterer [Mon, 3 May 2021 10:00:11 +0000 (12:00 +0200)]
pve-cluster.service: remove ceph.service
The ceph.service file has been removed in pve-manager commit be244f1.
Therefore, there is no need to reference it anymore. This also avoids
showing the `ceph.service` as a `not found` unit.
```
will get you the following error on perl 5.32
```
garbage after JSON object, at character offset 2 (before "\x{0}") at -e line 1.
```
Note, I did not find anything related in the perldelta articles for
the 28 -> 30 or 30 -> 32 updates; the first one made a bigger jump in
the bundled JSON module version, so possibly a change there.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Thomas Lamprecht [Thu, 22 Apr 2021 19:38:26 +0000 (21:38 +0200)]
cfs lock: avoid confusing lock prefix on error
we have lots of forum posts where users think that the locking was
the error, not the actual error message from the called code.
This has limited value as a generally applied prefix; if some code
requires the lock ID or similar to be included in the error message,
it can already do so itself. So just re-raise the error and be done,
at least if it is an error from the called code and not from the lock
setup.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Thomas Lamprecht [Thu, 22 Apr 2021 08:46:42 +0000 (10:46 +0200)]
pmxcfs: db: tell query planner that prepared statements are long-lived
SQLITE_PREPARE_PERSISTENT
The SQLITE_PREPARE_PERSISTENT flag is a hint to the query planner
that the prepared statement will be retained for a long time and
probably reused many times. Without this flag,
sqlite3_prepare_v3() and sqlite3_prepare16_v3() assume that the
prepared statement will be used just once or at most a few times
and then destroyed using sqlite3_finalize() relatively soon. The
current implementation acts on this hint by avoiding the use of
lookaside memory so as not to deplete the limited store of
lookaside memory. Future versions of SQLite may act on this hint
differently.
-- https://sqlite.org/c3ref/c_prepare_normalize.html#sqlitepreparepersistent
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Thomas Lamprecht [Thu, 22 Apr 2021 08:18:58 +0000 (10:18 +0200)]
pmxcfs: db: use SQLITE_STATIC to avoid memory copies
we can trust that we own *value and *name until the sqlite statement
has been executed, so use the STATIC bind flag to tell sqlite that it
does not need to make its own copy in the bind call.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
pmxcfs: do not grant LXC configs o+r permissions anymore
This was initially done because of some hook reading the config from
an unprivileged namespace when using unprivileged containers.
But we nowadays do not do this anymore; we either set things up
beforehand or use another source to get the required information
(e.g., our autodev hook uses "/var/lib/lxc/$vmid/devices").
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Removing them now could count as compatibility breakage; for users who
still depend on some of this weird behavior it's nicer if we do this
more explicitly with 7.0
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
cpg_mcast_joined (and transitively, cpg_join/leave) are not thread-safe.
pmxcfs triggers such operations via FUSE and CPG dispatch callbacks,
which run in concurrent threads.
accordingly, we need to protect these operations with a mutex, otherwise
they might return CS_OK without actually doing what they were supposed
to do (which in turn can lead to the dfsm taking a wrong turn and
getting stuck in a supposedly short-lived state, blocking access via
FUSE and getting whole clusters fenced).
huge thanks to Alexandre Derumier for providing the initial bug report
and quite a lot of test runs while debugging this issue.
Thomas Lamprecht [Thu, 30 Apr 2020 15:30:44 +0000 (17:30 +0200)]
updatecerts: create base directories of observed files
replaces the random hacks where we do some hail-mary mkdir in a
writer or the like, to ensure that the directory structure exists and
we can write safely.
Doing this more centrally in pmxcfs itself would be safer, but it's
too late in the release cycle for that now.
Chicken out if pmxcfs is not mounted; we don't want to trash its
(future) mountpoint.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
so that API paths that raise an exception while holding a CFS lock
properly propagate that exception to the client, instead of a
stringified version with noise about locks prepended.
Stefan Reiter [Thu, 9 Jan 2020 15:31:36 +0000 (16:31 +0100)]
Add cluster join API version check
Adds API call GET /cluster/config/apiversion to retrieve a remote
cluster's join-API version (0 is assumed for versions without this
endpoint). Also available via CLI as 'pvecm apiver'.
Introduce API_AGE similar to storage plugin API, but with two ages for
cluster/joinee roles. Currently, all versions are intercompatible.
For future usage, a new 'addnode' parameter 'apiversion' is introduced,
to allow introducing API breakages for joining nodes as well.
As a first compatibility check, use new fallback method only if
available. This ensures full compatibility between nodes/clusters with
and without new fallback behaviour.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
Aaron Lauterer [Tue, 24 Mar 2020 16:16:42 +0000 (17:16 +0100)]
pvecm: qdevice setup: fix check for odd node count
With Perl 5.26 the behavior of `scalar(%hash)` changed [0], causing
the check for odd numbers to never evaluate to true, allowing
odd-sized clusters to set up a QDevice. The algorithm was also not
changed to LMS if the QDevice creation was forced regardless.
Instead of showing the bucket info of the referenced hash, it did
show the hash reference. Dereferencing it will again return the
number of items present in the hash.
Stefan Reiter [Thu, 9 Jan 2020 15:31:35 +0000 (16:31 +0100)]
Add verification and fallback to cluster join/addnode
Verify that the config of the new node is valid and compatible with the
cluster (i.e. that the links for the new node match the currently
configured nodes).
Additionally, a fallback is provided via a new parameter to addnode,
'new_node_ip'. Previously, fallback was handled on the joining node by
setting its local IP as 'link0'; however, a cluster with only one
link, but numbered anywhere in 1-7, is still valid, and a fallback
would be possible, but the old code would fail there.
Instead, pass the locally resolved IP via a separate parameter
(resolving the IP on the cluster side is impractical, as IP resolution
could fail or provide a wrong IP for Corosync).
For compatibility reasons, allow fallback to occur via the old
method as well, but mark with FIXME for future removal.
Fallback fails in case the cluster has more than one link, in this case
only the user can know which NIC/IP corresponds to which cluster link.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
Stefan Reiter [Thu, 9 Jan 2020 15:31:34 +0000 (16:31 +0100)]
Enable support for up to 8 corosync links
add_corosync_link_properties/extract_corosync_link_args are introduced
as helpers to avoid hardcoding links in parameters=>properties on
several occasions, while still providing autocompletion with pvecm by
being separate parameters instead of an array.
Maximum number of links is given as constant MAX_LINK_COUNT, should it
change in the future.
All necessary functions have been updated to use the new $links array
format instead of separate $link0/$link1 parameters, and call sites
changed accordingly.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
Stefan Reiter [Thu, 9 Jan 2020 15:31:33 +0000 (16:31 +0100)]
corosync: add verify_conf
It does some basic sanity checking, warns the user about encryption
settings and unresolved hostnames, and finally makes sure that all nodes
have the same links configured (as well as comparing the configured
links to specified interfaces, if there are any).
A corosync.conf that has been created and modified strictly through our
API should *always* be valid.
verify_conf is called in 'addnode', warnings and errors are returned via
the API to be displayed in the task log of the node asking to join. If a
verification error occurs, it is handled specially via a "raise" outside
of any lock code that strips extra information from an Exception
instance. This ensures that multi-line formatted errors can be returned.
Warnings are always returned as an array, to be printed by the caller.
Includes testing.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
Add a new IPC request which takes a token string and matches it
against the priv/token.cfg shadow file. This allows non-root processes
with the privilege of doing IPC requests to verify tokens without
being able to read the full token list itself.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
[ Thomas: solved merge conflict in observer files struct ] Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Kevin Greßlehner [Tue, 14 Jan 2020 11:48:28 +0000 (11:48 +0000)]
Fix #2553: Prevent the deadlock by aligning the lock order
Overview:
Every once in a while the /etc/pve directory freezes ("ls" and "df"
do not work). Therefore most PVE components do not work anymore (the
web interface is not answering, shell commands do not work). This
mostly happens during snapshots, which happen frequently in my case.
The workaround / temporary solution is to restart the pve-cluster
service.
Steps to reproduce:
Make frequent snapshots/snapshot deletes on a default installation.
The /etc/pve directory will freeze at some point.
Cause:
When a snapshot is made, it eventually invokes memdb_rename
(memdb.c:1103), which at first locks the memdb->mutex at memdb.c:1122
and then invokes the methods vmlist_different_vm_exists
(memdb.c:1147) or vmlist_register_vm (memdb.c:1233). These methods
are defined in status.c and want to take the mutex lock there
(status.c:689 and status.c:669).
The deadlock appears when cfs_create_guest_conf_property_msg acquires
the status.c mutex lock while memdb_rename acquires the memdb.c mutex
lock at the same time. Then cfs_create_guest_conf_property_msg wants
to take the memdb.c lock in memdb_read (which is held by
memdb_rename), and vmlist_different_vm_exists or vmlist_register_vm
wants to take the status.c lock (which is held by
cfs_create_guest_conf_property_msg). Both methods are waiting for
each other to unlock their locks -> deadlock.
Fix:
Fix by aligning the lock order of the memdb and status mutex lock
calls.
Lock &memdb->mutex in memdb_read and refer to a new method
"memdb_read_nolock" in memdb.c which doesn't handle locks by itself.
This method then handles the stuff which was originally in
memdb_read. Therefore everything except
cfs_create_guest_conf_property_msg uses memdb_read (which handles the
locking itself), and cfs_create_guest_conf_property_msg prelocks
&memdb->mutex and invokes memdb_read_nolock.
Signed-off-by: Kevin Greßlehner <kevin_gressi@live.at>
[ added more info from bug report & fixed indentation/line endings ] Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Thomas Lamprecht [Tue, 26 Nov 2019 12:50:32 +0000 (13:50 +0100)]
mtunnel: allow multiple IPs if they are the same
To allow routed full-mesh setups, where the same IP is used on
multiple adapters. For the migration IP this is OK, as we just want a
single unique IP; if that one is configured more than once, it does
not bother us here.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Dominik Csapak [Tue, 26 Nov 2019 10:01:23 +0000 (11:01 +0100)]
change certificate lifetime to two years
instead of 10 years, to avoid issues with browsers/OSes that reject
certificates with a longer lifetime
(e.g., macOS Catalina only accepts a maximum of 825 days for
certificates issued after July 2019)
also limit the lifetime by the expiry date of the CA, since
a certificate cannot be valid longer than its CA
Stefan Reiter [Tue, 19 Nov 2019 09:28:29 +0000 (10:28 +0100)]
corosync: die in check_conf_exists if !$noerr
...and change $silent to $noerr for consistency.
Commit 3df092f9 (fix #1380: pvecm status: add general cluster
information) broke "pvecm status" on non-cluster nodes (well, it made
the error look worse, ofc it didn't "work" before either) because it
tries to access a totem that cannot exist without a corosync.conf.
pvecm status/nodes/expected already fail without a cluster, so it makes
more sense to fail early. But instead of copying the way the qdevice
API handles it, move the die into check_conf_exists directly, which
makes more sense than a warn anyway IMHO.
check_conf_exists is never called without $noerr = 1 outside of
pvecm.pm, so this change does not require any versioned depends/breaks.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
Thomas Lamprecht [Mon, 18 Nov 2019 10:46:35 +0000 (11:46 +0100)]
d/control: make api lib depend on the same version as cluster lib
As they need to be the same version to work; otherwise some
half-upgrades or half-downgrades can be done, which may break stuff
badly. So tell apt/dpkg about the relationship by doing a hard version
dependency on ${binary:Version}, which is the currently built package
version.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
split package into pve-cluster/pmxcfs and perl modules
with the former containing:
- pmxcfs binary + service
- IPCC perl bindings
- PVE::Cluster
and the latter being further split into
libpve-cluster-perl:
- PVE::DataCenterConfig
- various other perl modules not directly related to pmxcfs
and libpve-cluster-api-perl:
- ClusterConfig API
- pvecm CLI
- PVE::Corosync
- PVE::Cluster::Setup helper module
this second split is needed to avoid a (pre-existing) circular
dependency between libpve-access-control and libpve-cluster-perl:
- the cluster API code uses PVE::RPCEnvironment
- the access-control API code uses PVE::DataCenterConfig