This patch introduces support for Ceph's RBD namespaces.
A new storage config parameter 'namespace' defines the namespace to be
used for the RBD storage.
The namespace must already exist in the Ceph cluster as it is not
automatically created.
The main intention is to use this for external Ceph clusters. With
namespaces, each PVE cluster can get its own namespace and will not
conflict with other PVE clusters.
The <pool>/<image> paths are needed in quite a lot of places. Having one
single place where they are created helps to reduce duplicate code and
makes it easier to introduce new features.
The 'add_pool_to_disk' sub was already doing that, but the name was not
a good fit. This commit renames it to the more general
'get_rbd_path' and changes the second parameter to the more widely used
$volume instead of $disk.
Furthermore, all occurrences where "$pool/$volume" was concatenated manually
have been replaced with a call to get_rbd_path.
Plus some minor code style cleanups for long function calls that were
touched.
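A minimal sketch of what the renamed helper might look like, with the
optional 'namespace' parameter from the first commit (details such as the
default pool name may differ from the actual implementation):

    sub get_rbd_path {
        my ($scfg, $volume) = @_;

        my $path = $scfg->{pool} // 'rbd';
        $path .= "/$scfg->{namespace}" if defined($scfg->{namespace});
        $path .= "/$volume" if defined($volume);

        return $path;
    }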
by relying on archive_info's vmid first. archive_info is already used to
determine if it's a standard name, and in that case the vmid is certainly set.
Also add asserts to make sure we got what we expected.
The mentioned commit is actually a backwards-incompatible change that leads to
slightly different behavior when migrating a VM with volumes on a misconfigured
storage. For example, unreferenced volumes on a misconfigured storage won't be
picked up, even though they were before. And for referenced volumes on a
misconfigured storage, the disk size would not be updated on migration anymore.
We should wait until the next major release for this change and then also
re-evaluate the migration behavior with misconfigured disks.
Fabian Ebner [Fri, 12 Mar 2021 09:50:26 +0000 (10:50 +0100)]
vdisk list: only collect images from storages with an appropriate content type
Only these storages are activated in the first place, and it's bad behavior to
list images when no appropriate content type is set.
For example, on VM destruction, this avoids unreferenced images being deleted
from a storage with only the 'backup' content type set, which is supposedly
what happened in this[0] forum thread.
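A sketch of the intended filtering; $cfg/$scfg are the usual storage
configuration structures, the loop body is elided:

    for my $storeid (keys %{$cfg->{ids}}) {
        my $scfg = $cfg->{ids}->{$storeid};

        # storages without an image-related content type are never
        # activated for this purpose, so don't list their images either
        next if !$scfg->{content}->{images} && !$scfg->{content}->{rootdir};

        # ... activate the storage and collect its images ...
    }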
(Some) callers expect all keys to be present and valid array references in the
result, so initialization is needed.
Now, the enabled check is already done by the preceding code for every element
that is iterated over, and thus isn't needed in the main loop anymore.
Fabian Ebner [Wed, 10 Mar 2021 09:26:27 +0000 (10:26 +0100)]
api: disk list: allow if an audit permission for the node is present
as that seems to be the more natural permission path for listing a node's
local disks. For backwards compatibility, the old permission check has to be
kept (relevant with propagate=0).
This API call was originally part of the Ceph API and got copied here later,
which might explain the current permission check.
In the UI, the Disk panel is visible with a node audit permission, but the API
call itself failed without the '/' audit permission.
Fabian Ebner [Thu, 11 Feb 2021 10:24:13 +0000 (11:24 +0100)]
storage migration: insecure: improve logging
by including the message/error from the remote side. Some people on the forum[0]
ran into 'no tunnel IP received', but without information from the remote side
it's hard to tell why.
Thomas Lamprecht [Fri, 19 Feb 2021 14:01:36 +0000 (15:01 +0100)]
zpool: avoid wrong mount-decoding of dataset
this was mistakenly done as the procfs code uses it, and it was
assumed we need to decode this too to get both into the same
encoding-space and thus a correct comparison.
But only procfs has that encoding; we don't have it for pool values
in the storage config, so we must not decode that value, as doing so
could potentially break things.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Stoiko Ivanov [Fri, 19 Feb 2021 12:45:44 +0000 (13:45 +0100)]
zfspoolplugin: check if imported before importing
This commit is a small performance optimization on top of the previous one:
`zpool list` is cheaper than `zpool import -d /dev..` (the latter
scans the disks in the provided directory for zfs signatures,
unconditionally)
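Sketched, with error handling omitted; $dir stands for the device
directory the plugin scans:

    # `zpool list` succeeds exactly if the pool is already imported
    my $imported = system('zpool', 'list', '-H', '-o', 'name', $poolname) == 0;

    # only fall back to the expensive device scan when it is not
    system('zpool', 'import', '-d', $dir, $poolname) if !$imported;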
Stoiko Ivanov [Fri, 19 Feb 2021 12:45:43 +0000 (13:45 +0100)]
zfspoolplugin: check if mounted instead of imported
This patch addresses an issue we recently saw on a production machine:
* after booting a ZFS pool failed to get imported (due to an empty
/etc/zfs/zpool.cache)
* pvestatd/guest-startall eventually tried to import the pool
* the pool was imported, yet the datasets of the pool remained
not mounted
A bit of debugging showed that `zpool import <poolname>` is not
atomic, in fact it does fork+exec `mount` with appropriate parameters.
If an import ran longer than the hardcoded timeout of 15s, it could
happen that the pool got imported, but the zpool command (and its
forks) got terminated due to timing out.
Reproducing this is straightforward by setting (drastic) bw+iops
limits on a guest's disk (which contains a zpool), e.g.:
`qm set 100 -scsi1 wd:vm-100-disk-1,iops_rd=10,iops_rd_max=20,\
iops_wr=15,iops_wr_max=20,mbps_rd=10,mbps_rd_max=15,mbps_wr=10,\
mbps_wr_max=15`
Afterwards, running `timeout 15 zpool import <poolname>` resulted in
that situation in the guest on my machine.
The patch changes the check in activate_storage for the ZFSPoolPlugin,
to check if any dataset below the 'pool' (which can also be a sub-dataset)
is mounted by parsing /proc/mounts:
* this is cheaper than running `zfs get` or `zpool list`
* it catches a properly imported and mounted pool in case the
root-dataset has 'canmount' set to off (or noauto), as long
as any dataset below is mounted
After trying to import the pool, we also run `zfs mount -a` (in case
another check of /proc/mounts fails).
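A minimal sketch of such a mounted-check via /proc/mounts (the helper
name is illustrative):

    sub zfs_dataset_mounted {
        my ($pool) = @_;

        open(my $fh, '<', '/proc/mounts')
            or die "unable to read /proc/mounts - $!\n";
        while (my $line = <$fh>) {
            my ($device, undef, $fstype) = split(/\s+/, $line);
            # match the pool itself or any dataset below it
            return 1 if $fstype eq 'zfs' && $device =~ m!^\Q$pool\E(/|$)!;
        }
        return 0;
    }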
Potential for regression:
* running `zfs mount -a` is problematic if a dataset is manually
  unmounted after booting (without setting 'canmount')
* a pool without any mounted dataset (no mountpoint property set and
  only zvols) will result in repeated calls to `zfs mount -a`
Both of the above seem unlikely and should not occur when using our
tooling.
Fabian Ebner [Fri, 15 Jan 2021 10:58:05 +0000 (11:58 +0100)]
remove lock from is_base_and_used check
and squash the __no_lock-variant into it.
This lock is not broad enough, because for a caller that plans to do or not do
some storage operation based on the result of the check, the following could
happen:
1. volume_is_base_and_used is called and the result is used to enter a branch
2. situation on the storage changes in the meantime
3. the branch chosen in 1. might not be the one that should be taken anymore
This means that callers are responsible for locking, and luckily the existing
callers do use their own locks already:
1. vdisk_free used the __no_lock-variant with a broader lock also covering
the free operation.
2. vdisk_clone is not a caller, but is relevant and it does lock the storage
3. the calls during VM migration and VM destruction happen in the context of a
locked VM config. Because the clone operation also locks the VM config, it
cannot happen that a linked clone is created while the template VM is
migrated away or destroyed or vice versa. And even if that were the case,
the base disk would not be freed, because of what vdisk_free/vdisk_clone do.
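A sketch of the caller-side pattern, assuming the cluster_lock_storage
helper from PVE::Storage::Plugin (the exact lock depends on the caller):

    # check and operation happen under one storage-wide lock, so the
    # situation cannot change in between
    PVE::Storage::Plugin::cluster_lock_storage($storeid, $scfg->{shared}, undef, sub {
        die "base volume '$volname' is still in use by linked clones\n"
            if PVE::Storage::volume_is_base_and_used($cfg, $volid);

        # ... perform the actual storage operation ...
    });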
Fabian Ebner [Tue, 26 Jan 2021 11:45:27 +0000 (12:45 +0100)]
Diskmanage: also include partitions with get_disks if flag is set
and have a parent key for partitions, to be able to see the associated disk in
the result without having to rely on naming heuristics (just adding a number at
the end doesn't work for NVMes).
The disk's usage will not be based on the partitions' usage if the flag is
set, but will simply be 'partitions'.
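An illustrative sketch of the resulting structure (keys abridged, shape
assumed):

    my $disks = {
        'nvme0n1' => {
            usage => 'partitions',      # fixed value, not derived anymore
        },
        'nvme0n1p1' => {
            parent => '/dev/nvme0n1',   # explicit link instead of name heuristics
            usage  => 'ZFS',
        },
    };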
Fabian Ebner [Tue, 26 Jan 2021 11:45:25 +0000 (12:45 +0100)]
Diskmanage: introduce ceph info helper
so it can be re-used for partitions.
Also changes the regular expression in get_ceph_volume_info to match the full
device/partition name the LV is on. Not only is this needed for partitions,
especially if there are multiple partitions with an OSD, but it also fixes
handling of NVMe devices with an OSD as a side effect. Previously those were
not detected here, because of the digits in the name, e.g. /dev/nvme0n1
Fabian Ebner [Tue, 26 Jan 2021 11:45:23 +0000 (12:45 +0100)]
Diskmanage: introduce usage helper
Note that this is a slight behavior change, because now the first
partition's usage which is not simply 'partition' will become the disk's
usage. Previously, if any partition was 'mounted', it would become the disk's
usage, then 'LVM', 'ZFS', etc.
A partition's usage defaults to 'partition' if nothing more specific can be
found, and is never treated as unused for now.
Fabian Ebner [Tue, 26 Jan 2021 11:45:19 +0000 (12:45 +0100)]
Diskmanage: refactor and rename get_parttype_info
in preparation to also query the file system type from lsblk. Note that the
result now also includes devices without a parttype, so a definedness check in
get_devices_by_partuuid is needed. This will be useful when the whole device
contains a filesystem.
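A sketch of the underlying query, assuming lsblk's JSON output mode; the
exact column set is illustrative:

    use JSON qw(decode_json);

    my $data = decode_json(`lsblk --json -o path,parttype,fstype`);

    for my $entry (@{$data->{blockdevices}}) {
        # devices without a parttype are included now (e.g. a file system
        # on the whole device), hence the definedness check downstream
        next if !defined($entry->{parttype}) && !defined($entry->{fstype});
        # ...
    }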
Fabian Ebner [Tue, 26 Jan 2021 11:45:17 +0000 (12:45 +0100)]
Disks: return correct journal disk candidates
Previously any GPT initialized disk without an osdid (i.e. equal to -1) would
be included in the list of journal disk candidates, for example a ZFS disk. But
the OSD creation API call will fail for those. To fix it, re-use the condition
from the corresponding check in that API call (in PVE/API2/Ceph/OSD.pm).
Now, included disks are unused disks, those with usage 'partitions' and GPT, and
those with usage 'LVM'.
Fabian Ebner [Wed, 27 Jan 2021 12:57:36 +0000 (13:57 +0100)]
mark PBS storages as shared
Like this, the property will get added when parsing the storage configuration
and PBS storages will correctly show up as shared storages in API results.
AFAICT the only affected PBS operation is free_image via vdisk_free, which will
now be protected by a cluster-wide lock, and that shouldn't hurt.
Another issue this fixes, which is the reason this patch exists, was reported
in the forum[0]. The free space from PBS storages was counted once for each node
that had access to the storage.
Alwin Antreich [Wed, 16 Dec 2020 11:59:04 +0000 (12:59 +0100)]
fix: check connection for nfs v4 only server
the check_connection is done by querying the exports of the NFS server
in question. With NFS v4, those exports aren't listed anymore, since NFS
v4 employs a pseudo-filesystem starting from root (/).
rpcinfo allows querying for the existence of an NFS v4 service.
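A sketch of such a check; rpcinfo can ask for the 'nfs' program at
version 4 directly:

    # exit status 0 means an NFS v4 service answered over the given transport
    my $nfs4_ok = system('/usr/sbin/rpcinfo', '-T', 'tcp', $server, 'nfs', '4') == 0;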
Dominik Csapak [Thu, 21 Jan 2021 15:47:59 +0000 (16:47 +0100)]
add workaround for zfs rollback bug
as described in the zfs bug https://github.com/openzfs/zfs/issues/10931,
the kernel keeps cached data from mmaps around after a rollback, leading
to invalid data in files that were allegedly rolled back.
To work around this (until a real fix comes along), we unmount the subvol,
which invalidates the kernel cache. The dataset gets mounted again on the
next 'activate_volume'.
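Sketched, with error handling trimmed and names illustrative:

    # unmounting drops the kernel's stale mmap cache; a 'not mounted'
    # failure here is harmless and can be ignored
    system('zfs', 'unmount', "$pool/$volname");

    system('zfs', 'rollback', "$pool/$volname\@$snap") == 0
        or die "zfs rollback failed\n";

    # the dataset is mounted again by the next activate_volume call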
Fabian Ebner [Tue, 15 Dec 2020 10:59:29 +0000 (11:59 +0100)]
fix #3199: by fixing usage of strftime
In a very early version I wanted to parse the date from the backup
name, and when switching to using the ctime and localtime() instead,
I forgot to update the usage of strftime.
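The crux, sketched: POSIX::strftime expects the broken-down time list
from localtime(), not a raw epoch (the format string is illustrative):

    use POSIX qw(strftime);

    # wrong: strftime('%F %H:%M', $ctime) - a raw epoch is not a time list

    # right: expand the epoch via localtime() first
    my $when = strftime('%F %H:%M', localtime($ctime));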
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Fabian Ebner [Mon, 14 Dec 2020 15:03:17 +0000 (15:03 +0000)]
prune mark: correctly keep track of already included backups
This needs to happen in a separate loop, because some time intervals are not
subsets of others, i.e. weeks and months. Previously, with a daily backup
schedule, having:
* a backup on Sun, 06 Dec 2020 kept by keep-daily
* a backup on Sun, 29 Nov 2020 kept by keep-weekly
would lead to the backup on Mon, 30 Nov 2020 being selected for keep-monthly,
because the iteration had not yet reached the backup on Sun, 29 Nov 2020 that
would mark November as being covered.
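A sketch of the two-pass idea with illustrative field names (the
keep-monthly limit itself is omitted):

    use POSIX qw(strftime);

    my $month_of = sub { strftime('%Y/%m', localtime($_[0]->{ctime})) };

    # pass 1: months already covered by backups kept via other options
    my %covered = map { $month_of->($_) => 1 } grep { $_->{keep} } @$backups;

    # pass 2: newest first, keep the latest backup of each uncovered month
    for my $backup (sort { $b->{ctime} <=> $a->{ctime} } @$backups) {
        next if $backup->{keep} || $covered{ $month_of->($backup) };
        $backup->{keep} = 1;
        $covered{ $month_of->($backup) } = 1;
    }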
reuse the one from DirPlugin by directing the call to it, but with
the actual $class. This should stay stable, as we provide an ABI and
try to always use $class->helpers.
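Sketched (the method name is illustrative):

    sub get_volume_notes {
        my ($class, @rest) = @_;
        # delegate to the directory-based implementation, but with the
        # actual $class, so overridden helpers are still honored
        return PVE::Storage::DirPlugin::get_volume_notes($class, @rest);
    }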
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Previous to this, we did not call the plugin's update_volume_notes at
all in the case where a user deleted the textarea, which results in
passing a falsy value ('').
Also adapt the currently sole implementation to delete the notes field
in the undef or '' value case. This can be done safely, as we default
to returning an empty string if no notes file exists.
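A sketch of the adapted implementation; the notes path and the
PVE::Tools::file_set_contents usage are illustrative:

    sub update_volume_notes {
        my ($class, $scfg, $storeid, $volname, $notes) = @_;

        my $path = $class->filesystem_path($scfg, $volname) . '.notes';

        if (defined($notes) && $notes ne '') {
            PVE::Tools::file_set_contents($path, $notes);
        } else {
            # deleting is safe: a missing notes file reads back as ''
            unlink($path) or $!{ENOENT} or die "could not delete notes - $!\n";
        }
        return;
    }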
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
pbs: activate storage: fully validate if storage config works
improves UX of on_update and on_add hooks *a lot*.
This is a bit more expensive than the TCP ping, or even just an
unauthenticated ping, but not as bad as a full datastore status - as
this only reads the datastore config file (which is normally in page
cache anyway).
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Fabian Ebner [Fri, 27 Nov 2020 09:35:44 +0000 (10:35 +0100)]
plugin: hooks: add explicit returns
to avoid returning something unexpected. Finish what
afeda182566292be15413d9b874720876eac14c9 already started for all the
other plugins. At least for ZFS's on_add_hook this is necessary
(adding a ZFS storage currently fails, as reported here [0]), but it
cannot hurt in the other places either, as the only hooks we currently
expect to return something are PBS's on_add_hook and on_update_hook.
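The pattern, sketched:

    sub on_add_hook {
        my ($class, $storeid, $scfg, %param) = @_;

        # ... hook body ...

        return; # explicit, so the last evaluated expression is not leaked
    }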
Dominik Csapak [Tue, 24 Nov 2020 09:09:32 +0000 (10:09 +0100)]
Storage/Plugin: add get/update_volume_comment and implement for dir
and add the appropriate API call to set and get the comment.
We need to bump APIVER for this, but can bump APIAGE, since we only
use it at this new call, which can work with the default
implementation.
Fabian Ebner [Mon, 23 Nov 2020 12:33:09 +0000 (13:33 +0100)]
convert maxfiles to prune_backups when reading the storage configuration
If there are already prune options configured, simply delete the maxfiles
setting. Having both set is invalid from vzdump's perspective anyway, and any
backup job on such a storage failed, meaning a user would've noticed.
If there are no prune options, translate the maxfiles value to keep-last,
except for maxfiles being zero (=unlimited), in which case we use keep-all.
If both are not set, don't set anything, so:
1. Storages don't suddenly have retention options set.
2. People relying on vzdump defaults can still use those.
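A sketch of the conversion while parsing the configuration (key names as
described above):

    if (defined(my $maxfiles = delete $scfg->{maxfiles})) {
        if (!defined($scfg->{'prune-backups'})) {
            # maxfiles > 0 limited the count, maxfiles == 0 meant unlimited
            $scfg->{'prune-backups'} = $maxfiles > 0
                ? { 'keep-last' => $maxfiles }
                : { 'keep-all' => 1 };
        }
        # with prune options already present, dropping maxfiles is enough
    }
    # if neither was set, nothing is added and vzdump defaults still apply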
Fabian Ebner [Mon, 23 Nov 2020 12:33:08 +0000 (13:33 +0100)]
prune: introduce keep-all option
useful to have an alternative to the old maxfiles = 0. There has to
be a way for vzdump to distinguish between:
1. use the /etc/vzdump.conf default (when no options are configured for the storage)
2. use no limit (when keep-all=1)