Aaron Lauterer [Mon, 23 May 2022 10:54:25 +0000 (12:54 +0200)]
rbd: get_rbd_dev_path: return /dev/rbd path only if cluster matches
The changes in cfe46e2d4a97a83f1bbe6ad656e6416399309ba2 did not catch
all situations.
In the case of a guest having 2 disk images with the same name on a pool
with the same name but in two different ceph clusters we still had
issues when starting it. The first disk got mapped as expected. The
second disk did not get mapped because we returned the old $path to
"/dev/rbd/<pool>/<image>" because it already existed from the first
disk.
In the case that only the "old" /dev/rbd path exists and we do not have
the /dev/rbd-pve/<cluster>/... path available, we now check if the
cluster fsid used by that rbd device matches the one we expect. If it
does, then we are in the situation that the image has been mapped before
the new rbd-pve udev rule was introduced. If it does not, then we have
the situation of an ambiguous mapping in /dev/rbd and return the
$pve_path.
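The selection rule described above could be sketched roughly as follows. This is an illustrative Python rendering, not the Perl code from RBDPlugin.pm: `path_exists` and `fsid_of_device` are hypothetical injected helpers, and namespaces are left out for brevity.

```python
from typing import Callable


def get_rbd_dev_path(
    fsid: str,
    pool: str,
    image: str,
    path_exists: Callable[[str], bool],
    fsid_of_device: Callable[[str], str],
) -> str:
    """Pick the device path for an image, preferring the cluster-aware one."""
    pve_path = f"/dev/rbd-pve/{fsid}/{pool}/{image}"
    old_path = f"/dev/rbd/{pool}/{image}"

    # The unambiguous path exists: nothing more to decide.
    if path_exists(pve_path):
        return pve_path

    # Only the old path exists: trust it only if it belongs to our cluster,
    # i.e. the image was mapped before the rbd-pve udev rule was installed.
    if path_exists(old_path) and fsid_of_device(old_path) == fsid:
        return old_path

    # The old path is absent or belongs to another cluster (ambiguous mapping).
    return pve_path
```

With this shape, a second disk with the same pool and image name on another cluster never inherits the first disk's "/dev/rbd/..." path.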
Aaron Lauterer [Wed, 18 May 2022 09:04:54 +0000 (11:04 +0200)]
rbd: fix #4060 show data-pool usage when configured
When a data-pool is configured, use it for status infos. The 'data-pool'
config option is used to mark the erasure coded pool while the 'pool'
will be the replicated pool holding meta data such as the omap.
This means the 'pool' will only use a small amount of space, and people
are interested in how much they can store in the erasure coded pool anyway.
Therefore this patch reorders the assignment of the used pool name by
availability of the scfg parameters: data-pool -> pool -> fallback 'rbd'
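A minimal sketch of that lookup order, written in Python for illustration only (the plugin itself is Perl). The `scfg` dict stands in for the storage configuration hash and `status_pool_name` is a hypothetical name.

```python
def status_pool_name(scfg: dict) -> str:
    """Return the pool whose usage should be reported for a storage."""
    # Order follows the commit message: data-pool -> pool -> fallback 'rbd'.
    return scfg.get("data-pool") or scfg.get("pool") or "rbd"
```

For example, `status_pool_name({"pool": "meta", "data-pool": "ec"})` picks the erasure coded pool, while an empty configuration falls back to 'rbd'.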
Stoiko Ivanov [Tue, 3 May 2022 11:31:40 +0000 (13:31 +0200)]
rbd: warn if no stats for a pool could be gathered
happens in case of a mistyped poolname, and the new message should be
more helpful than:
`Use of uninitialized value $free in addition (+) at \
/usr/share/perl5/PVE/Storage/RBDPlugin.pm line 64`
Stoiko Ivanov [Tue, 3 May 2022 11:31:39 +0000 (13:31 +0200)]
rbd: add fallback default poolname 'rbd' to status
the fallback to a default pool name of 'rbd' was introduced in: 1440604a4b072b88cc1e4f8bbae4511b50d1d68e
and worked for the status command, because it used the `rados_cmd`
sub.
leading to confusing errors:
`Use of uninitialized value in string eq at \
/usr/share/perl5/PVE/Storage/RBDPlugin.pm line 633`
(e.g. in the journal from pvestatd)
Thomas Lamprecht [Thu, 28 Apr 2022 16:17:56 +0000 (18:17 +0200)]
rbd: get path: allow fake override of fsid in scfg for some regression tests
to avoid calls into RADOS connect, that trigger RPCEnv not
initialized breakage in regression tests, but wouldn't really work
otherwise either
in the future the RBD $scfg could actually support this (or similarly
named) property, to save on storage addition and then avoid frequent
mon commands
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
When krbd is used, subsequent removal after an operation
involving a rename could fail with
> librbd::image::PreRemoveRequest: 0x559b7506a470 \
> check_image_watchers: image has watchers - not removing
because the old mapping was still present.
For both operations with a rename, the owning guest should be offline,
but even if it weren't, unmap simply fails when the volume is in-use.
rbd: fix #3969: add rbd dev paths with cluster info
By adding our own customized rbd udev rules and ceph-rbdnamer we can
create device paths that include the cluster fsid and avoid any
ambiguity if the same pool and namespace combination is used in
different clusters we connect to.
In addition to the '/dev/rbd/<pool>/...' paths we now have
'/dev/rbd-pve/<cluster fsid>/<pool>/...' paths.
The other half of the patch makes use of the new device paths in the RBD
plugin.
The new 'get_rbd_dev_path' method returns the full device path. In case that the
image has been mapped before the rbd-pve udev rule has been installed,
it returns the old path.
The cluster fsid is read from the 'ceph.conf' file in the case of a
hyperconverged setup. In the case of an external Ceph cluster we need to
fetch it via a rados api call.
Co-authored-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Signed-off-by: Aaron Lauterer <a.lauterer@proxmox.com>
Dominik Csapak [Wed, 9 Mar 2022 08:21:28 +0000 (09:21 +0100)]
storage plugins: en/decode volume notes as UTF-8
When writing into the file, explicitly utf8 encode it, and then try
to utf8 decode it on read.
If the notes are not valid utf8, we assume they were iso-8859 encoded
and return as is.
Technically this is a breaking change, since there are iso-8859
comments that would successfully decode as utf8, for example: the
byte sequence "C2 A9" would be "Â©" in iso, but would decode to "©".
From what i can tell though, this is rather unlikely to happen for
"real world" notes, because the first byte would be in the range of
C0-F7 (which are mostly language dependent characters like "Â") and
the following bytes would have to be in the range of 80-BF, which are
only special characters like "£" (or undefined)
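The read-side behaviour can be illustrated with a short Python sketch. It is not the Perl implementation; `decode_notes` is a hypothetical name, and decoding as ISO-8859-1 is used here as a stand-in for "return as is", since every byte maps to exactly one character in that encoding.

```python
def decode_notes(raw: bytes) -> str:
    """Decode stored notes as UTF-8, falling back for legacy content."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: treat invalid UTF-8 as ISO-8859-1, which never fails.
        return raw.decode("iso-8859-1")


# The ambiguous case from the text: valid UTF-8 that is also valid ISO-8859-1.
ambiguous = bytes([0xC2, 0xA9])
as_utf8 = decode_notes(ambiguous)          # one character
as_iso = ambiguous.decode("iso-8859-1")    # two characters
```

Here `as_utf8` has length 1 and `as_iso` has length 2, which is exactly the difference that makes the change technically breaking.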
Dominik Csapak [Thu, 23 Dec 2021 12:06:22 +0000 (13:06 +0100)]
fix #3803: ZFSPoolPlugin: zfs_request: increase minimum timeout in worker
Since most zfs operations can take a while (under certain conditions),
increase the minimum timeout for zfs_request in workers to 5 minutes.
We cannot increase the timeouts in synchronous api calls, since they are
hard limited to 30 seconds, but in worker we do not have such limits.
The existing default timeout does not change (60 minutes in worker,
5 seconds otherwise), but all zfs_requests with a set timeout (<5 minutes)
will use the increased 5 minutes in a worker.
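The resulting timeout rule might be sketched like this. It is a Python illustration of the behaviour described above, not the code in ZFSPoolPlugin; `effective_timeout` and the constant names are hypothetical.

```python
from typing import Optional

MIN_WORKER_TIMEOUT = 5 * 60       # new minimum inside a worker, in seconds
DEFAULT_WORKER_TIMEOUT = 60 * 60  # unchanged default inside a worker
DEFAULT_SYNC_TIMEOUT = 5          # unchanged default for synchronous calls


def effective_timeout(timeout: Optional[int], in_worker: bool) -> int:
    """Return the timeout in seconds that a request would end up using."""
    if timeout is None:
        # No explicit timeout: the defaults stay as they were.
        return DEFAULT_WORKER_TIMEOUT if in_worker else DEFAULT_SYNC_TIMEOUT
    if in_worker:
        # Explicit timeouts are raised to at least 5 minutes in a worker.
        return max(timeout, MIN_WORKER_TIMEOUT)
    # Synchronous API calls keep the requested value.
    return timeout
```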
Fabian Ebner [Tue, 29 Mar 2022 12:53:13 +0000 (14:53 +0200)]
plugins: allow limiting the number of protected backups per guest
The ability to mark backups as protected broke the implicit assumption
in vzdump that remove=1 and current number of backups being the limit
(i.e. sum of all keep options) will result in a backup being removed.
Introduce a new storage property 'max-protected-backups' to limit the
number of protected backups per guest. Use 5 as a default value, as it
should cover most use cases, while still not having too big of a
potential overhead in many scenarios.
For external plugins that do not return the backup subtype in
list_volumes, all protected backups with the same ID will count
towards the limit.
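As a rough illustration of the counting rule, consider the following Python sketch. The dict keys (`vmid`, `subtype`, `protected`) and both function names are assumptions made for this example, not the actual list_volumes format or plugin API.

```python
from typing import Dict, List


def count_protected(backups: List[Dict], vmid: int, subtype: str) -> int:
    """Count protected backups of one guest that fall under the limit."""
    count = 0
    for backup in backups:
        if not backup.get("protected") or backup.get("vmid") != vmid:
            continue
        # Entries without a subtype (e.g. from external plugins) count for
        # every subtype, so all protected backups with the same ID add up.
        if backup.get("subtype") in (None, subtype):
            count += 1
    return count


def may_protect_another(
    backups: List[Dict], vmid: int, subtype: str, limit: int = 5
) -> bool:
    """True if one more backup can be protected without exceeding the limit."""
    return count_protected(backups, vmid, subtype) < limit
```

The default of 5 mirrors the default value of 'max-protected-backups' mentioned above.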
An alternative would be to count the protected backups when pruning.
While that would avoid the need for a new property, it would break the
current semantics of protected backups being ignored for pruning. It
also would be less flexible, e.g. for PBS, it can make sense to have
both keep-all=1 and a limit for the number of protected snapshots on
the PVE side.
Fabian Ebner [Wed, 30 Mar 2022 10:24:28 +0000 (12:24 +0200)]
pvesm: extract config: check for VM.Backup privilege
In preparation for having check_volume_access() always allow access for
users with the Datastore.Allocate privilege, so that such users are not
automatically given permission to extract the config too.
Fabian Ebner [Mon, 15 Nov 2021 12:37:56 +0000 (13:37 +0100)]
cifs: check connection: bubble up NT_STATUS_LOGON_FAILURE
in the same manner as NT_STATUS_ACCESS_DENIED. It can be assumed to be
a configuration error, so avoid showing the generic "storage <storeid>
is not online". Reported in the community forum:
https://forum.proxmox.com/threads/storage-is-not-online-cifs.99201/post-428858
Mira Limbeck [Fri, 18 Feb 2022 08:58:27 +0000 (09:58 +0100)]
file_size_info: cast 'size' and 'used' to integer
`qemu-img info --output=json` returns the size and used values as integers in
the JSON format, but the regex match converts them to strings.
As we know they only contain digits, we can simply cast them back to integers
after the regex.
The API requires them to be integers.
Signed-off-by: Mira Limbeck <m.limbeck@proxmox.com>
Reviewed-by: Fabian Ebner <f.ebner@proxmox.com>
Aaron Lauterer [Fri, 28 Jan 2022 11:22:41 +0000 (12:22 +0100)]
fix #1816: rbd: add support for erasure coded ec pools
The first step is to allocate rbd images correctly.
The metadata objects still need to be stored in a replicated pool, but
by providing the --data-pool parameter on image creation, we can place
the data objects on the erasure coded (EC) pool.
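A hedged sketch of how the image creation arguments could be assembled is shown below. It only builds an argument list and runs nothing; `rbd_create_args` is a hypothetical helper, the `scfg` dict stands in for the storage configuration, and the real plugin passes further options that are omitted here.

```python
from typing import Dict, List


def rbd_create_args(scfg: Dict[str, str], name: str, size_mib: int) -> List[str]:
    """Build an 'rbd create' argument list, adding --data-pool if configured."""
    # Metadata always goes to the replicated pool (fallback 'rbd').
    args = ["rbd", "create", "--pool", scfg.get("pool", "rbd"),
            "--size", str(size_mib)]
    # Data objects are placed on the erasure coded pool when one is set.
    if scfg.get("data-pool"):
        args += ["--data-pool", scfg["data-pool"]]
    args.append(name)
    return args
```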
to allow reusing this with remote migration, where parsing of the source
volid has to happen on the source node, but this call has to happen on
the target node.
Fabian Ebner [Mon, 10 Jan 2022 11:50:44 +0000 (12:50 +0100)]
zfs: use -r parameter when listing snapshots
Some versions of ZFS do not automatically display the child snapshots
when '-t snapshot' is used, but require '-r' to be present
additionally[1]. And in general, it's cleaner to specify the flag
explicitly.
Because of that, commit ac5c1af led to a regression[0] in the context
of ZFS over iSCSI with zfs_get_sorted_snapshot_list. Fix it, by adding
a -r flag again.
The volume_snapshot_info function is currently only used in the
context of replication and that requires a local ZFS pool, but it
would be affected by the same issue if it is ever used in the context
of ZFS over iSCSI, so also add -r there.
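For illustration, a snapshot listing with an explicit '-r' might look like the sketch below. Both helpers are hypothetical, and the example only builds the command and parses sample output rather than calling zfs.

```python
from typing import List


def zfs_list_snapshots_cmd(dataset: str) -> List[str]:
    """Command to list snapshots below a dataset, with -r given explicitly."""
    return ["zfs", "list", "-H", "-r", "-t", "snapshot", "-o", "name", dataset]


def parse_snapshot_names(output: str, dataset: str) -> List[str]:
    """Extract snapshot names of exactly this dataset from 'zfs list' output."""
    prefix = dataset + "@"
    # -r also lists snapshots of child datasets, so filter on the prefix.
    return [
        line[len(prefix):]
        for line in output.splitlines()
        if line.startswith(prefix)
    ]
```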
Fabian Ebner [Fri, 5 Nov 2021 10:29:45 +0000 (11:29 +0100)]
lvm thin: don't assume that a thin pool and its volumes are active
There are cases where autoactivation can fail, as reported in the
community forum [0]. And it could also be that a volume was
deactivated by something outside of our control.
It doesn't seem strictly necessary to activate the thin pool itself
(creating/removing/activating LVs within the pool still works if it's
not active), but it does not report usage information as long as
neither the pool nor any of its LVs are active. Activate the pool for
that, for being able to use the flag in status(), and it should also
serve as a good indicator that there's a problem with the pool if it
can't be activated.
Before activating, check the (cached) lv_state from lvm_list_volumes.
It's necessary to update the cache in activate_storage, because the
flag is re-used in status(). Also update it for other (de)activations
to be more future-proof.
Fabian Ebner [Mon, 25 Oct 2021 13:47:49 +0000 (15:47 +0200)]
api: disks: delete: add flag for cleaning up storage config
Update node restrictions to reflect that the storage is not available
anymore on the particular node. If the storage was only configured for
that node, remove it altogether.
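The cleanup rule could be sketched as follows. This is a simplified Python illustration: the config layout and the function name are assumptions, and leaving storages without a node restriction untouched is a choice made for this example, not something the commit message states.

```python
from typing import Dict


def cleanup_storage_for_node(storage_cfg: Dict[str, dict], storeid: str, node: str) -> None:
    """Drop a node from a storage's restriction, or the storage itself."""
    entry = storage_cfg.get(storeid)
    if entry is None or "nodes" not in entry:
        # Assumption: unknown or unrestricted storages are left as they are.
        return
    remaining = [n for n in entry["nodes"] if n != node]
    if remaining:
        # Still configured for other nodes: only update the restriction.
        entry["nodes"] = remaining
    else:
        # The storage was only configured for this node: remove it entirely.
        del storage_cfg[storeid]
```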
Fabian Ebner [Mon, 25 Oct 2021 13:47:47 +0000 (15:47 +0200)]
diskmanage: add helper for udev workaround
to avoid duplication. Current callers pass along at least one device,
but anticipate future callers that might call with the empty list. Do
nothing in that case, rather than triggering everything.
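The empty-list guard might look like this in a Python sketch. `udevadm_trigger_cmd` is a hypothetical helper that only builds the command; the point is that calling 'udevadm trigger' without device arguments would trigger everything.

```python
from typing import List, Optional


def udevadm_trigger_cmd(devices: List[str]) -> Optional[List[str]]:
    """Build a 'udevadm trigger' call for the given devices, if any."""
    if not devices:
        # Without device arguments udevadm would trigger all devices,
        # so an empty list means there is nothing to run.
        return None
    return ["udevadm", "trigger", *devices]
```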
Aaron Lauterer [Tue, 9 Nov 2021 14:55:32 +0000 (15:55 +0100)]
add disk rename feature
Functionality has been added for the following storage types:
* directory ones, based on the default implementation:
* directory
* NFS
* CIFS
* gluster
* ZFS
* (thin) LVM
* Ceph
A new feature `rename` has been introduced to mark which storage
plugin supports the feature.
Version API and AGE have been bumped.
Signed-off-by: Aaron Lauterer <a.lauterer@proxmox.com>
the intention of this feature is to support the following use-cases:
- reassign a volume from one owning guest to another (which usually
entails a rename, since the owning vmid is encoded in the volume name)
- rename a volume (e.g., to use a more meaningful name instead of the
auto-assigned ...-disk-123)
only the former is implemented at the caller side in
qemu-server/pve-container for now, but since the lower-level feature is
basically the same for both, we can take advantage of the storage plugin
API bump now to get the building block for this future feature in place
already.
adapt ApiChangelog change to fix conflicts and added more detail above
prune: mark renamed and protected backups differently
While it makes no difference for pruning itself, protected backups are
additionally protected against removal. Avoid the potential to confuse
the two. Also update the description for the API return value and add
an enum constraint.
fix #3307: make it possible to set protection for backups
A protected backup is not removed by free_image and ignored when
pruning.
The protection_file_path function is introduced in Storage.pm, so that
it can also be used by vzdump itself and in archive_remove.
For pruning, renamed backups already behaved similar to how protected
backups will, but there are a few reasons to not just use that for
implementing the new feature:
1. It wouldn't protect against removal.
2. It would make it necessary to rename notes and log files too.
3. It wouldn't naturally extend to other volumes if that's needed.
add generalized functions to manage volume attributes
replacing the ones for handling notes. To ensure backwards
compatibility with external plugins, all plugins that do not just call
another implementation need to call $class->{get, update}_volume_notes
when the attribute is 'notes' to catch any derived implementations.
This is mainly done to avoid the need to add new methods every time a
new attribute is added.
Not adding a timeout parameter like the notes functions have, because
it was not used and can still be added if it ever is needed in the
future.
For get_volume_attribute, undef will indicate that the attribute is
not supported. This makes it possible to distinguish "not supported"
from "error getting the attribute", which is useful when the attribute
is important for an operation. For example, free_image checking for
protection (introduced in a later patch) can abort if getting the
'protected' attribute fails.
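The distinction between "not supported" and "error" can be illustrated with a small Python sketch, where None plays the role of Perl's undef. `DemoPlugin` and this `free_image` are hypothetical stand-ins, not the real plugin classes or signatures.

```python
from typing import Dict, Optional


class DemoPlugin:
    """Hypothetical stand-in for a storage plugin; not the real PVE class."""

    SUPPORTED = {"notes", "protected"}

    def __init__(self, volumes: Dict[str, Dict[str, object]]):
        self.volumes = volumes

    def get_volume_attribute(self, volname: str, attribute: str) -> Optional[object]:
        if attribute not in self.SUPPORTED:
            return None  # "not supported", distinct from an error
        if volname not in self.volumes:
            raise LookupError(f"cannot read '{attribute}' of {volname}")
        default = 0 if attribute == "protected" else ""
        return self.volumes[volname].get(attribute, default)


def free_image(plugin: DemoPlugin, volname: str) -> bool:
    """Remove a volume unless it is protected or its state is unknown."""
    # An exception here propagates, so removal is aborted on lookup errors.
    protected = plugin.get_volume_attribute(volname, "protected")
    if protected:
        raise PermissionError(f"{volname} is protected")
    plugin.volumes.pop(volname, None)
    return True
```

A plugin that returns None for 'protected' simply does not support protection, so removal proceeds, whereas a failed lookup stops it.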
Suggested-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
dir plugin: get notes: return undef if notes are not supported
This avoids showing empty notes in the result of the content/{volid}
API call for volumes that do not even support notes. It's also in
preparation for the proposed get_volume_attribute generalization,
which expects undef to be returned when an attribute is not supported.
Fabian Ebner [Thu, 12 Aug 2021 11:01:01 +0000 (13:01 +0200)]
zfspool: add blockers parameter to volume_snapshot_is_possible
useful for rollback, so that only the required replication snapshots
can be removed, and it's possible to abort early without deleting any
replication snapshots if there are other non-replication snapshots
blocking rollback.
Fabian Ebner [Wed, 6 Oct 2021 09:18:45 +0000 (11:18 +0200)]
partially fix #2285: api: disks: allow partitions for creation paths
The calls for directory and ZFS need slight adaptations. Except for
those, the only thing that needs to be done is support partitions in
the disk_is_used helper.
Fabian Ebner [Wed, 6 Oct 2021 09:18:44 +0000 (11:18 +0200)]
api: disks: initgpt: explicitly abort for partitions
In preparation to extend disk_is_used to support partitions. Without
this new check, initgpt would also allow partitions once disk_is_used
supports partitions, which is not desirable.
Fabian Ebner [Wed, 6 Oct 2021 09:18:43 +0000 (11:18 +0200)]
diskmanage: don't set usage for unused partitions
The disk type is already 'partition' so there's no additional
information here. And it would need to serve as a code-word for
unused partitions. The cleaner approach is to not set the usage.
Fabian Ebner [Wed, 6 Oct 2021 09:18:42 +0000 (11:18 +0200)]
diskmanage: wipe blockdev: also change partition type
when called with a partition. Since get_disks uses the partition type
(among other things) to detect LVM and ZFS volumes, such volumes would
still be seen as in-use after wiping. Thus, also change the partition
type and simply use 0x83 "Linux filesystem".
Fabian Ebner [Wed, 6 Oct 2021 09:18:41 +0000 (11:18 +0200)]
diskmanage: add change_parttype and is_partition helpers
For change_parttype, only GPT-partitioned disks are supported, as I
didn't see an option for sgdisk to make it also work with
MBR-partitioned disks. And while sfdisk could be used instead (or
additionally) it would be a new dependency, and AFAICS require some
conversion of partition type GUIDs to MBR types on our part.
commit e4d56f096ed28761d6b9a9e348be0fc682928040 removes a `sleep 1`
hack for removal of a tempfile which earlier happened in the
pve-http-server but we do ourselves now.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
this racy sleep(1) is only there for legacy reasons: because
we don't use apache anymore and only emulate its behaviour
regarding removing temp files, this is under our own control
now and so we can improve this whole situation.
this change requires a pve-http-server version in which the
tmpfile is no longer removed automatically.
Requires that the $include_partitions parameter is set too, which:
1. Makes sense, because the partition won't be included in the result
otherwise.
2. Ensures backwards compatibility for existing callers that don't
use $include_partitions. No existing callers use both $disks and
$include_partitions at the same time, so nothing learns to
"support" partitions by accident.
Moving the strip_dev helper to the top, so it can be used everywhere.
api: disk: work around udev bug to ensure its database is updated
There is a udev bug [0] which can ultimately lead to the udev database
for certain devices not being actively updated. Determining whether a
disk is used or not in get_disks() (in part) relies upon lsblk, which
queries the udev database. Ensure the information is updated by
manually calling 'udevadm trigger' for the changed devices.
It's most important for the 'directory' API path, as mounting depends
on the '/dev/disk/by-uuid'-symlink to be generated.
Because then it might not be unused anymore. If there really is a
race, this prevents e.g. sgdisk creating a partition on a device
already in use by LVM or LVM destroying a partitioned device.
For ZFS, also get the latest udev info once inside the worker.