When krbd is used, subsequent removal after an operation
involving a rename could fail with
> librbd::image::PreRemoveRequest: 0x559b7506a470 \
> check_image_watchers: image has watchers - not removing
because the old mapping was still present.
For both operations with a rename, the owning guest should be offline,
but even if it weren't, unmap simply fails when the volume is in-use.
rbd: fix #3969: add rbd dev paths with cluster info
By adding our own customized rbd udev rules and ceph-rbdnamer we can
create device paths that include the cluster fsid and avoid any
ambiguity if the same pool and namespace combination is used in
different clusters we connect to.
Additionally to the '/dev/rbd/<pool>/...' paths we now have
'/dev/rbd-pve/<cluster fsid>/<pool>/...' paths.
The other half of the patch makes use of the new device paths in the RBD
plugin.
The new 'get_rbd_dev_path' method returns the full device path. In case
the image was mapped before the rbd-pve udev rule was installed, it
returns the old path.
The cluster fsid is read from the 'ceph.conf' file in the case of a
hyperconverged setup. In the case of an external Ceph cluster we need to
fetch it via a rados api call.
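A rough sketch of the path selection idea (simplified; names and the exact fallback handling are assumptions, not the actual plugin code):
```
# prefer the new fsid-qualified path, fall back to the old one for images
# that were mapped before the rbd-pve udev rule was installed
sub get_rbd_dev_path {
    my ($cluster_fsid, $pool, $namespace, $image) = @_;
    my $name = $namespace ? "$namespace/$image" : $image;
    my $new_path = "/dev/rbd-pve/$cluster_fsid/$pool/$name";
    return -e $new_path ? $new_path : "/dev/rbd/$pool/$name";
}
```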
Co-authored-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Signed-off-by: Aaron Lauterer <a.lauterer@proxmox.com>
Dominik Csapak [Wed, 9 Mar 2022 08:21:28 +0000 (09:21 +0100)]
storage plugins: en/decode volume notes as UTF-8
When writing into the file, explicitly utf8 encode it, and then try
to utf8 decode it on read.
If the notes are not valid utf8, we assume they were iso-8859 encoded
and return as is.
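A minimal sketch of that decode-with-fallback logic, assuming the core Encode module (variable names are illustrative):
```
use Encode qw(decode encode);

# on read: try strict UTF-8 first, otherwise return the raw bytes as-is
my $notes = eval { decode('UTF-8', $raw, Encode::FB_CROAK) } // $raw;

# on write: explicitly encode as UTF-8 before writing to the file
my $data = encode('UTF-8', $notes);
```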
Technically this is a breaking change, since there are iso-8859
comments that would successfully decode as utf8, for example: the
byte sequence "C2 A9" would be "Â©" in iso-8859, but would decode to "©" in utf8.
From what I can tell though, this is rather unlikely to happen for
"real world" notes, because the first byte would be in the range of
C0-F7 (which are mostly language-dependent characters like "Â") and
the following bytes would have to be in the range of 80-BF, which are
only special characters like "£" (or undefined).
Dominik Csapak [Thu, 23 Dec 2021 12:06:22 +0000 (13:06 +0100)]
fix #3803: ZFSPoolPlugin: zfs_request: increase minimum timeout in worker
Since most zfs operations can take a while (under certain conditions),
increase the minimum timeout for zfs_request in workers to 5 minutes.
We cannot increase the timeouts in synchronous API calls, since they are
hard-limited to 30 seconds, but in workers we do not have such limits.
The existing default timeout does not change (60 minutes in a worker,
5 seconds otherwise), but all zfs_requests with a set timeout (< 5 minutes)
will use the increased 5 minutes in a worker.
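A sketch of the resulting timeout selection (assuming PVE::RPCEnvironment's is_worker(); the actual code may be structured differently):
```
# default: 60 minutes in a worker, 5 seconds otherwise; a caller-supplied
# timeout below 5 minutes is raised to 5 minutes when running in a worker
if (PVE::RPCEnvironment->is_worker()) {
    $timeout = 60*60 if !defined($timeout);
    $timeout = 60*5 if $timeout < 60*5;
} else {
    $timeout = 5 if !defined($timeout);
}
```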
Fabian Ebner [Tue, 29 Mar 2022 12:53:13 +0000 (14:53 +0200)]
plugins: allow limiting the number of protected backups per guest
The ability to mark backups as protected broke the implicit assumption
in vzdump that, with remove=1 and the current number of backups at the
limit (i.e. the sum of all keep options), a backup will be removed.
Introduce a new storage property 'max-protected-backups' to limit the
number of protected backups per guest. Use 5 as a default value, as it
should cover most use cases, while still not having too big of a
potential overhead in many scenarios.
For external plugins that do not return the backup subtype in
list_volumes, all protected backups with the same ID will count
towards the limit.
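A rough sketch of how the limit could be enforced when marking a backup as protected (property name from above; the surrounding data structures are assumptions):
```
# count the already protected backups of this guest and compare against
# the 'max-protected-backups' storage property (default 5)
my $limit = $scfg->{'max-protected-backups'} // 5;
my $protected_count = grep { $_->{protected} } @$guest_backups;
die "the number of protected backups per guest is limited to $limit\n"
    if $protected_count >= $limit;
```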
An alternative would be to count the protected backups when pruning.
While that would avoid the need for a new property, it would break the
current semantics of protected backups being ignored for pruning. It
also would be less flexible, e.g. for PBS, it can make sense to have
both keep-all=1 and a limit for the number of protected snapshots on
the PVE side.
Fabian Ebner [Wed, 30 Mar 2022 10:24:28 +0000 (12:24 +0200)]
pvesm: extract config: check for VM.Backup privilege
In preparation for having check_volume_access() always allow access for
users with the Datastore.Allocate privilege, so as not to automatically
give all such users permission to extract the config too.
Fabian Ebner [Mon, 15 Nov 2021 12:37:56 +0000 (13:37 +0100)]
cifs: check connection: bubble up NT_STATUS_LOGON_FAILURE
in the same manner as NT_STATUS_ACCESS_DENIED. It can be assumed to be
a configuration error, so avoid showing the generic "storage <storeid>
is not online". Reported in the community forum:
https://forum.proxmox.com/threads/storage-is-not-online-cifs.99201/post-428858
Mira Limbeck [Fri, 18 Feb 2022 08:58:27 +0000 (09:58 +0100)]
file_size_info: cast 'size' and 'used' to integer
`qemu-img info --output=json` returns the size and used values as integers in
the JSON format, but the regex match converts them to strings.
As we know they only contain digits, we can simply cast them back to integers
after the regex.
The API requires them to be integers.
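A minimal sketch of the cast, assuming the values were captured with a regex from the `qemu-img info --output=json` output:
```
my ($size, $used) = $json =~ m/"virtual-size":\s*(\d+).*?"actual-size":\s*(\d+)/s;
# the regex captures are strings; turn them back into integers so the API
# schema sees proper JSON numbers
$size = int($size);
$used = int($used);
```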
Signed-off-by: Mira Limbeck <m.limbeck@proxmox.com>
Reviewed-by: Fabian Ebner <f.ebner@proxmox.com>
Aaron Lauterer [Fri, 28 Jan 2022 11:22:41 +0000 (12:22 +0100)]
fix #1816: rbd: add support for erasure coded ec pools
The first step is to allocate rbd images correctly.
The metadata objects still need to be stored in a replicated pool, but
by providing the --data-pool parameter on image creation, we can place
the data objects on the erasure coded (EC) pool.
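Sketched as an rbd invocation from the plugin (the 'data-pool' option name on the storage side and the command construction are assumptions):
```
# allocate the image in the replicated (metadata) pool, but place its data
# objects on the erasure coded pool via --data-pool
my $cmd = ['rbd', 'create', '--size', "${size_mb}M", '--pool', $scfg->{pool}, $name];
push @$cmd, '--data-pool', $scfg->{'data-pool'} if $scfg->{'data-pool'};
run_command($cmd);
```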
to allow reusing this with remote migration, where parsing of the source
volid has to happen on the source node, but this call has to happen on
the target node.
Fabian Ebner [Mon, 10 Jan 2022 11:50:44 +0000 (12:50 +0100)]
zfs: use -r parameter when listing snapshots
Some versions of ZFS do not automatically display the child snapshots
when '-t snapshot' is used, but require '-r' to be present
additionally[1]. And in general, it's cleaner to specify the flag
explicitly.
Because of that, commit ac5c1af led to a regression[0] in the context
of ZFS over iSCSI with zfs_get_sorted_snapshot_list. Fix it by adding
the -r flag again.
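For illustration, the listing call with the explicit flag could look like this (exact options are an assumption):
```
# -r makes sure child snapshots are listed on all ZFS versions
my @cmd = ('zfs', 'list', '-H', '-p', '-t', 'snapshot', '-r',
    '-o', 'name,creation', '-S', 'creation', $vname);
```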
The volume_snapshot_info function is currently only used in the
context of replication and that requires a local ZFS pool, but it
would be affected by the same issue if it is ever used in the context
of ZFS over iSCSI, so also add -r there.
Fabian Ebner [Fri, 5 Nov 2021 10:29:45 +0000 (11:29 +0100)]
lvm thin: don't assume that a thin pool and its volumes are active
There are cases where autoactivation can fail, as reported in the
community forum [0]. And it could also be that a volume was
deactivated by something outside of our control.
It doesn't seem strictly necessary to activate the thin pool itself
(creating/removing/activating LVs within the pool still works if it's
not active), but usage information is not reported as long as neither
the pool nor any of its LVs are active. Activate the pool for that and
to be able to use the flag in status(); it should also serve as a good
indicator that there's a problem with the pool if it can't be
activated.
Before activating, check the (cached) lv_state from lvm_list_volumes.
It's necessary to update the cache in activate_storage, because the
flag is re-used in status(). Also update it for other (de)activations
to be more future-proof.
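A condensed sketch of the activation logic (the hash layout of lvm_list_volumes and the exact state value are assumptions):
```
# only activate the thin pool if the cached state says it is not active yet
my $lvs = lvm_list_volumes($scfg->{vgname});
my $pool_state = $lvs->{$scfg->{vgname}}->{$scfg->{thinpool}}->{lv_state} // '';
if ($pool_state ne 'a') {
    run_command(['lvchange', '-ay', "$scfg->{vgname}/$scfg->{thinpool}"]);
}
```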
Fabian Ebner [Mon, 25 Oct 2021 13:47:49 +0000 (15:47 +0200)]
api: disks: delete: add flag for cleaning up storage config
Update node restrictions to reflect that the storage is not available
anymore on the particular node. If the storage was only configured for
that node, remove it altogether.
Fabian Ebner [Mon, 25 Oct 2021 13:47:47 +0000 (15:47 +0200)]
diskmanage: add helper for udev workaround
to avoid duplication. Current callers pass along at least one device,
but anticipate future callers that might call with the empty list. Do
nothing in that case, rather than triggering everything.
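A sketch of such a helper (name and error handling are assumptions):
```
sub udevadm_trigger {
    my (@devs) = @_;
    # future callers might pass an empty list; do nothing instead of
    # triggering all devices
    return if !scalar(@devs);
    eval { run_command(['udevadm', 'trigger', @devs]); };
    warn "udevadm trigger error: $@" if $@;
}
```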
Aaron Lauterer [Tue, 9 Nov 2021 14:55:32 +0000 (15:55 +0100)]
add disk rename feature
Functionality has been added for the following storage types:
* directory ones, based on the default implementation:
    * directory
    * NFS
    * CIFS
    * gluster
* ZFS
* (thin) LVM
* Ceph
A new feature `rename` has been introduced to mark which storage
plugin supports the feature.
Version API and AGE have been bumped.
Signed-off-by: Aaron Lauterer <a.lauterer@proxmox.com>
the intention of this feature is to support the following use-cases:
- reassign a volume from one owning guest to another (which usually
entails a rename, since the owning vmid is encoded in the volume name)
- rename a volume (e.g., to use a more meaningful name instead of the
auto-assigned ...-disk-123)
only the former is implemented at the caller side in
qemu-server/pve-container for now, but since the lower-level feature is
basically the same for both, we can take advantage of the storage plugin
API bump now to get the building block for this future feature in place
already.
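From the caller side, usage could roughly look like this (function names and signatures as sketched here are assumptions):
```
# check whether the storage backing the volume supports renaming, then
# move the volume over to the new owner
if (PVE::Storage::volume_has_feature($cfg, 'rename', $source_volid)) {
    my $new_volid = PVE::Storage::rename_volume(
        $cfg, $source_volid, $target_vmid, $target_volname);
}
```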
adapted the ApiChangelog change to fix conflicts and added more detail above
prune: mark renamed and protected backups differently
While it makes no difference for pruning itself, protected backups are
additionally protected against removal. Avoid the potential to confuse
the two. Also update the description for the API return value and add
an enum constraint.
fix #3307: make it possible to set protection for backups
A protected backup is not removed by free_image and ignored when
pruning.
The protection_file_path function is introduced in Storage.pm, so that
it can also be used by vzdump itself and in archive_remove.
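A minimal sketch of such a helper (the marker file suffix used here is an assumption):
```
# path of the marker file indicating that a backup archive is protected
sub protection_file_path {
    my ($path) = @_;
    return "${path}.protected";
}
```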
For pruning, renamed backups already behaved similarly to how protected
backups will, but there are a few reasons to not just use that for
implementing the new feature:
1. It wouldn't protect against removal.
2. It would make it necessary to rename notes and log files too.
3. It wouldn't naturally extend to other volumes if that's needed.
add generalized functions to manage volume attributes
replacing the ones for handling notes. To ensure backwards
compatibility with external plugins, all plugins that do not just call
another implementation need to call $class->{get, update}_volume_notes
when the attribute is 'notes' to catch any derived implementations.
This is mainly done to avoid the need to add new methods every time a
new attribute is added.
Not adding a timeout parameter like the notes functions have, because
it was not used and can still be added if it ever is needed in the
future.
For get_volume_attribute, undef will indicate that the attribute is
not supported. This makes it possible to distinguish "not supported"
from "error getting the attribute", which is useful when the attribute
is important for an operation. For example, free_image checking for
protection (introduced in a later patch) can abort if getting the
'protected' attribute fails.
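A sketch of the backwards-compatible default implementation described above (signature details are assumptions):
```
sub get_volume_attribute {
    my ($class, $scfg, $storeid, $volname, $attribute) = @_;

    if ($attribute eq 'notes') {
        # route through the old method so that external plugins overriding
        # get_volume_notes keep working
        return $class->get_volume_notes($scfg, $storeid, $volname);
    }

    return undef; # undef signals "attribute not supported"
}
```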
Suggested-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
dir plugin: get notes: return undef if notes are not supported
This avoids showing empty notes in the result of the content/{volid}
API call for volumes that do not even support notes. It's also in
preparation for the proposed get_volume_attribute generalization,
which expects undef to be returned when an attribute is not supported.
Fabian Ebner [Thu, 12 Aug 2021 11:01:01 +0000 (13:01 +0200)]
zfspool: add blockers parameter to volume_snapshot_is_possible
useful for rollback, so that only the required replication snapshots
can be removed, and it's possible to abort early without deleting any
replication snapshots if there are other non-replication snapshots
blocking rollback.
Fabian Ebner [Wed, 6 Oct 2021 09:18:45 +0000 (11:18 +0200)]
partially fix #2285: api: disks: allow partitions for creation paths
The calls for directory and ZFS need slight adaptations. Except for
those, the only thing that needs to be done is support partitions in
the disk_is_used helper.
Fabian Ebner [Wed, 6 Oct 2021 09:18:44 +0000 (11:18 +0200)]
api: disks: initgpt: explicitly abort for partitions
In preparation to extend disk_is_used to support partitions. Without
this new check, initgpt would also allow partitions once disk_is_used
supports partitions, which is not desirable.
Fabian Ebner [Wed, 6 Oct 2021 09:18:43 +0000 (11:18 +0200)]
diskmanage: don't set usage for unused partitions
The disk type is already 'partition' so there's no additional
information here. And it would need to serve as a code-word for
unused partitions. The cleaner approach is to not set the usage.
Fabian Ebner [Wed, 6 Oct 2021 09:18:42 +0000 (11:18 +0200)]
diskmanage: wipe blockdev: also change partition type
when called with a partition. Since get_disks uses the partition type
(among other things) to detect LVM and ZFS volumes, such volumes would
still be seen as in-use after wiping. Thus, also change the partition
type and simply use 0x83 "Linux filesystem".
Fabian Ebner [Wed, 6 Oct 2021 09:18:41 +0000 (11:18 +0200)]
diskmanage: add change_parttype and is_partition helpers
For change_parttype, only GPT-partitioned disks are supported, as I
didn't see an option for sgdisk to make it also work with
MBR-partitioned disks. And while sfdisk could be used instead (or
additionally) it would be a new dependency, and AFAICS require some
conversion of partition type GUIDs to MBR types on our part.
commit e4d56f096ed28761d6b9a9e348be0fc682928040 removes a `sleep 1`
hack for removal of a tempfile which earlier happened in the
pve-http-server but we do ourself now.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
this racy sleep(1) is only there for legacy reasons: because
we don't use apache anymore and only emulate its behaviour
regarding removing temp files, this is under our own control
now and so we can improve this whole situation.
this change requires a pve-http-server version in which the
tmpfile is not automatically removed anymore.
Requires that the $include_partitions parameter is set too, which:
1. Makes sense, because the partition won't be included in the result
otherwise.
2. Ensures backwards compatibility for existing callers that don't
use $include_partitions. No existing callers use both $disks and
$include_partitions at the same time, so nothing learns to
"support" partitions by accident.
Moving the strip_dev helper to the top, so it can be used everywhere.
api: disk: work around udev bug to ensure its database is updated
There is a udev bug [0] which can ultimately lead to the udev database
for certain devices not being actively updated. Determining whether a
disk is used or not in get_disks() (in part) relies upon lsblk, which
queries the udev database. Ensure the information is updated by
manually calling 'udevadm trigger' for the changed devices.
It's most important for the 'directory' API path, as mounting depends
on the '/dev/disk/by-uuid'-symlink to be generated.
Because then it might not be unused anymore. If there really is a
race, this prevents e.g. sgdisk creating a partition on a device
already in use by LVM or LVM destroying a partitioned device.
For ZFS, also get the latest udev info once inside the worker.
Currently, 'PVE::Storage::DirPlugin' is implicitly passed along as
$class, which means that if the base class's free_image calls another
method (e.g. filesystem_path) then the DirPlugin's method will be
used, rather than the one from BTRFSPlugin. Change it so that $class
itself is passed along.
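The difference, roughly (illustrative, not the exact code):
```
# before: hardcodes the parent package as the invocant, so overridden
# methods like filesystem_path in BTRFSPlugin are bypassed
return PVE::Storage::DirPlugin->free_image($storeid, $scfg, $volname, $isBase, $format);

# after: pass $class itself along, keeping method resolution intact
return PVE::Storage::DirPlugin::free_image($class, $storeid, $scfg, $volname, $isBase, $format);
```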
Thomas Lamprecht [Tue, 14 Sep 2021 14:23:23 +0000 (16:23 +0200)]
cifs: negotiates the highest SMB2+ version supported by default
instead of hardcoding it to a potential outdated value.
For `smbclient` we only set the max protocol version, and that could
only be smb2 or smb3 (no finer granularity) anyhow, so this was not
really correct.
Nowadays the kernel has dropped SMB1 and tries to go for SMB2.1 or higher
by default, depending on what client and server support. SMB2.1 is
Windows 7/2008R2 - both EOL for quite a while now, so OK as the default
lower boundary.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Added support for the SMB version SMB3_11. When `min protocol =
SMB3_11` is set in smb.conf, the CIFS mount fails with the
following error:
```
CIFS VFS: cifs_mount failed w/return code = -95
```
Added an optional setting to use `vers=3.11`.
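A tiny sketch of how the configured version could be passed through to the mount (the option string construction is an assumption):
```
# map the configured SMB version to the corresponding mount.cifs option,
# e.g. smbversion '3.11' becomes 'vers=3.11'
push @$options, "vers=$scfg->{smbversion}"
    if $scfg->{smbversion} && $scfg->{smbversion} ne 'default';
```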
Signed-off-by: Moayad Almalat <m.almalat@proxmox.com>
Tested-by: Fabian Ebner <f.ebner@proxmox.com>
[ Thomas: move text from cover letter to commit message & add S-o-b ]
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Previously, top-level vdevs like log or special were wrongly added as
children of the previous outer vdev instead of the root.
Fix it by also showing the vdev with the same name as the pool and
start counting from level 1 (the pool itself serves as the root and
should be the only one with level 0). This results in the same kind
of structure as in PBS and (except for the root) zpool status itself.
Commit a000e26ce7d30ba2b938402164a9a15e66dd123e caused a test failure
in pve-manager, because now 'keep-all=0' is not thrown out upon
validation anymore. Fix the issue the commit addressed differently,
by simply creating a copy of the (shallow) hash first, and using
the logic from before the commit.
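A condensed sketch of the copy-first approach (the surrounding validator logic is simplified):
```
# work on a shallow copy so that dropping zero-valued keep options is not
# visible in the caller's hash anymore
my $keep = { %$prune_backups };
delete $keep->{$_} for grep { !$keep->{$_} } keys %$keep;
```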
Reported-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
While the current way to detect settings like { 'keep-last' => 0 } is
concise, it's also wrong, because the delete operation is visible
to the caller. This resulted in e.g.
# $hash is { 'keep-all' => 1 }
my $s = print_property_string($hash, 'prune-backups');
# $hash is now {}, $s is 'keep-all=1'
because validation is called in print_property_string. The same issue
is present when calling prune_mark_backup_group.
Because validation complains when keep-all and something else is set,
this shouldn't have caused any real issues, besides vzdump with
keep-all=1 wrongly taking the removal path, but without any settings,
so not removing anything:
INFO: prune older backups with retention:
INFO: pruned 0 backup(s)