pbs: prune: avoid getting all snapshots for group assembly if fixed anyway
If both type and vmid are defined, we don't need to list the current
snapshots; we can simply derive the single backup group from them and
let the PBS client handle the rest.
This should be a noticeable speedup for most setups using PBS backups
with pruning configured on the PVE side, as vzdump calls this
separately for every vmid on backup jobs that include multiple guests.
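A minimal sketch of the shortcut; the variable names are assumptions, but
PBS backup groups are simply "<type>/<vmid>":

    my @groups;
    if (defined($type) && defined($vmid)) {
        # the group is fully determined, no need to query the server
        push @groups, "$type/$vmid"; # e.g. "vm/100" or "ct/101"
    } else {
        # fall back to listing all snapshots and collecting their groups
        ...
    }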
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
this format comes from the remote cluster, so it might not be supported
on the source side - checking whether it's known (as an additional
safeguard) and untainting (to avoid an open3 failure) are required.
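A minimal sketch of both safeguards, with assumed variable names:

    die "unsupported format '$format'\n"
        if !grep { $_ eq $format } @known_formats;
    # untaint: only pass the validated value on to open3
    ($format) = ($format =~ /^(\S+)$/);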
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
[ T: squashed in canonical perl array ref access ] Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Thomas Lamprecht [Fri, 23 Sep 2022 09:54:41 +0000 (11:54 +0200)]
disk manage: module wide code/style cleanup
fixing some issues reported by perlcritic along the way and cutting
down 70 lines, often even improving readability.
Tried to recheck and be conservative, so there shouldn't be any
regression, but it's still perl after all...
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Aaron Lauterer [Fri, 19 Aug 2022 15:01:21 +0000 (17:01 +0200)]
disks: allow add_storage for already configured local storage
One of the smaller annoyances, especially for less experienced users,
is the fact that, when creating a local storage (ZFS, LVM (thin), dir)
in a cluster, one can only leave the "Add Storage" option enabled the
first time.
On any following node, this option needed to be disabled and the new
node manually added to the list of nodes for that storage.
This patch changes the behavior. If a storage of the same name already
exists, it will verify that necessary parameters match the already
existing one.
Then, if the 'nodes' parameter is set, it adds the current node and
updates the storage config.
In case there is no nodes list, nothing else needs to be done, and the
GUI will stop showing the question mark for the configured, but until
then non-existing, local storage.
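A rough sketch of the flow, with hypothetical helper names; the nodes
list is kept as a hash of node names, as usual for the storage config:

    my $scfg = $cfg->{ids}->{$storeid};
    if ($scfg) {
        assert_storage_params_match($scfg, $param); # hypothetical check
        if ($scfg->{nodes} && !$scfg->{nodes}->{$nodename}) {
            $scfg->{nodes}->{$nodename} = 1;
            $write_storage_config->($cfg); # hypothetical config update
        }
    }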
Aaron Lauterer [Fri, 19 Aug 2022 15:01:20 +0000 (17:01 +0200)]
disks: die if storage name is already in use
If a storage of that type and name already exists (LVM, zpool, ...) but
we do not have a Proxmox VE Storage config for it, it is possible that
the creation will fail midway due to checks done by the underlying
storage layer itself. This in turn can lead to disks that are already
partitioned. Users would need to clean this up themselves.
By adding checks early on, not only checking against the PVE storage
config, but against the actual storage type itself, we can die early
enough, before we touch any disk.
For ZFS, the logic to gather pool data is moved into its own function,
so it can be called both from the index API endpoint and from the check
in the create endpoint.
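A minimal sketch of the early ZFS check, with assumed helper and
variable names:

    my $pools = get_pool_data(); # shared with the index API endpoint
    die "pool '$name' already exists on node '$nodename'\n"
        if grep { $_->{name} eq $name } @$pools;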
RBD plugin: librados connect: increase timeout when in worker
The default timeout in PVE/RADOS.pm is 5 seconds, but this is not
always enough for external clusters under load. Workers can and should
take their time to not fail here too quickly.
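A minimal sketch; the chosen timeout value and the option name passed to
PVE::RADOS are assumptions:

    my $timeout = PVE::RESTEnvironment->is_worker() ? 60 : 5;
    my $rados = PVE::RADOS->new(%$cmd_option, timeout => $timeout);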
The return value of get_rbd_dev_path() is only used when $scfg->{krbd}
evaluates to true and the function shouldn't have any side effects
that are needed later, so the call can be avoided otherwise.
This also saves a RADOS connection and command with configurations for
external clusters with krbd disabled.
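A minimal sketch of the change (argument list assumed):

    # only resolve the device path (which may need a RADOS call for an
    # external cluster) when krbd is actually enabled
    my $rbd_dev_path = $scfg->{krbd} ? get_rbd_dev_path($scfg, $storeid, $name) : undef;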
fix #4189: pbs: bump list_volumes timeout to 2mins
When this was switched from calling the external binary to using the
Perl API client, the timeout got reduced to 7 seconds, which is
definitely insufficient for larger stores.
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
pbs: detect mismatch of encryption settings and key
if the key file doesn't exist (anymore), but the storage.cfg references
one, die on commands that should use encryption instead of falling back
to plain-text operations.
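A minimal sketch of the check; the key file location is an assumption:

    my $keyfile = "/etc/pve/priv/storage/$storeid.enc";
    if ($scfg->{'encryption-key'} && !-f $keyfile) {
        die "encryption key configured for '$storeid', but key file is missing\n";
    }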
Before af07f67 ("pbs: use vmid parameter in list_snapshots") the
namespace was set via do_raw_client_command, but now it needs to be
set explicitly here.
Fixes: af07f67 ("pbs: use vmid parameter in list_snapshots") Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
Added a LOG_EXT constant as a counterpart to NOTES_EXT and refactored
usages of .log and .notes to use them.
In some parts of the test case code I had to introduce new variables to
keep lines within the 100 column limit.
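A minimal sketch of the constants:

    use constant {
        LOG_EXT => '.log',
        NOTES_EXT => '.notes',
    };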
Signed-off-by: Daniel Tschlatscher <d.tschlatscher@proxmox.com> Reviewed-by: Fabian Ebner <f.ebner@proxmox.com> Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Adapted unlink calls for archive files in case of ENOENT
This improves handling when two archive remove calls race with each
other; formerly one of them would encounter an error, now both finish
successfully.
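A minimal sketch of the ENOENT-tolerant unlink:

    use POSIX qw(ENOENT);

    if (!unlink($file) && $! != ENOENT) {
        die "removing '$file' failed - $!\n";
    }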
Signed-off-by: Daniel Tschlatscher <d.tschlatscher@proxmox.com> Reviewed-by: Fabian Ebner <f.ebner@proxmox.com> Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
fix #3972: Remove the .notes file when a backup is deleted
When a VM or Container backup was deleted, the .notes file was not
removed; therefore, over time the dump folder would get polluted with
notes for backups that no longer existed. As backup names contain a
timestamp and the notes can therefore not be reused, I think it is safe
to just delete them, like we already do with the .log file.
Furthermore, I factored the deletion of the log and notes files out
into a new function called "archive_auxiliaries_remove". Additionally,
the archive_info object now returns one more field containing the name
of the notes file. The test cases have to be adapted to expect this new
value, as the package will not compile otherwise.
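A rough sketch of the new helper; the exact file naming is an assumption
and unlink_ignore_enoent() stands in for the tolerant unlink shown
above:

    sub archive_auxiliaries_remove {
        my ($archive) = @_;
        # remove the .log and .notes companions, tolerating their absence
        unlink_ignore_enoent("$archive$_") for (LOG_EXT, NOTES_EXT);
    }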
Signed-off-by: Daniel Tschlatscher <d.tschlatscher@proxmox.com> Reviewed-by: Fabian Ebner <f.ebner@proxmox.com> Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Aaron Lauterer [Mon, 23 May 2022 10:54:25 +0000 (12:54 +0200)]
rbd: get_rbd_dev_path: return /dev/rbd path only if cluster matches
The changes in cfe46e2d4a97a83f1bbe6ad656e6416399309ba2 did not catch
all situations.
In the case of a guest having 2 disk images with the same name on a
pool with the same name, but in two different Ceph clusters, we still
had issues when starting it. The first disk got mapped as expected. The
second disk did not get mapped, because we returned the old $path
"/dev/rbd/<pool>/<image>", which already existed from the first disk.
In the case that only the "old" /dev/rbd path exists and we do not have
the /dev/rbd-pve/<cluster>/... path available, we now check if the
cluster fsid used by that rbd device matches the one we expect. If it
does, then we are in the situation that the image has been mapped before
the new rbd-pve udev rule was introduced. If it does not, then we have
the situation of an ambiguous mapping in /dev/rbd and return the
$pve_path.
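A rough sketch of the check; the sysfs attribute path is an assumption:

    my $dev = readlink($path); # e.g. ../rbd0
    my ($id) = $dev =~ m/rbd(\d+)$/;
    my $mapped_fsid = PVE::Tools::file_read_firstline("/sys/bus/rbd/devices/$id/cluster_fsid");
    return $path if defined($mapped_fsid) && $mapped_fsid eq $fsid;
    return $pve_path;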
Aaron Lauterer [Wed, 18 May 2022 09:04:54 +0000 (11:04 +0200)]
rbd: fix #4060 show data-pool usage when configured
When a data-pool is configured, use it for the status info. The
'data-pool' config option is used to mark the erasure coded pool, while
the 'pool' will be the replicated pool holding metadata such as the
omap.
This means the 'pool' will only use a small amount of space and people
are interested in how much they can store in the erasure coded pool
anyway.
Therefore this patch reorders the assignment of the used pool name by
availability of the scfg parameters: data-pool -> pool -> fallback 'rbd'
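In code, this boils down to something like:

    my $pool = $scfg->{'data-pool'} // $scfg->{pool} // 'rbd';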
Stoiko Ivanov [Tue, 3 May 2022 11:31:40 +0000 (13:31 +0200)]
rbd: warn if no stats for a pool could be gathered
happens in case of a mistyped poolname, and the new message should be
more helpful than:
`Use of uninitialized value $free in addition (+) at \
/usr/share/perl5/PVE/Storage/RBDPlugin.pm line 64`
Stoiko Ivanov [Tue, 3 May 2022 11:31:39 +0000 (13:31 +0200)]
rbd: add fallback default poolname 'rbd' to status
the fallback to a default pool name of 'rbd' was introduced in: 1440604a4b072b88cc1e4f8bbae4511b50d1d68e
and worked for the status command, because it used the `rados_cmd`
sub.
The current status code path does not apply this fallback, leading to
confusing errors:
`Use of uninitialized value in string eq at \
/usr/share/perl5/PVE/Storage/RBDPlugin.pm line 633`
(e.g. in the journal from pvestatd)
Thomas Lamprecht [Thu, 28 Apr 2022 16:17:56 +0000 (18:17 +0200)]
rbd: get path: allow fake override of fsid in scfg for some regression tests
to avoid calls into RADOS connect, which trigger "RPCEnv not
initialized" breakage in regression tests and wouldn't really work
there otherwise either.
in the future the RBD $scfg could actually support this (or a similarly
named) property, to save it on storage addition and then avoid frequent
mon commands
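A minimal sketch; rbd_cluster_fsid() stands in for the actual mon query
helper:

    my $fsid = $scfg->{fsid} // rbd_cluster_fsid($scfg, $storeid);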
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
When krbd is used, subsequent removal after an operation involving a
rename could fail with
> librbd::image::PreRemoveRequest: 0x559b7506a470 \
> check_image_watchers: image has watchers - not removing
because the old mapping was still present.
For both operations with a rename, the owning guest should be offline,
but even if it weren't, unmap simply fails when the volume is in use.
rbd: fix #3969: add rbd dev paths with cluster info
By adding our own customized rbd udev rules and ceph-rbdnamer we can
create device paths that include the cluster fsid and avoid any
ambiguity if the same pool and namespace combination is used in
different clusters we connect to.
In addition to the '/dev/rbd/<pool>/...' paths we now have
'/dev/rbd-pve/<cluster fsid>/<pool>/...' paths.
The other half of the patch makes use of the new device paths in the
RBD plugin.
The new 'get_rbd_dev_path' method returns the full device path. In case
the image was mapped before the rbd-pve udev rule was installed, it
returns the old path.
The cluster fsid is read from the 'ceph.conf' file in the case of a
hyperconverged setup. In the case of an external Ceph cluster we need
to fetch it via a RADOS API call.
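A minimal sketch of the path preference (namespace handling omitted):

    my $pve_path = "/dev/rbd-pve/$fsid/$pool/$image";
    my $path = "/dev/rbd/$pool/$image";
    # prefer the unambiguous cluster-specific path; fall back to the old
    # one for images mapped before the new udev rule was installed
    return $pve_path if -e $pve_path;
    return $path;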
Co-authored-by: Thomas Lamprecht <t.lamprecht@proxmox.com> Signed-off-by: Aaron Lauterer <a.lauterer@proxmox.com>
Dominik Csapak [Wed, 9 Mar 2022 08:21:28 +0000 (09:21 +0100)]
storage plugins: en/decode volume notes as UTF-8
When writing into the file, explicitly utf8 encode it, and then try
to utf8 decode it on read.
If the notes are not valid utf8, we assume they were iso-8859 encoded
and return as is.
Technically this is a breaking change, since there are iso-8859
comments that would successfully decode as utf8, for example: the
byte sequence "C2 A9" would be "Â©" in iso-8859, but would decode to
"©".
From what I can tell though, this is rather unlikely to happen for
"real world" notes, because the first byte would be in the range of
C0-F7 (which are mostly language dependent characters like "Â") and the
following bytes would have to be in the range of 80-BF, which are
only special characters like "©" (or undefined)
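A minimal sketch of the round trip with Encode:

    use Encode qw(encode decode);

    my $raw = encode('UTF-8', $notes); # write path: always store UTF-8
    # read path: try UTF-8 first, keep the raw bytes if that fails
    my $decoded = eval { decode('UTF-8', $raw, Encode::FB_CROAK) };
    $notes = $decoded // $raw;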
Dominik Csapak [Thu, 23 Dec 2021 12:06:22 +0000 (13:06 +0100)]
fix #3803: ZFSPoolPlugin: zfs_request: increase minimum timeout in worker
Since most zfs operations can take a while (under certain conditions),
increase the minimum timeout for zfs_request in workers to 5 minutes.
We cannot increase the timeouts in synchronous API calls, since they
are hard limited to 30 seconds, but in workers we do not have such
limits.
The existing default timeout does not change (60 minutes in a worker,
5 seconds otherwise), but all zfs_requests with a set timeout
(< 5 minutes) will use the increased 5 minutes in a worker.
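A minimal sketch of the resulting timeout logic:

    my $is_worker = PVE::RESTEnvironment->is_worker();
    my $default_timeout = $is_worker ? 60*60 : 5;
    $timeout //= $default_timeout;
    $timeout = 5*60 if $is_worker && $timeout < 5*60;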
Fabian Ebner [Tue, 29 Mar 2022 12:53:13 +0000 (14:53 +0200)]
plugins: allow limiting the number of protected backups per guest
The ability to mark backups as protected broke the implicit assumption
in vzdump that, with remove=1 and the current number of backups at the
limit (i.e. the sum of all keep options), a backup will be removed.
Introduce a new storage property 'max-protected-backups' to limit the
number of protected backups per guest. Use 5 as a default value, as it
should cover most use cases, while still not having too big of a
potential overhead in many scenarios.
For external plugins that do not return the backup subtype in
list_volumes, all protected backups with the same ID will count
towards the limit.
An alternative would be to count the protected backups when pruning.
While that would avoid the need for a new property, it would break the
current semantics of protected backups being ignored for pruning. It
also would be less flexible, e.g. for PBS, it can make sense to have
both keep-all=1 and a limit for the number of protected snapshots on
the PVE side.
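A rough sketch of the new storage property; the schema details beyond
the name and default are assumptions:

    'max-protected-backups' => {
        description => "Maximal number of protected backups per guest.",
        type => 'integer',
        minimum => -1, # assumed: -1 meaning unlimited
        default => 5,
        optional => 1,
    },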
Fabian Ebner [Wed, 30 Mar 2022 10:24:28 +0000 (12:24 +0200)]
pvesm: extract config: check for VM.Backup privilege
In preparation for having check_volume_access() always allow access for
users with the Datastore.Allocate privilege, without automatically
giving all such users permission to extract the config too.
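A minimal sketch of the added check (exact invocation assumed):

    $rpcenv->check($authuser, "/vms/$vmid", ['VM.Backup']);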
Fabian Ebner [Mon, 15 Nov 2021 12:37:56 +0000 (13:37 +0100)]
cifs: check connection: bubble up NT_STATUS_LOGON_FAILURE
in the same manner as NT_STATUS_ACCESS_DENIED. It can be assumed to be
a configuration error, so avoid showing the generic "storage <storeid>
is not online". Reported in the community forum:
https://forum.proxmox.com/threads/storage-is-not-online-cifs.99201/post-428858
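A minimal sketch of how the status could be surfaced (the error matching
is an assumption):

    if ($err =~ /(NT_STATUS_(?:ACCESS_DENIED|LOGON_FAILURE))/) {
        die "$1\n";
    }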
Mira Limbeck [Fri, 18 Feb 2022 08:58:27 +0000 (09:58 +0100)]
file_size_info: cast 'size' and 'used' to integer
`qemu-img info --output=json` returns the size and used values as integers in
the JSON format, but the regex match converts them to strings.
As we know they only contain digits, we can simply cast them back to integers
after the regex.
The API requires them to be integers.
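A minimal sketch of the cast:

    $size = int($size);
    $used = int($used);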
Signed-off-by: Mira Limbeck <m.limbeck@proxmox.com> Reviewed-by: Fabian Ebner <f.ebner@proxmox.com>
Aaron Lauterer [Fri, 28 Jan 2022 11:22:41 +0000 (12:22 +0100)]
fix #1816: rbd: add support for erasure coded ec pools
The first step is to allocate rbd images correctly.
The metadata objects still need to be stored in a replicated pool, but
by providing the --data-pool parameter on image creation, we can place
the data objects on the erasure coded (EC) pool.
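A minimal sketch of the allocation call; the surrounding arguments are
assumptions:

    my $cmd = ['create', '--size', "${size}M", $image];
    push @$cmd, '--data-pool', $scfg->{'data-pool'} if $scfg->{'data-pool'};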
to allow reusing this with remote migration, where parsing of the source
volid has to happen on the source node, but this call has to happen on
the target node.
Fabian Ebner [Mon, 10 Jan 2022 11:50:44 +0000 (12:50 +0100)]
zfs: use -r parameter when listing snapshots
Some versions of ZFS do not automatically display the child snapshots
when '-t snapshot' is used, but require '-r' to be present
additionally[1]. And in general, it's cleaner to specify the flag
explicitly.
Because of that, commit ac5c1af led to a regression[0] in the context
of ZFS over iSCSI with zfs_get_sorted_snapshot_list. Fix it by adding
the -r flag again.
The volume_snapshot_info function is currently only used in the
context of replication and that requires a local ZFS pool, but it
would be affected by the same issue if it is ever used in the context
of ZFS over iSCSI, so also add -r there.
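A minimal sketch of the listing command; the output fields are
assumptions:

    my $cmd = ['zfs', 'list', '-H', '-r', '-t', 'snapshot', '-o', 'name,creation', $dataset];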
Fabian Ebner [Fri, 5 Nov 2021 10:29:45 +0000 (11:29 +0100)]
lvm thin: don't assume that a thin pool and its volumes are active
There are cases where autoactivation can fail, as reported in the
community forum [0]. And it could also be that a volume was
deactivated by something outside of our control.
It doesn't seem strictly necessary to activate the thin pool itself
(creating/removing/activating LVs within the pool still works if it's
not active), but it does not report usage information as long as
neither the pool nor any of its LVs are active. Activate the pool for
that, so the flag can be used in status(); it should also serve as a
good indicator that there's a problem with the pool if it can't be
activated.
Before activating, check the (cached) lv_state from lvm_list_volumes.
It's necessary to update the cache in activate_storage, because the
flag is re-used in status(). Also update it for other (de)activations
to be more future-proof.
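A rough sketch; the field name returned by the lvm helper is an
assumption:

    my $lvs = lvm_list_volumes($vg);
    my $state = $lvs->{$vg}->{$scfg->{thinpool}}->{lv_state} // '';
    if ($state ne 'a') {
        run_command(['/sbin/lvchange', '-ay', "$vg/$scfg->{thinpool}"]);
    }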
Fabian Ebner [Mon, 25 Oct 2021 13:47:49 +0000 (15:47 +0200)]
api: disks: delete: add flag for cleaning up storage config
Update node restrictions to reflect that the storage is not available
anymore on the particular node. If the storage was only configured for
that node, remove it altogether.
Fabian Ebner [Mon, 25 Oct 2021 13:47:47 +0000 (15:47 +0200)]
diskmanage: add helper for udev workaround
to avoid duplication. Current callers pass along at least one device,
but anticipate future callers that might call with the empty list. Do
nothing in that case, rather than triggering everything.
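A rough sketch of the helper (name assumed):

    sub udevadm_trigger {
        my (@devs) = @_;
        return if !scalar(@devs); # empty list: don't trigger everything

        eval { run_command(['udevadm', 'trigger', @devs]) };
        warn "udevadm trigger failed: $@\n" if $@;
    }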
Aaron Lauterer [Tue, 9 Nov 2021 14:55:32 +0000 (15:55 +0100)]
add disk rename feature
Functionality has been added for the following storage types:
* directory ones, based on the default implementation:
* directory
* NFS
* CIFS
* gluster
* ZFS
* (thin) LVM
* Ceph
A new feature `rename` has been introduced to mark which storage
plugins support the feature.
Version API and AGE have been bumped.
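A minimal sketch of how a plugin advertises it; the structure is assumed
to mirror the existing feature map entries:

    my $features = {
        # ... existing features ...
        rename => { current => 1 },
    };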
Signed-off-by: Aaron Lauterer <a.lauterer@proxmox.com>
the intention of this feature is to support the following use-cases:
- reassign a volume from one owning guest to another (which usually
entails a rename, since the owning vmid is encoded in the volume name)
- rename a volume (e.g., to use a more meaningful name instead of the
auto-assigned ...-disk-123)
only the former is implemented at the caller side in
qemu-server/pve-container for now, but since the lower-level feature is
basically the same for both, we can take advantage of the storage plugin
API bump now to get the building block for this future feature in place
already.
adapted the ApiChangelog change to fix conflicts and added more detail above