Stoiko Ivanov [Tue, 21 Mar 2023 11:12:40 +0000 (12:12 +0100)]
cifs: use empty string instead of / as default directory
this keeps the mount sources consistent with previous versions
without this patch there is a small regression, which leads to the
storage not being recognized as being mounted on upgrade:
* pvestatd in older version mount the storage with out trailing /
```
//cifsstore/ISO on /mnt/pve/cifsstore type cifs...
```
* the cifs_is_mounted helper does not recognize it as being mounted
(as the source now has a / in the end)
* attempting to mount leads to
```
mount error(16): Device or resource busy
```
noticed after upgrading and having a cifs storage mounted
Christian Ebner [Thu, 9 Mar 2023 09:41:23 +0000 (10:41 +0100)]
api: fix get content call response type for RBD/ZFS/iSCSI volumes
`pvesh get /nodes/{node}/storage/{storage}/content/{volume}` failed for
several storage types, because the respective storage plugins returned
only the volumes `size` on `volume_size_info` calls, while also the format
is required.
This patch fixes the issue by returning also `format` and where possible `used`.
The issue was reported in the forum:
https://forum.proxmox.com/threads/pvesh-get-nodes-node-storage-storage-content-volume-returns-error.123747/
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
[ T: fixup white space error ] Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Leo Nunner [Thu, 1 Dec 2022 11:32:55 +0000 (12:32 +0100)]
fix #2641: allow mounting of CIFS subdirectories
CIFS/SMB supports directly mounting subdirectories, so it makes sense to
also allow the --subdir parameter for these storages. The subdir
parameter was moved from CephFSPlugin.pm to Plugin.pm, because it isn't
specific to CephFS anymore.
Fiona Ebner [Mon, 23 Jan 2023 09:14:50 +0000 (10:14 +0100)]
nfs: check connection: support NFSv4-only servers without rpcbind
by simply doing a ping with the expected port as a fallback when the
rpcinfo command fails.
The timeout was chosen to be 2 seconds, because that's what the
existing callers of tcp_ping() in the iSCSI and GlusterFS plugins use.
Alternatively, the existing check could be replaced, but that would
1. Dumb down the check.
2. Risk breakage in some corner case that's yet to be discovered.
3. It would still be necessary to use rpcinfo (or dumb the check down
even further) in case port=0; from 'man 5 nfs' about the NFSv4 'port'
option:
> If the specified port value is 0, then the NFS client uses the NFS
> service port number advertised by the server's rpcbind service.
Reported in the community forum:
https://forum.proxmox.com/threads/118466/post-524449
https://forum.proxmox.com/threads/120774/
Fiona Ebner [Fri, 9 Dec 2022 10:30:41 +0000 (11:30 +0100)]
fix #4390: rbd: snapshot delete: avoid early return to fix handling TPM drive
The only caller where $running can even be truthy is QemuServer.pm's
qemu_volume_snapshot_delete(). But there, a check if snapshots should
be done with QEMU is already made and the storage function is only
called if snapshots should not be done with QEMU (like for TPM drives
which are not attached directly). So rely on the caller and do not
return early to fix removing snapshots in such cases.
Even if a stray call ends up here (can happen already by changing the
krbd setting while a VM is running to flip the above-mentioned check
and the early return check removed by this patch), it might not even
be problematic. At least a quick test worked fine:
1. take snapshot via a monitor command in QEMU
2. remove snapshot via the storage layer
3. create a new file in the VM
4. take a snapshot with the same name via monitor command in QEMU
5. roll back to the snapshot
6. check that the file in the VM is as expected
Using the storage layer to take the snapshots and QEMU to remove the
snapshot also worked doing the same test. Even if it were problematic,
the check in qemu-server should rather be improved then.
(Trying to issue a snapshot mon command for a krbd-mapped image fails
with an error on the other hand, but that is also not too bad and not
relevant to the storage code. Again, it rather would be necessary to
improve the check in qemu-server).
The fact that the pve-container code didn't even pass $running is the
reason removing snapshots worked for containers on a storage with krbd
disabled (the pve-container code calls map_volume() explicitly, so
containers can work regardless of the krbd setting in the storage
configuration; see commit 841fba6 ("call map_volume before using
volumes.") in pve-container).
For volume_snapshot(), the early return with $running was already
removed (or rather the relevant logic moved to QemuServer.pm) in 2015
by commit f5640e7 ("remove running from Storage and check it in
QemuServer"), even before krbd support was added in RBDPlugin.pm.
Fiona Ebner [Tue, 10 Jan 2023 12:52:43 +0000 (13:52 +0100)]
zfs: list zvol: limit recursion depth to 1
To be correct in all cases, it's still necessary to filter by "pool"
in zfs_parse_zvol_list(), because $scfg->{pool} could be e.g.
'foo/vm-123-disk-0' which looks like an image name and would pass the
other "should skip"-checks in zfs_parse_zvol_list().
No change in the result of zfs_list_zvol() is intended.
Leo Nunner [Thu, 5 Jan 2023 16:16:57 +0000 (17:16 +0100)]
plugin: change name, separator and error message for dir overrides
Rename the "dirs" parameter to "content-dirs". Switch from a "vtype:/dir"
format to "vtype=/dir", and remove the misleading error message talking
about "absolute" paths. One might expect these to be absolute over the
whole system, while in reality they are relative to the mountpoint of
the storage.
Leo Nunner [Mon, 2 Jan 2023 16:04:37 +0000 (17:04 +0100)]
config: add overrides for default directory locations
Allowing overrides for the default directory locations seems to
integrate rather well into the existing system. Custom locations
are specified using the "dirs" parameter as a comma-separated list
of "vtype:/location" values.
For now, the option has been enabled for the Directory, CIFS and NFS
backends.
Fiona Ebner [Tue, 20 Dec 2022 13:16:36 +0000 (14:16 +0100)]
zfs: list: only cache and list images for requested storage/pool
The plugin for remote ZFS storages currently also uses the same
list_images() as the plugin for local ZFS storages. There is only
one cache which does not remember the target host where the
information originated.
This is problematic for rescan in qemu-server:
1. Via list_images() and zfs_list_zvol(), ZFSPlugin.pm's zfs_request()
is executed for a remote ZFS.
2. $cache->{zfs} is set to the result of scanning the images there.
3. Another list_images() for a local ZFS storage happens and uses the
cache with the wrong information.
The only two operations using the cache currently are
1. Disk rescan in qemu-server which is affected by the issue. It is
done as part of (or as a) long-running operations.
2. prepare_local_job for pvesr, but the non-local ZFS plugin doesn't
support replication, so it cannot trigger there. The cache only
helps if there is a replicated guest with volumes on different
ZFS storages, but otherwise it will be less efficient than no
cache or querying every storage by itself.
Fix the issue by making the cache $storeid-specific, which effectively
makes the cache superfluous, because there is no caller using the same
$storeid multiple times. As argued above, this should not really hurt
the callers described above much and actually improves performance for
all other callers.
Fiona Ebner [Mon, 12 Dec 2022 12:33:09 +0000 (13:33 +0100)]
disk manage: pass full NVMe device path to smartctl
This essentially reverts commit c9bd3d2 ("fix #1123: modify NVME
device path for SMART support").
The man page for smartctl states
> Use the forms "/dev/nvme[0-9]" (broadcast namespace) or
> "/dev/nvme[0-9]n[1-9]" (specific namespace 1-9) for NVMe devices.
so it should be fine to pass the path with the specific namespace to
smartctl.
But that text was already present in the man page of version 6.5,
which is the version the commit c9bd3d2 talks about. It might be that
it was necessary to drop the specific namespace for the version
backported from Stretch to Jessie (the bug report mentions that that
version was used[0]), but it's not quite clear.
With current versions, passing in the path with the specific namespace
did work as expected[1], even on a device with multiple namespaces set
up tested locally. In PBS, the path queried via
udev::Device::from_syspath("/sys/block/{name}") is passed to smartctl
and that also included the specific namespace on the systems I tested
with a short script.
So pass the full path to make things a little bit simpler and to avoid
potential future issues like bug #2020[2].
Nowadays, relying on 'readlink /sys/block/nvmeXnY/device' won't always
lead to the correct device, as reported in the community forum[0],
where it results in '../../nvme-subsys0' and there's no matching entry
under '/dev/'.
Since Linux kernel 5.4, in particular commit 733e4b69d508 ("nvme:
Assign subsys instance from first ctrl"), the problematic situation
from bug #2020 shouldn't happen anymore.
Stated more clearly by the commit's author here[1]:
> Indeed, that commit will make the naming a bit more sane and will
> definitely prevent mistaken identity. It is still possible to
> observe controllers with instances that don't match their
> namespaces, but it is impossible to get a namespace instance that
> matches a non-owning controller.
The only other user of get_sysdir_info() doesn't use the 'device'
entry, so reverting that part is fine too.
Fiona Ebner [Wed, 23 Nov 2022 11:40:25 +0000 (12:40 +0100)]
get bandwidth limit: improve detecting if storages are involved
Previously, calling with e.g. $storage_list = [undef] would lead to an
early return of $override and not consider the limit from
datacenter.cfg.
Refactoring the bandwidth limit handling for migration introduced
calls such as described above, which broke applying the limit from
datacenter.cfg for VM RAM/state migration.
Reported in the community forum:
https://forum.proxmox.com/threads/37920/post-513005
Dominik Csapak [Thu, 10 Nov 2022 10:36:33 +0000 (11:36 +0100)]
api: FileRestore: make use of file-restores and guis timeout mechanism
file-restore has a 'timeout' parameter and if that is exceeded, returns
an error with the http code 503 Service Unavailable
When the web ui encounters such an error, it retries the listing a few
times before giving up.
To make use of these, add the 'timeout' parameter to the new
'extraParams' of 'file_restore_list'.
25 seconds are chosen because it's under pveproxy 30s limit, with a bit
of overhead to spare for the rest of the api call, like json decoding,
forking, access control checks, etc.
Dominik Csapak [Thu, 10 Nov 2022 10:36:32 +0000 (11:36 +0100)]
api: FileRestore: decode and return proper error of file-restore listing
since commit ba690c40 ("file-restore: remove 'json-error' parameter from list_files")
in proxmox-backup, the file-restore binary will return the error as json
when called with '--output-format json' (which we do in PVE::PBSClient)
here, we assume that 'file-restore' will fail in that case, and we try
to use the return value as an array ref which fails, and the user never
sees the real error message.
To fix that, check the ref type of the return value and act accordingly
Thomas Lamprecht [Fri, 11 Nov 2022 08:22:19 +0000 (09:22 +0100)]
disk manage: move "draid-config set only on draid level" assertion
so that there is a better code locality and also we avoid forgetting
to adapt the check for each specific draid-config parameter if a new
one gets added or an existing one changed.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
pbs: prune: avoid getting all snapshots for group assembly if fixed anyway
If both type and vmid is defined we don't need to list the current
snapshots, we simply can derive the single backup group from that and
let the PBS client handle the rest.
Should be a not so small speedup for most setups using PBS backup and
pruning configured on PVE side, as vzdump calls this separately for
every vmid on backup jobs with multiple guests included.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
this format comes from the remote cluster, so it might not be supported
on the source side - checking whether it's known (as additional
safeguard) and untainting (to avoid open3 failure) is required.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
[ T: squashed in canonical perl array ref access ] Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Thomas Lamprecht [Fri, 23 Sep 2022 09:54:41 +0000 (11:54 +0200)]
disk manage: module wide code/style cleanup
fixing some issues reported by perlcritic along the way.
cutting down 70 lines, often with even improving readability.
Tried to recheck and be conservative, so there shouldn't be any
regression, but it's still perl after all...
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Aaron Lauterer [Fri, 19 Aug 2022 15:01:21 +0000 (17:01 +0200)]
disks: allow add_storage for already configured local storage
One of the smaller annoyances, especially for less experienced users, is
the fact, that when creating a local storage (ZFS, LVM (thin), dir) in a
cluster, one can only leave the "Add Storage" option enabled the first
time.
On any following node, this option needed to be disabled and the new
node manually added to the list of nodes for that storage.
This patch changes the behavior. If a storage of the same name already
exists, it will verify that necessary parameters match the already
existing one.
Then, if the 'nodes' parameter is set, it adds the current node and
updates the storage config.
In case there is no nodes list, nothing else needs to be done, and the
GUI will stop showing the question mark for the configured, but until
then, not existing local storage.
Aaron Lauterer [Fri, 19 Aug 2022 15:01:20 +0000 (17:01 +0200)]
disks: die if storage name is already in use
If a storage of that type and name already exists (LVM, zpool, ...) but
we do not have a Proxmox VE Storage config for it, it is possible that
the creation will fail midway due to checks done by the underlying
storage layer itself. This in turn can lead to disks that are already
partitioned. Users would need to clean this up themselves.
By adding checks early on, not only checking against the PVE storage
config, but against the actual storage type itself, we can die early
enough, before we touch any disk.
For ZFS, the logic to gather pool data is moved into its own function to
be called from the index API endpoint and the check in the create
endpoint.
RBD plugin: librados connect: increase timeout when in worker
The default timeout in PVE/RADOS.pm is 5 seconds, but this is not
always enough for external clusters under load. Workers can and should
take their time to not fail here too quickly.
The return value of get_rbd_dev_path() is only used when $scfg->{krbd}
evaluates to true and the function shouldn't have any side effects
that are needed later, so the call can be avoided otherwise.
This also saves a RADOS connection and command with configurations for
external clusters with krbd disabled.
fix #4189: pbs: bump list_volumes timeout to 2mins
When switching this from calling the external binary to
using the perl api client the timeout got reduced to 7
seconds, which is definitely insufficient for larger stores.
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
pbs: detect mismatch of encryption settings and key
if the key file doesn't exist (anymore), but the storage.cfg references
one, die on commands that should use encryption instead of falling back
to plain-text operations.
Before af07f67 ("pbs: use vmid parameter in list_snapshots") the
namespace was set via do_raw_client_command, but now it needs to be
set explicitly here.
Fixes: af07f67 ("pbs: use vmid parameter in list_snapshots") Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
Added a LOG_EXT constant as a counterpart to NOTES_EXT
and refactored usages for .log and .notes with them.
At some parts in the test case code I had to source new variables to
shorten the line length to not exceed the 100 column line limit.
Signed-off-by: Daniel Tschlatscher <d.tschlatscher@proxmox.com> Reviewed-by: Fabian Ebner <f.ebner@proxmox.com> Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Adapted unlink calls for archive files in case of ENOENT
This improves handling when two archive remove calls are creating a
race condition where one would formerly encounter an error. Now both
finish successfully.
Signed-off-by: Daniel Tschlatscher <d.tschlatscher@proxmox.com> Reviewed-by: Fabian Ebner <f.ebner@proxmox.com> Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
fix #3972: Remove the .notes file when a backup is deleted
When a VM or Container backup was deleted, the .notes file was not
removed, therefore, over time the dump folder would get polluted with
notes for backups that no longer existed. As backup names contain a
timestamp and as the notes cannot be reused because of this, I think
it is safe to just delete them just like we do with the .log file.
Furthermore, I sourced the deletion of the log and notes file into a
new function called "archive_auxiliaries_remove". Additionally, the
archive_info object now returns one more field containing the name of
the notes file. The test cases have to be adapted to expect this new
value as the package will not compile otherwise.
Signed-off-by: Daniel Tschlatscher <d.tschlatscher@proxmox.com> Reviewed-by: Fabian Ebner <f.ebner@proxmox.com> Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Aaron Lauterer [Mon, 23 May 2022 10:54:25 +0000 (12:54 +0200)]
rbd: get_rbd_dev_path: return /dev/rbd path only if cluster matches
The changes in cfe46e2d4a97a83f1bbe6ad656e6416399309ba2 git not catch
all situations.
In the case of a guest having 2 disk images with the same name on a pool
with the same name but in two different ceph clusters we still had
issues when starting it. The first disk got mapped as expected. The
second disk did not get mapped because we returned the old $path to
"/dev/rbd/<pool>/<image>" because it already existed from the first
disk.
In the case that only the "old" /dev/rbd path exists and we do not have
the /dev/rbd-pve/<cluster>/... path available, we now check if the
cluster fsid used by that rbd device matches the one we expect. If it
does, then we are in the situation that the image has been mapped before
the new rbd-pve udev rule was introduced. If it does not, then we have
the situation of an ambiguous mapping in /dev/rbd and return the
$pve_path.
Aaron Lauterer [Wed, 18 May 2022 09:04:54 +0000 (11:04 +0200)]
rbd: fix #4060 show data-pool usage when configured
When a data-pool is configured, use it for status infos. The 'data-pool'
config option is used to mark the erasure coded pool while the 'pool'
will be the replicated pool holding meta data such as the omap.
This means, the 'pool' will only use a small amount of space and people
are interested how much they can store in the erasure coded pool anyway.
Therefore this patch reorders the assignment of the used pool name by
availability of the scfg parameters: data-pool -> pool -> fallback 'rbd'
Stoiko Ivanov [Tue, 3 May 2022 11:31:40 +0000 (13:31 +0200)]
rbd: warn if no stats for a pool could be gathered
happens in case of a mistyped poolname, and the new message should be
more helpful than:
`Use of uninitialized value $free in addition (+) at \
/usr/share/perl5/PVE/Storage/RBDPlugin.pm line 64`
Stoiko Ivanov [Tue, 3 May 2022 11:31:39 +0000 (13:31 +0200)]
rbd: add fallback default poolname 'rbd' to status
the fallback to a default pool name of 'rbd' was introduced in: 1440604a4b072b88cc1e4f8bbae4511b50d1d68e
and worked for the status command, because it used the `rados_cmd`
sub.
leading to confusing errors:
`Use of uninitialized value in string eq at \
/usr/share/perl5/PVE/Storage/RBDPlugin.pm line 633`
(e.g. in the journal from pvestatd)
Thomas Lamprecht [Thu, 28 Apr 2022 16:17:56 +0000 (18:17 +0200)]
rbd: get path: allow fake override of fsid in scfg for some regression tests
to avoid calls into RADOS connect, that trigger RPCEnv not
initialized breakage in regression tests, but wouldn't really work
otherwise either
in the future the RBD $scfg could actually support this (or similarly
named) property, to safe on storage addition and then avoid frequent
mon commands
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
When krbd is used, subsequent removal after an an operation
involving a rename could fail with
> librbd::image::PreRemoveRequest: 0x559b7506a470 \
> check_image_watchers: image has watchers - not removing
because the old mapping was still present.
For both operations with a rename, the owning guest should be offline,
but even if it weren't, unmap simply fails when the volume is in-use.
rbd: fix #3969: add rbd dev paths with cluster info
By adding our own customized rbd udev rules and ceph-rbdnamer we can
create device paths that include the cluster fsid and avoid any
ambiguity if the same pool and namespace combination is used in
different clusters we connect to.
Additionally to the '/dev/rbd/<pool>/...' paths we now have
'/dev/rbd-pve/<cluster fsid>/<pool>/...' paths.
The other half of the patch makes use of the new device paths in the RBD
plugin.
The new 'get_rbd_dev_path' method the full device path. In case that the
image has been mapped before the rbd-pve udev rule has been installed,
it returns the old path.
The cluster fsid is read from the 'ceph.conf' file in the case of a
hyperconverged setup. In the case of an external Ceph cluster we need to
fetch it via a rados api call.
Co-authored-by: Thomas Lamprecht <t.lamprecht@proxmox.com> Signed-off-by: Aaron Lauterer <a.lauterer@proxmox.com>
Dominik Csapak [Wed, 9 Mar 2022 08:21:28 +0000 (09:21 +0100)]
storage plugins: en/decode volume notes as UTF-8
When writing into the file, explicitly utf8 encode it, and then try
to utf8 decode it on read.
If the notes are not valid utf8, we assume they were iso-8859 encoded
and return as is.
Technically this is a breaking change, since there are iso-8859
comments that would successfully decode as utf8, for example: the
byte sequence "C2 A9" would be "£" in iso, but would decode to "£".
From what i can tell though, this is rather unlikely to happen for
"real world" notes, because the first byte would be in the range of
C0-F7 (which are mostly language dependent characters like "Â") and
the following bytes would have to be in the range of 80-BF, which are
only special characters like "£" (or undefined)
Dominik Csapak [Thu, 23 Dec 2021 12:06:22 +0000 (13:06 +0100)]
fix #3803: ZFSPoolPlugin: zfs_request: increase minimum timeout in worker
Since most zfs operations can take a while (under certain conditions),
increase the minimum timeout for zfs_request in workers to 5 minutes.
We cannot increase the timeouts in synchronous api calls, since they are
hard limited to 30 seconds, but in worker we do not have such limits.
The existing default timeout does not change (60minutes in worker,
5seconds otherwise), but all zfs_requests with a set timeout (<5minutes)
will use the increased 5 minutes in a worker.
Fabian Ebner [Tue, 29 Mar 2022 12:53:13 +0000 (14:53 +0200)]
plugins: allow limiting the number of protected backups per guest
The ability to mark backups as protected broke the implicit assumption
in vzdump that remove=1 and current number of backups being the limit
(i.e. sum of all keep options) will result in a backup being removed.
Introduce a new storage property 'max-protected-backups' to limit the
number of protected backups per guest. Use 5 as a default value, as it
should cover most use cases, while still not having too big of a
potential overhead in many scenarios.
For external plugins that do not return the backup subtype in
list_volumes, all protected backups with the same ID will count
towards the limit.
An alternative would be to count the protected backups when pruning.
While that would avoid the need for a new property, it would break the
current semantics of protected backups being ignored for pruning. It
also would be less flexible, e.g. for PBS, it can make sense to have
both keep-all=1 and a limit for the number of protected snapshots on
the PVE side.