Friedrich Weber [Thu, 4 May 2023 09:13:34 +0000 (11:13 +0200)]
api: upload: add pattern to tmpfilename parameter
The `tmpfilename` is generated by pve-http-server and always adheres
to this pattern, so make this explicit to prevent confusion and have
a more complete JSON API schema, useful for e.g., the API viewer.
Signed-off-by: Friedrich Weber <f.weber@proxmox.com>
[ T: slightly extended commit message ]
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Stoiko Ivanov [Tue, 21 Mar 2023 11:12:40 +0000 (12:12 +0100)]
cifs: use empty string instead of / as default directory
this keeps the mount sources consistent with previous versions
without this patch there is a small regression, which leads to the
storage not being recognized as being mounted on upgrade:
* pvestatd in older versions mounts the storage without a trailing /
```
//cifsstore/ISO on /mnt/pve/cifsstore type cifs...
```
* the cifs_is_mounted helper does not recognize it as being mounted
(as the source now has a / in the end)
* attempting to mount leads to
```
mount error(16): Device or resource busy
```
noticed after upgrading and having a cifs storage mounted
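The mismatch described above can be reproduced with a minimal sketch. The helper names and the tuple layout of the mount table are assumptions made for illustration only, not the plugin's actual code:

```python
def cifs_source(server: str, share: str, subdir: str = "") -> str:
    """Build the mount source string as it would appear in the mount table."""
    return f"//{server}/{share}{subdir}"


def is_mounted(source: str, mountpoint: str, mounts: list) -> bool:
    """Return True if (source, mountpoint) is listed as a cifs mount."""
    return any(
        src == source and mp == mountpoint and fstype == "cifs"
        for src, mp, fstype in mounts
    )


# Entry left behind by an older version (no trailing slash).
mounts = [("//cifsstore/ISO", "/mnt/pve/cifsstore", "cifs")]

# With "" as the default directory, the existing mount is recognized.
old_style = is_mounted(cifs_source("cifsstore", "ISO"), "/mnt/pve/cifsstore", mounts)

# With "/" as the default directory, the same mount is missed.
new_style = is_mounted(cifs_source("cifsstore", "ISO", "/"), "/mnt/pve/cifsstore", mounts)
```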
Christian Ebner [Thu, 9 Mar 2023 09:41:23 +0000 (10:41 +0100)]
api: fix get content call response type for RBD/ZFS/iSCSI volumes
`pvesh get /nodes/{node}/storage/{storage}/content/{volume}` failed for
several storage types, because the respective storage plugins returned
only the volume's `size` on `volume_size_info` calls, while the format
is also required.
This patch fixes the issue by also returning `format` and, where possible, `used`.
The issue was reported in the forum:
https://forum.proxmox.com/threads/pvesh-get-nodes-node-storage-storage-content-volume-returns-error.123747/
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
[ T: fixup white space error ]
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Leo Nunner [Thu, 1 Dec 2022 11:32:55 +0000 (12:32 +0100)]
fix #2641: allow mounting of CIFS subdirectories
CIFS/SMB supports directly mounting subdirectories, so it makes sense to
also allow the --subdir parameter for these storages. The subdir
parameter was moved from CephFSPlugin.pm to Plugin.pm, because it isn't
specific to CephFS anymore.
Fiona Ebner [Mon, 23 Jan 2023 09:14:50 +0000 (10:14 +0100)]
nfs: check connection: support NFSv4-only servers without rpcbind
by simply doing a ping with the expected port as a fallback when the
rpcinfo command fails.
The timeout was chosen to be 2 seconds, because that's what the
existing callers of tcp_ping() in the iSCSI and GlusterFS plugins use.
Alternatively, the existing check could be replaced, but that would
1. Dumb down the check.
2. Risk breakage in some corner case that's yet to be discovered.
3. It would still be necessary to use rpcinfo (or dumb the check down
even further) in case port=0; from 'man 5 nfs' about the NFSv4 'port'
option:
> If the specified port value is 0, then the NFS client uses the NFS
> service port number advertised by the server's rpcbind service.
Reported in the community forum:
https://forum.proxmox.com/threads/118466/post-524449
https://forum.proxmox.com/threads/120774/
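A rough sketch of the fallback idea, not the plugin's Perl implementation: `rpcinfo_check` is a hypothetical callable standing in for the rpcinfo invocation, and the default port 2049 is the standard NFS port, assumed here as the "expected port".

```python
import socket


def tcp_ping(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_connection(host: str, rpcinfo_check, nfs_port: int = 2049) -> bool:
    """Try the rpcinfo-based check first, fall back to a plain TCP ping."""
    if rpcinfo_check(host):
        return True
    # NFSv4-only servers may run without rpcbind, so rpcinfo can fail even
    # though the NFS port itself is reachable.
    return tcp_ping(host, nfs_port, timeout=2.0)
```

Keeping the rpcinfo check first preserves the stricter test for servers that do run rpcbind.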
Fiona Ebner [Fri, 9 Dec 2022 10:30:41 +0000 (11:30 +0100)]
fix #4390: rbd: snapshot delete: avoid early return to fix handling TPM drive
The only caller where $running can even be truthy is QemuServer.pm's
qemu_volume_snapshot_delete(). But there, a check if snapshots should
be done with QEMU is already made and the storage function is only
called if snapshots should not be done with QEMU (like for TPM drives
which are not attached directly). So rely on the caller and do not
return early to fix removing snapshots in such cases.
Even if a stray call ends up here (can happen already by changing the
krbd setting while a VM is running to flip the above-mentioned check
and the early return check removed by this patch), it might not even
be problematic. At least a quick test worked fine:
1. take snapshot via a monitor command in QEMU
2. remove snapshot via the storage layer
3. create a new file in the VM
4. take a snapshot with the same name via monitor command in QEMU
5. roll back to the snapshot
6. check that the file in the VM is as expected
Using the storage layer to take the snapshots and QEMU to remove the
snapshot also worked doing the same test. Even if it were problematic,
the check in qemu-server should rather be improved then.
(Trying to issue a snapshot mon command for a krbd-mapped image fails
with an error on the other hand, but that is also not too bad and not
relevant to the storage code. Again, it rather would be necessary to
improve the check in qemu-server).
The fact that the pve-container code didn't even pass $running is the
reason removing snapshots worked for containers on a storage with krbd
disabled (the pve-container code calls map_volume() explicitly, so
containers can work regardless of the krbd setting in the storage
configuration; see commit 841fba6 ("call map_volume before using
volumes.") in pve-container).
For volume_snapshot(), the early return with $running was already
removed (or rather the relevant logic moved to QemuServer.pm) in 2015
by commit f5640e7 ("remove running from Storage and check it in
QemuServer"), even before krbd support was added in RBDPlugin.pm.
Fiona Ebner [Tue, 10 Jan 2023 12:52:43 +0000 (13:52 +0100)]
zfs: list zvol: limit recursion depth to 1
To be correct in all cases, it's still necessary to filter by "pool"
in zfs_parse_zvol_list(), because $scfg->{pool} could be e.g.
'foo/vm-123-disk-0' which looks like an image name and would pass the
other "should skip"-checks in zfs_parse_zvol_list().
No change in the result of zfs_list_zvol() is intended.
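Why the pool filter is still needed can be shown with a small sketch. The function name and the plain list of dataset names are assumptions for illustration; the real parser works on `zfs list` output.

```python
def direct_children(dataset_names, pool):
    """Keep only the image names of datasets that are direct children of pool."""
    images = []
    for name in dataset_names:
        parent, sep, image = name.rpartition("/")
        if sep and parent == pool:
            images.append(image)
    return images


# A depth-1 listing rooted at the pool also returns the pool dataset itself.
# If the pool is 'foo/vm-123-disk-0', that first entry looks like an image
# name and is only rejected by comparing its parent against the pool.
listing = ["foo/vm-123-disk-0", "foo/vm-123-disk-0/vm-123-disk-1"]
images = direct_children(listing, "foo/vm-123-disk-0")
```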
Leo Nunner [Thu, 5 Jan 2023 16:16:57 +0000 (17:16 +0100)]
plugin: change name, separator and error message for dir overrides
Rename the "dirs" parameter to "content-dirs". Switch from a "vtype:/dir"
format to "vtype=/dir", and remove the misleading error message talking
about "absolute" paths. One might expect these to be absolute over the
whole system, while in reality they are relative to the mountpoint of
the storage.
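A minimal sketch of parsing the new format, assuming the value is still a comma-separated list; the function names and default subdirectory are hypothetical and only illustrate that the override is resolved relative to the storage mountpoint.

```python
def parse_content_dirs(value: str) -> dict:
    """Parse 'vtype=/dir,vtype=/dir' into a {vtype: dir} mapping."""
    overrides = {}
    for entry in value.split(","):
        vtype, sep, path = entry.strip().partition("=")
        if not sep or not vtype or not path:
            raise ValueError(f"invalid content-dirs entry '{entry}'")
        overrides[vtype] = path
    return overrides


def content_path(mountpoint: str, vtype: str, overrides: dict, default_subdir: str) -> str:
    """Resolve the directory for vtype relative to the storage mountpoint."""
    subdir = overrides.get(vtype, default_subdir)
    return mountpoint.rstrip("/") + "/" + subdir.strip("/")
```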
Leo Nunner [Mon, 2 Jan 2023 16:04:37 +0000 (17:04 +0100)]
config: add overrides for default directory locations
Allowing overrides for the default directory locations seems to
integrate rather well into the existing system. Custom locations
are specified using the "dirs" parameter as a comma-separated list
of "vtype:/location" values.
For now, the option has been enabled for the Directory, CIFS and NFS
backends.
Fiona Ebner [Tue, 20 Dec 2022 13:16:36 +0000 (14:16 +0100)]
zfs: list: only cache and list images for requested storage/pool
The plugin for remote ZFS storages currently also uses the same
list_images() as the plugin for local ZFS storages. There is only
one cache which does not remember the target host where the
information originated.
This is problematic for rescan in qemu-server:
1. Via list_images() and zfs_list_zvol(), ZFSPlugin.pm's zfs_request()
is executed for a remote ZFS.
2. $cache->{zfs} is set to the result of scanning the images there.
3. Another list_images() for a local ZFS storage happens and uses the
cache with the wrong information.
The only two operations using the cache currently are
1. Disk rescan in qemu-server which is affected by the issue. It is
done as part of (or as a) long-running operations.
2. prepare_local_job for pvesr, but the non-local ZFS plugin doesn't
support replication, so it cannot trigger there. The cache only
helps if there is a replicated guest with volumes on different
ZFS storages, but otherwise it will be less efficient than no
cache or querying every storage by itself.
Fix the issue by making the cache $storeid-specific, which effectively
makes the cache superfluous, because there is no caller using the same
$storeid multiple times. As argued above, this should not really hurt
the callers described above much and actually improves performance for
all other callers.
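The fix can be pictured as keying the cache by storage ID. This is a simplified sketch; `scan` is a hypothetical callable standing in for the local or remote zvol listing.

```python
def list_images(storeid: str, scan, cache: dict) -> list:
    """Return the images for storeid, scanning at most once per storage."""
    zfs_cache = cache.setdefault("zfs", {})
    if storeid not in zfs_cache:
        zfs_cache[storeid] = scan()
    return zfs_cache[storeid]


cache = {}
remote = list_images("remote-zfs", lambda: ["vm-100-disk-0"], cache)
# A later call for a different storage no longer sees the remote result.
local = list_images("local-zfs", lambda: ["vm-200-disk-0"], cache)
```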
Fiona Ebner [Mon, 12 Dec 2022 12:33:09 +0000 (13:33 +0100)]
disk manage: pass full NVMe device path to smartctl
This essentially reverts commit c9bd3d2 ("fix #1123: modify NVME
device path for SMART support").
The man page for smartctl states
> Use the forms "/dev/nvme[0-9]" (broadcast namespace) or
> "/dev/nvme[0-9]n[1-9]" (specific namespace 1-9) for NVMe devices.
so it should be fine to pass the path with the specific namespace to
smartctl.
But that text was already present in the man page of version 6.5,
which is the version the commit c9bd3d2 talks about. It might be that
it was necessary to drop the specific namespace for the version
backported from Stretch to Jessie (the bug report mentions that that
version was used[0]), but it's not quite clear.
With current versions, passing in the path with the specific namespace
did work as expected[1], even on a device with multiple namespaces set
up tested locally. In PBS, the path queried via
udev::Device::from_syspath("/sys/block/{name}") is passed to smartctl
and that also included the specific namespace on the systems I tested
with a short script.
So pass the full path to make things a little bit simpler and to avoid
potential future issues like bug #2020[2].
Nowadays, relying on 'readlink /sys/block/nvmeXnY/device' won't always
lead to the correct device, as reported in the community forum[0],
where it results in '../../nvme-subsys0' and there's no matching entry
under '/dev/'.
Since Linux kernel 5.4, in particular commit 733e4b69d508 ("nvme:
Assign subsys instance from first ctrl"), the problematic situation
from bug #2020 shouldn't happen anymore.
Stated more clearly by the commit's author here[1]:
> Indeed, that commit will make the naming a bit more sane and will
> definitely prevent mistaken identity. It is still possible to
> observe controllers with instances that don't match their
> namespaces, but it is impossible to get a namespace instance that
> matches a non-owning controller.
The only other user of get_sysdir_info() doesn't use the 'device'
entry, so reverting that part is fine too.
Fiona Ebner [Wed, 23 Nov 2022 11:40:25 +0000 (12:40 +0100)]
get bandwidth limit: improve detecting if storages are involved
Previously, calling with e.g. $storage_list = [undef] would lead to an
early return of $override and not consider the limit from
datacenter.cfg.
Refactoring the bandwidth limit handling for migration introduced
calls such as described above, which broke applying the limit from
datacenter.cfg for VM RAM/state migration.
Reported in the community forum:
https://forum.proxmox.com/threads/37920/post-513005
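One way to express the improved detection, as a sketch only: the helper name is an assumption, and Python's `None` stands in for Perl's `undef`.

```python
def storages_involved(storage_list) -> bool:
    """Return True only if at least one defined storage ID is in the list."""
    return any(storeid is not None for storeid in (storage_list or []))
```

With such a check, a call with `[None]` is treated like a call without storages, so the limit from datacenter.cfg is still considered.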
Dominik Csapak [Thu, 10 Nov 2022 10:36:33 +0000 (11:36 +0100)]
api: FileRestore: make use of file-restore's and the GUI's timeout mechanism
file-restore has a 'timeout' parameter and, if that is exceeded, returns
an error with the HTTP code 503 Service Unavailable.
When the web ui encounters such an error, it retries the listing a few
times before giving up.
To make use of these, add the 'timeout' parameter to the new
'extraParams' of 'file_restore_list'.
25 seconds are chosen because it's under pveproxy's 30s limit, with a bit
of overhead to spare for the rest of the api call, like json decoding,
forking, access control checks, etc.
Dominik Csapak [Thu, 10 Nov 2022 10:36:32 +0000 (11:36 +0100)]
api: FileRestore: decode and return proper error of file-restore listing
Since commit ba690c40 ("file-restore: remove 'json-error' parameter from list_files")
in proxmox-backup, the file-restore binary returns the error as JSON
when called with '--output-format json' (which we do in PVE::PBSClient).
Here, we assume that 'file-restore' will fail in that case and try
to use the return value as an array ref, which fails, so the user never
sees the real error message.
To fix that, check the ref type of the return value and act accordingly.
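The type check can be sketched as follows. The `message` key is a hypothetical field name chosen for illustration; the actual JSON error layout is not stated here.

```python
def handle_listing(res):
    """Return the file list, or raise with the error reported by file-restore."""
    if isinstance(res, list):
        return res
    if isinstance(res, dict):
        # Assumed shape of a JSON-encoded error object.
        msg = res.get("message", "unknown error")
        raise RuntimeError(f"file-restore listing failed: {msg}")
    raise RuntimeError("unexpected file-restore response")
```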
Thomas Lamprecht [Fri, 11 Nov 2022 08:22:19 +0000 (09:22 +0100)]
disk manage: move "draid-config set only on draid level" assertion
so that there is a better code locality and also we avoid forgetting
to adapt the check for each specific draid-config parameter if a new
one gets added or an existing one changed.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
pbs: prune: avoid getting all snapshots for group assembly if fixed anyway
If both type and vmid are defined, we don't need to list the current
snapshots; we can simply derive the single backup group from that and
let the PBS client handle the rest.
Should be a not so small speedup for most setups using PBS backup and
pruning configured on PVE side, as vzdump calls this separately for
every vmid on backup jobs with multiple guests included.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
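A sketch of the shortcut, assuming backup groups are named "<type>/<vmid>"; `list_snapshots` and the snapshot dict keys are hypothetical stand-ins for the real listing call.

```python
def backup_groups(backup_type, vmid, list_snapshots) -> list:
    """Return the backup groups to prune, listing snapshots only if needed."""
    if backup_type is not None and vmid is not None:
        # The group is fully determined, so no snapshot listing is required.
        return [f"{backup_type}/{vmid}"]
    return sorted({f"{s['type']}/{s['vmid']}" for s in list_snapshots()})
```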
this format comes from the remote cluster, so it might not be supported
on the source side - checking whether it's known (as additional
safeguard) and untainting (to avoid open3 failure) is required.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
[ T: squashed in canonical perl array ref access ]
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Thomas Lamprecht [Fri, 23 Sep 2022 09:54:41 +0000 (11:54 +0200)]
disk manage: module wide code/style cleanup
fixing some issues reported by perlcritic along the way.
cutting down 70 lines, often even improving readability.
Tried to recheck and be conservative, so there shouldn't be any
regression, but it's still perl after all...
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Aaron Lauterer [Fri, 19 Aug 2022 15:01:21 +0000 (17:01 +0200)]
disks: allow add_storage for already configured local storage
One of the smaller annoyances, especially for less experienced users, is
the fact that, when creating a local storage (ZFS, LVM (thin), dir) in a
cluster, one can only leave the "Add Storage" option enabled the first
time.
On any following node, this option needed to be disabled and the new
node manually added to the list of nodes for that storage.
This patch changes the behavior. If a storage of the same name already
exists, it will verify that necessary parameters match the already
existing one.
Then, if the 'nodes' parameter is set, it adds the current node and
updates the storage config.
In case there is no nodes list, nothing else needs to be done, and the
GUI will stop showing the question mark for the configured but, until
then, non-existent local storage.
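The new behavior could look roughly like this. It is a sketch with a plain dict as the storage config; the function name is hypothetical, and restricting a newly created storage to the current node is an assumption.

```python
def add_or_update_storage(config: dict, name: str, node: str, params: dict) -> None:
    """Create the storage entry, or verify it and add the current node."""
    existing = config.get(name)
    if existing is None:
        # Assumption: a new local storage is limited to the creating node.
        config[name] = {**params, "nodes": [node]}
        return
    for key, value in params.items():
        if existing.get(key) != value:
            raise ValueError(f"storage '{name}' exists with different '{key}'")
    # Without a nodes list the storage applies to all nodes already.
    if "nodes" in existing and node not in existing["nodes"]:
        existing["nodes"].append(node)
```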
Aaron Lauterer [Fri, 19 Aug 2022 15:01:20 +0000 (17:01 +0200)]
disks: die if storage name is already in use
If a storage of that type and name already exists (LVM, zpool, ...) but
we do not have a Proxmox VE Storage config for it, it is possible that
the creation will fail midway due to checks done by the underlying
storage layer itself. This in turn can lead to disks that are already
partitioned. Users would need to clean this up themselves.
By adding checks early on, not only checking against the PVE storage
config, but against the actual storage type itself, we can die early
enough, before we touch any disk.
For ZFS, the logic to gather pool data is moved into its own function to
be called from the index API endpoint and the check in the create
endpoint.
RBD plugin: librados connect: increase timeout when in worker
The default timeout in PVE/RADOS.pm is 5 seconds, but this is not
always enough for external clusters under load. Workers can and should
take their time to not fail here too quickly.
The return value of get_rbd_dev_path() is only used when $scfg->{krbd}
evaluates to true and the function shouldn't have any side effects
that are needed later, so the call can be avoided otherwise.
This also saves a RADOS connection and command with configurations for
external clusters with krbd disabled.
fix #4189: pbs: bump list_volumes timeout to 2mins
When switching this from calling the external binary to
using the perl api client the timeout got reduced to 7
seconds, which is definitely insufficient for larger stores.
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
pbs: detect mismatch of encryption settings and key
if the key file doesn't exist (anymore), but the storage.cfg references
one, die on commands that should use encryption instead of falling back
to plain-text operations.
Before af07f67 ("pbs: use vmid parameter in list_snapshots") the
namespace was set via do_raw_client_command, but now it needs to be
set explicitly here.
Fixes: af07f67 ("pbs: use vmid parameter in list_snapshots")
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
Added a LOG_EXT constant as a counterpart to NOTES_EXT
and refactored usages for .log and .notes with them.
In some places in the test case code I had to introduce new variables to
shorten the lines so that they do not exceed the 100 column line limit.
Signed-off-by: Daniel Tschlatscher <d.tschlatscher@proxmox.com>
Reviewed-by: Fabian Ebner <f.ebner@proxmox.com>
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Adapted unlink calls for archive files in case of ENOENT
This improves handling when two archive remove calls are creating a
race condition where one would formerly encounter an error. Now both
finish successfully.
Signed-off-by: Daniel Tschlatscher <d.tschlatscher@proxmox.com>
Reviewed-by: Fabian Ebner <f.ebner@proxmox.com>
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
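The ENOENT handling follows a common pattern, sketched here in Python rather than Perl; the function name is hypothetical.

```python
import os
import tempfile


def remove_if_exists(path: str) -> None:
    """Unlink path, treating an already missing file as success."""
    try:
        os.unlink(path)
    except FileNotFoundError:
        # Another remove call won the race; nothing left to do.
        pass


# Two removals of the same file both finish without raising.
fd, path = tempfile.mkstemp()
os.close(fd)
remove_if_exists(path)
remove_if_exists(path)
```

Other errors, such as missing permissions, still propagate to the caller.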
fix #3972: Remove the .notes file when a backup is deleted
When a VM or Container backup was deleted, the .notes file was not
removed, therefore, over time the dump folder would get polluted with
notes for backups that no longer existed. As backup names contain a
timestamp and as the notes cannot be reused because of this, I think
it is safe to just delete them just like we do with the .log file.
Furthermore, I moved the deletion of the log and notes file into a
new function called "archive_auxiliaries_remove". Additionally, the
archive_info object now returns one more field containing the name of
the notes file. The test cases have to be adapted to expect this new
value as the package will not compile otherwise.
Signed-off-by: Daniel Tschlatscher <d.tschlatscher@proxmox.com>
Reviewed-by: Fabian Ebner <f.ebner@proxmox.com>
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Aaron Lauterer [Mon, 23 May 2022 10:54:25 +0000 (12:54 +0200)]
rbd: get_rbd_dev_path: return /dev/rbd path only if cluster matches
The changes in cfe46e2d4a97a83f1bbe6ad656e6416399309ba2 did not catch
all situations.
In the case of a guest having 2 disk images with the same name on a pool
with the same name but in two different ceph clusters we still had
issues when starting it. The first disk got mapped as expected. The
second disk did not get mapped because we returned the old $path to
"/dev/rbd/<pool>/<image>" because it already existed from the first
disk.
In the case that only the "old" /dev/rbd path exists and we do not have
the /dev/rbd-pve/<cluster>/... path available, we now check if the
cluster fsid used by that rbd device matches the one we expect. If it
does, then we are in the situation that the image has been mapped before
the new rbd-pve udev rule was introduced. If it does not, then we have
the situation of an ambiguous mapping in /dev/rbd and return the
$pve_path.
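The decision described above can be summarized in a sketch. `path_exists` and `mapped_fsid` are hypothetical callables standing in for the filesystem check and the lookup of the cluster fsid behind a mapped device.

```python
def choose_rbd_dev_path(pve_path, old_path, expected_fsid, path_exists, mapped_fsid):
    """Pick the device path to return for a krbd-mapped image."""
    if path_exists(pve_path):
        return pve_path
    if path_exists(old_path) and mapped_fsid(old_path) == expected_fsid:
        # Mapped before the rbd-pve udev rule existed, same cluster.
        return old_path
    # Either nothing is mapped yet, or the old path belongs to another cluster.
    return pve_path
```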
Aaron Lauterer [Wed, 18 May 2022 09:04:54 +0000 (11:04 +0200)]
rbd: fix #4060 show data-pool usage when configured
When a data-pool is configured, use it for status infos. The 'data-pool'
config option is used to mark the erasure coded pool while the 'pool'
will be the replicated pool holding meta data such as the omap.
This means the 'pool' will only use a small amount of space, and people
are interested in how much they can store in the erasure coded pool anyway.
Therefore this patch reorders the assignment of the used pool name by
availability of the scfg parameters: data-pool -> pool -> fallback 'rbd'
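The lookup order amounts to a simple fallback chain, sketched here with a plain dict in place of `$scfg`; the function name is hypothetical.

```python
def status_pool_name(scfg: dict) -> str:
    """Pick the pool used for status: data-pool, then pool, then 'rbd'."""
    return scfg.get("data-pool") or scfg.get("pool") or "rbd"
```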
Stoiko Ivanov [Tue, 3 May 2022 11:31:40 +0000 (13:31 +0200)]
rbd: warn if no stats for a pool could be gathered
happens in case of a mistyped poolname, and the new message should be
more helpful than:
`Use of uninitialized value $free in addition (+) at \
/usr/share/perl5/PVE/Storage/RBDPlugin.pm line 64`
Stoiko Ivanov [Tue, 3 May 2022 11:31:39 +0000 (13:31 +0200)]
rbd: add fallback default poolname 'rbd' to status
the fallback to a default pool name of 'rbd' was introduced in: 1440604a4b072b88cc1e4f8bbae4511b50d1d68e
and worked for the status command, because it used the `rados_cmd`
sub.
leading to confusing errors:
`Use of uninitialized value in string eq at \
/usr/share/perl5/PVE/Storage/RBDPlugin.pm line 633`
(e.g. in the journal from pvestatd)
Thomas Lamprecht [Thu, 28 Apr 2022 16:17:56 +0000 (18:17 +0200)]
rbd: get path: allow fake override of fsid in scfg for some regression tests
to avoid calls into RADOS connect, that trigger RPCEnv not
initialized breakage in regression tests, but wouldn't really work
otherwise either
in the future the RBD $scfg could actually support this (or similarly
named) property, to save on storage addition and then avoid frequent
mon commands
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>