Fiona Ebner [Fri, 7 Oct 2022 12:41:50 +0000 (14:41 +0200)]
api: create/update vm: clamp cpuunit value
While the clamping already happens before setting the actual systemd
CPU{Shares, Weight}, it can be done here too, to avoid writing new
out-of-range values into the config.
Can't use a validator enforcing this because existing out-of-range
values should not become errors upon parsing the config.
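As a minimal sketch of the clamping meant here (helper name and bounds are illustrative; the real limits differ between cgroup v1 CPUShares and cgroup v2 CPUWeight):

    # Illustrative clamp helper, applied before writing the value to the config.
    sub clamp_cpuunits {
        my ($value, $min, $max) = @_;
        return $min if $value < $min;
        return $max if $value > $max;
        return $value;
    }

    # e.g. $conf->{cpuunits} = clamp_cpuunits($param->{cpuunits}, $min, $max);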
set aliases for the previous ones for backward compat.
There's still cleanup potential, e.g., for snapshots, but to do that
nicely we may need (or want) to extend CLIHandler to accept commands
without fixed params also on the command group itself.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Fiona Ebner [Thu, 27 Oct 2022 07:13:45 +0000 (09:13 +0200)]
fix #4099: disable io_uring for virtual disks on CIFS storages
Since kernel 5.15, there is an issue with io_uring when used in
combination with CIFS [0]. Unfortunately, the kernel developers did
not suggest any way to resolve the issue and didn't comment on my
proposed one. So for now, just disable io_uring when the storage is
CIFS, as is done for other storage types that had problematic
interactions.
It is rather easy to reproduce when writing large amounts of data
within the VM. I used
dd if=/dev/urandom of=file bs=1M count=1000
to reproduce it consistently, but your mileage may vary.
Some forum reports about users running into the issue [1][2][3].
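Roughly, the check has the following shape (a hedged sketch; the helper name is hypothetical and the real decision, made while building the drive command line, also covers the other storage types mentioned above):

    # Only allow io_uring if the storage type is not known to be problematic;
    # CIFS is the newly added entry here (illustrative list).
    sub io_uring_allowed {
        my ($storage_type) = @_;
        my %problematic = map { $_ => 1 } qw(cifs);
        return !$problematic{$storage_type};
    }

    # usage sketch: my $aio = io_uring_allowed($scfg->{type}) ? 'io_uring' : 'threads';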
qmeventd: rework 'forced_cleanup' handling and set timeout to 60s
Currently, the 'forced_cleanup' (sending SIGKILL to the QEMU process)
is intended to be triggered 5 seconds after sending the initial shutdown
signal (SIGTERM), which is sadly not enough for some setups.
It could also accidentally be triggered earlier than 5 seconds, if a
SIGALRM fires in the timespan directly before it is set again.
In addition, this approach means that, depending on when machines are
shut down, their forced cleanup may happen after 5 seconds or any time
later, if new VMs are shut off in the meantime.
Improve this situation by reworking the way we deal with this cleanup.
We save the pidfd and the timeout time in the Client, and set a 10 second
timeout for 'epoll_wait', which will then trigger a forced_cleanup.
Remove entries from the forced_cleanup list when that entry is killed,
or when the normal cleanup took place.
To improve the shutdown behaviour, increase the default timeout to 60
seconds, which should be enough, but add a command-line option so
users can set it to a different value.
If the preparation of the PCI devices or the start of the VM fails, we need
to clean up the PCI devices (reservations *and* mdevs), or else
leftovers might remain that have to be removed manually.
To now also include mdevs, refactor the cleanup code from 'vm_stop_cleanup'
into its own function, and call that instead of only 'remove_pci_reservation'.
Also print the errors of the cleanup steps with 'warn', otherwise we
might discard important errors.
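Schematically, the extracted helper could look like this (function names and signatures are illustrative; the point is warning instead of dying, so that one failing step does not hide the others):

    sub cleanup_pci_devices {
        my ($vmid, $conf) = @_;

        for my $key (sort keys %$conf) {
            next if $key !~ m/^hostpci\d+$/;
            eval { remove_mdev_for($vmid, $key) };       # hypothetical mdev cleanup
            warn "failed to clean up mdev for '$key': $@" if $@;
        }
        eval { remove_pci_reservation_for($vmid) };      # hypothetical reservation cleanup
        warn "failed to remove PCI reservation: $@" if $@;
    }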
Thomas Lamprecht [Fri, 16 Sep 2022 10:56:29 +0000 (12:56 +0200)]
qmp client: increase default fallback timeout to 5s
allowing slower or overloaded systems a higher chance to finish
commands, while not being so long that it becomes problematic for sync
API calls with their 30s total budget
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
The volume path could contain escaped ":" or ",", which means their '\'
needs to be escaped another time when passing to HMP.
The same approach is used for hotplugging regular drives in
PVE::QemuServer, and is needed (at least) for RBD storages with IPv6
monhosts or an explicit monhost port.
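For illustration, a hedged sketch of the double-escaping step (the example path is made up; the exact HMP command construction is simplified):

    # The volume path already contains '\:' / '\,' escapes from the config layer;
    # for the HMP monitor command the backslashes themselves get escaped once more.
    my $path = 'rbd:rbd/vm-100-disk-0:mon_host=[fc00\:\:1]';   # example value
    (my $hmp_path = $path) =~ s/\\/\\\\/g;                     # '\:' becomes '\\:'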
This will break possibly existing workflows like
1. add second cloud-init
2. remove first cloud-init
to change the cloud-init storage.
On the other hand, it avoids unintended misconfiguration of having
multiple cloud-init drives with potentially different settings.
Also in preparation for adding cloud-init-related API calls, where
not being able to assume that there's only one cloud-init drive/state
would complicate things quite a bit.
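A minimal sketch of the kind of guard this implies when attaching a drive ($conf stands for the parsed VM config, is_cloudinit_drive() for the existing volume check; both are named hypothetically here):

    # Refuse to attach a second cloud-init drive (illustrative).
    for my $key (sort keys %$conf) {
        next if $key !~ m/^(?:ide|scsi|sata)\d+$/;
        die "only one cloud-init drive is allowed ('$key' is already one)\n"
            if is_cloudinit_drive($conf->{$key});
    }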
Thomas Lamprecht [Tue, 30 Aug 2022 07:04:41 +0000 (09:04 +0200)]
cpu config: map deprecated IceLake-Client CPU type to IceLake-Server
the former CPU type never existed on the market and will be dropped
by QEMU 7.1, so map it to the server variant, as they're pretty much
identical anyway as far as I can tell.
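As a sketch, the mapping can simply be a lookup applied before the CPU type is handed to QEMU (the hash name and the -noTSX entry are illustrative):

    my $cpu_type_aliases = {
        'IceLake-Client'       => 'IceLake-Server',
        'IceLake-Client-noTSX' => 'IceLake-Server-noTSX',
    };
    my $cputype = 'IceLake-Client';                         # e.g. from the VM config
    $cputype = $cpu_type_aliases->{$cputype} // $cputype;   # -> 'IceLake-Server'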
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Dominik Csapak [Tue, 21 Jun 2022 14:07:04 +0000 (16:07 +0200)]
fix #3577: prevent suspension for VMs with pci passthrough
Prevent the user from suspending the VM at all: while suspension
itself may finish, the saved state is incomplete, as we can neither
save nor restore PCIe device state in any generic fashion, so
resuming will almost certainly break.
The single case where it could work is when the guest OS doesn't use
the passed-through device at all, so there's no state, but that's
really odd (why bother passing it through then), and the user should
rather remove the hostpci entry in that case.
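Schematically, the check amounts to something like this, placed wherever suspend-to-disk requests are validated (a sketch, not the exact code):

    # Any configured hostpciX entry makes the saved state unusable, so refuse outright.
    for my $opt (keys %$conf) {
        die "cannot suspend VM with PCI passthrough device '$opt'\n"
            if $opt =~ m/^hostpci\d+$/;
    }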
pbs: detect mismatch of encryption settings and key
If the key file doesn't exist (anymore), but the storage.cfg references
one, die when starting a backup that should use encryption, instead of
falling back to plain-text operation.
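In rough pseudo-Perl, the intent is (the option name and variables are illustrative, not the exact storage.cfg schema):

    # storage.cfg wants encryption, but the key file is gone -- fail loudly
    # instead of silently making an unencrypted backup.
    if ($scfg->{'encryption-key'} && !-f $keyfile) {
        die "backup encryption key '$keyfile' does not exist\n";
    }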
Dominik Csapak [Fri, 12 Aug 2022 09:29:49 +0000 (11:29 +0200)]
automatically add 'uuid' parameter when passing through NVIDIA vGPU
When passing through an NVIDIA vGPU via mediated devices, their
software needs the QEMU process to have the 'uuid' parameter set to the
one of the vGPU. Since it's currently not possible to pass through multiple
vGPUs to one VM (this seems to be an NVIDIA driver limitation at the moment),
we don't have to worry about that case.
Sadly, because of the place where we do this, it does not show up in
'qm showcmd', as we don't (want to) query the PCI devices in that case, and
then we have no way of knowing whether it's an NVIDIA card or not. But since
this parameter is informational for QEMU anyway, I'd say we can ignore that.
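A hedged sketch of where the parameter ends up on the QEMU command line (the vendor flag and UUID variable are illustrative placeholders):

    # If the mediated device belongs to an NVIDIA vGPU, pass its UUID to QEMU.
    if ($is_nvidia_vgpu && defined($mdev_uuid)) {
        push @$cmd, '-uuid', $mdev_uuid;
    }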
Before, the two strings were one single string each, rather than multiple
strings separated by newlines.
In the docs, this looked very strange, as there were line breaks and the
dots were shown. It can be seen e.g. in the api-viewer under
/nodes/{node}/qemu/{vmid}/config.
Signed-off-by: Matthias Heiserer <m.heiserer@proxmox.com>
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
If the configured display hardware has the (optional) default type, but
some other attribute is set, this would match against `undef` and spew
lots of warnings in the logs.
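The usual fix for such a case is to apply the default before comparing; a minimal sketch (variable names and the default are illustrative):

    # Apply the default display type before matching, so an unset 'type'
    # is never compared against undef.
    my $type = $vga->{type} // $default_type;
    if ($type =~ m/^serial\d+$/) {
        # ...
    }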
and exit early if they are not met.
The necessary libraries were taken from Thomas' post in our community
forum:
https://forum.proxmox.com/threads/.61801/post-466767 (ff)
The /dev/dri/renderD.* check is based on util/drm.c in the current
qemu source code.
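A rough sketch of such an early check (the library names are assumed from the forum post above, the render-node glob mirrors the util/drm.c behaviour; paths are illustrative):

    for my $lib (qw(libEGL.so.1 libGL.so.1)) {
        die "missing '$lib', cannot enable virtio-gl display\n"
            if !grep { -e "$_/$lib" } ('/usr/lib/x86_64-linux-gnu', '/usr/lib');
    }
    my @render_nodes = glob('/dev/dri/renderD*');
    die "no DRM render node available\n" if !@render_nodes;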
Suggested-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
fix #3861: migrate: fix live migration when cloud-init changes storage
Generalizes fd95d780 ("migrate: send updated TPM state volid to target
node") to also handle other offline migrated disks appearing in the
VM config, which currently should only be cloud-init.
Breaks migration new -> old under similar (edge-case) conditions as fd95d780 did.
Keep sending the 'tpmstate0' STDIN parameter to avoid breaking new ->
old in the scenario fd95d780 fixed.
Keep parsing the vm_start 'tpmstate0' STDIN parameter to avoid
breaking old -> new, and to be able to keep sending it.
In preparation for allowing certain parameters to be passed along together
with 'archive'. Move the parameter checks to after the
conflicts-with-'archive' check, to ensure that the more telling error
triggers first.
All check helpers should handle empty params fine, but check first
just to make sure and to avoid all the superfluous function calls.
Stefan Sterz [Thu, 24 Feb 2022 14:21:51 +0000 (15:21 +0100)]
parse vm config: remove "\s*" from multi-line comment regex
To be consistent with PBS's implementation of multi-line comments,
remove "\s*" here too. Since the regex isn't lazy, .* matches
everything \s* would anyway. (Note that newlines occur after "$".)
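To illustrate: with a greedy ".*", any trailing whitespace is already consumed before "$" anchors, so "\s*" can only ever match the empty string and both patterns capture the same text (small self-contained demo):

    my $line = "#  some comment   ";
    my ($with)    = $line =~ m/^\#(.*)\s*$/;
    my ($without) = $line =~ m/^\#(.*)$/;
    print "identical\n" if $with eq $without;   # prints "identical"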
migrate: resume initially running VM when failing after convergence
When phase2() is aborted after the migration already converged, then
after migrate_cancel, the VM might be in POSTMIGRATE state.
(There also is a conditional for SHUTDOWN state in QEMU's
migration_iteration_finish(), so it's likely possible to end up there
if the VM is shut down at the right time during migration, but no need
to resume then).
Detect the POSTMIGRATE state and resume the VM if it wasn't paused at
the beginning of the migration. There is no direct way to go to
PAUSED, so just print an error if the VM was paused at the beginning
of the migration.
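Schematically, the recovery path looks like this ('mon_cmd' stands for the usual QMP wrapper, $was_running for the state recorded at the start of the migration; a simplified sketch):

    my $status = mon_cmd($vmid, 'query-status');
    if (($status->{status} // '') eq 'postmigrate') {
        if ($was_running) {
            mon_cmd($vmid, 'cont');   # resume the VM
        } else {
            warn "VM was paused before migration, unable to switch back to 'paused'\n";
        }
    }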
Fabian Ebner [Thu, 17 Mar 2022 11:31:06 +0000 (12:31 +0100)]
api: update vm: print drive string for newly allocated/imported drives
In the spirit of c75bf16 ("qm importdisk: tell user to what VM disk we
actually imported"), and so that the information is not lost once qm
importdisk switches to re-using the API call.
Added for cloudinit too, because a new disk is allocated.
Fabian Ebner [Thu, 17 Mar 2022 11:31:04 +0000 (12:31 +0100)]
schema: drive: use separate schema when disk allocation is possible
via the special syntax <storeid>:<size>.
Not worth it by itself, but this is anticipating a new 'import-from'
parameter which is only used upon import/allocation, but shouldn't be
part of the schema for the config or other API endpoints.
Fabian Ebner [Thu, 17 Mar 2022 11:31:03 +0000 (12:31 +0100)]
api: add endpoint for parsing .ovf files
Co-developed-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Signed-off-by: Dominic Jäger <d.jaeger@proxmox.com>
[split into its own patch + minor improvements/style fixes]
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
[renamed API handler, since it's not an index]
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Fabian Ebner [Thu, 17 Mar 2022 11:31:01 +0000 (12:31 +0100)]
api: clone vm: check against cloning running TPM state early
Drive keys are sorted when cloning and 'tpmstate0' comes late, so it
was likely that potentially large disks were already copied just to be
removed again, because of the TPM state restriction at the end.
Fabian Ebner [Thu, 17 Mar 2022 11:30:59 +0000 (12:30 +0100)]
clone disk: assert that drive name is the same for drive-mirror on single VM
because when the VM ID of target and source are the same,
qemu_drive_mirror_monitor() switches the QEMU device node over to the
new backing image. The planned import-from functionality makes it
possible to run into this, although for a somewhat unusual use case.
Fabian Ebner [Wed, 23 Feb 2022 12:03:59 +0000 (13:03 +0100)]
fix #3424: api: snapshot delete: wait for active replication
A to-be-deleted snapshot might be actively used by replication,
resulting in a not (or only partially) removed snapshot and locked
(snapshot-delete) VM. Simply wait a few seconds for any ongoing
replication.
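Conceptually, the wait boils down to polling for a short, bounded time before attempting the deletion (names and the interval are illustrative, not the actual implementation):

    # Give an ongoing replication run a few seconds to finish instead of
    # failing right away.
    for (1 .. 10) {
        last if !replication_is_running($vmid);   # hypothetical helper
        sleep 1;
    }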