qmeventd: rework 'forced_cleanup' handling and set timeout to 60s
currently, the 'forced_cleanup' (sending SIGKILL to the qemu process),
is intended to be triggered 5 seconds after sending the initial shutdown
signal (SIGTERM) which is sadly not enough for some setups.
Accidentally, it could be triggered earlier than 5 seconds, if a
SIGALRM triggers in the timespan directly before setting it again.
Also, this approach means that depending on when machines are shutdown
their forced cleanup may happen after 5 seconds, or any time after, if
new vms are shut off in the meantime.
Improve this situation by reworking the way we deal with this cleanup.
We save the pidfd, time incl. timeout in the Client, and set a timeout
to 'epoll_wait' of 10 seconds, which will then trigger a forced_cleanup.
Remove entries from the forced_cleanup list when that entry is killed,
or when the normal cleanup took place.
To improve the shutdown behaviour, increase the default timeout to 60
seconds, which should be enough, but add a commandline toggle where
users can set it to a different value.
if the preparing of pci devices or the start of the vm fails, we need
to cleanup the pci devices (reservations *and* mdevs), or else
it might happen that there are leftovers which must be manually removed.
to include also mdevs now, refactor the cleanup code from 'vm_stop_cleanup'
into it's own function, and call that instead of only 'remove_pci_reservation'
also print the errors of the cleanup steps with 'warn', otherwise we
might discard important errors
Thomas Lamprecht [Fri, 16 Sep 2022 10:56:29 +0000 (12:56 +0200)]
qmp client: increase default fallback timeout to 5s
allowing slower or overloaded systems a higher chance to finish
commands while not being to long to be problematic for sync api calls
with their 30s total budget
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
the volume path could contain escaped ":" or ",", which means their '\'
needs to be escaped another time for passing to HMP.
the same approach is used for hotplugging regular drives in
PVE::QemuServer, and is needed (at least) for RBD storages with IPv6
monhosts or an explicit monhost port.
This will break possibly existing workflows like
1. add second cloud-init
2. remove first cloud-init
to change the cloud-init storage.
On the other hand, it avoids unintended misconfiguration of having
mutliple cloud-init drives with potentially different settings.
Also in preparation for adding cloud-init-related API calls, where
not being able to assume that there's only one cloud-init drive/state
would complicate things quite a bit.
Thomas Lamprecht [Tue, 30 Aug 2022 07:04:41 +0000 (09:04 +0200)]
cpu config: map depreacated IceLake-Client CPU type to IceLake-Server
the former CPU type never existed on the market and will be dropped
by QEMU 7.1, so map it to the server variant as they're pretty much
identical anyway FIWCT.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Dominik Csapak [Tue, 21 Jun 2022 14:07:04 +0000 (16:07 +0200)]
fix #3577: prevent suspension for VMs with pci passthrough
Prevent the user from suspending the vm at all, as while suspension
itself may finish, the saved state is incomplete as we can neither
save nor restore PCIe device state in any generic fashion, so
resuming will almost certainly break.
The single case when it could work is when the guest OS didn't uses
the passed through device at all, so there's no state, but that's
really odd (as why bother passing through then), and the user should
rather remove the hostpci entry in that case.
pbs: detect mismatch of encryption settings and key
if the key file doesn't exist (anymore), but the storage.cfg references
one, die when starting a backup that should use encryption instead of
falling back to plain-text operations.
Dominik Csapak [Fri, 12 Aug 2022 09:29:49 +0000 (11:29 +0200)]
automatically add 'uuid' parameter when passing through NVIDIA vGPU
When passing through an NVIDIA vGPU via mediated devices, their
software needs the qemu process to have the 'uuid' parameter set to the
one of the vGPU. Since it's currently not possible to pass through multiple
vGPUs to one VM (seems to be an NVIDIA driver limitation at the moment),
we don't have to take care about that.
Sadly, the place we do this, it does not show up in 'qm showcmd' as we
don't (want to) query the pci devices in that case, and then we don't
have a way of knowing if it's an NVIDIA card or not. But since this
is informational with QEMU anyway, i'd say we can ignore that.
Before, the two strings were one single string each, rather than multiple
separated by newlines.
In the docs, this looked very strange as there were linebreaks and the
dots were shown. Can be seen e.g. in api-viewer /nodes/{node}/qemu/{vmid}/config.
Signed-off-by: Matthias Heiserer <m.heiserer@proxmox.com> Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
if the configured display hardware has the (optional) default type, but
some other attribute is set, this would match against `undef` and spew
lots of warnings in the logs.
and exit early if they are not met.
The necessary libraries were taken from Thomas' post in our community
forum:
https://forum.proxmox.com/threads/.61801/post-466767 (ff)
The /dev/dri/renderD.* check is based on util/drm.c in the current
qemu source code.
Suggested-by: Thomas Lamprecht <t.lamprecht@proxmox.com> Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
fix #3861: migrate: fix live migration when cloud-init changes storage
Generalizes fd95d780 ("migrate: send updated TPM state volid to target
node") to also handle other offline migrated disks appearing in the
VM config, which currently should only be cloud-init.
Breaks migration new -> old under similar (edge-case-)conditions as fd95d780 did.
Keep sending the 'tpmstate0' STDIN parameter to avoid breaking new ->
old in the scenario fd95d780 fixed.
Keep parsing the vm_start 'tpmstate0' STDIN parameter to avoid
breaking old -> new, and to be able to keep sending it.
In preparation to allow passing along certain parameters together with
'archive'. Moving the parameter checks to after the
conflicts-with-'archive' to ensure that the more telling error will
trigger first.
All check helpers should handle empty params fine, but check first
just to make sure and to avoid all the superfluous function calls.
Stefan Sterz [Thu, 24 Feb 2022 14:21:51 +0000 (15:21 +0100)]
parse vm config: remove "\s*" from multi-line comment regex
To be consistent with PBS's implementation of multi-line comments
remove "\s*" here too. Since the regex isn't lazy .* matches
everything \s* would anyway. (Note that new lines occurs after "$").
migrate: resume initially running VM when failing after convergence
When phase2() is aborted after the migration already converged, then
after migrate_cancel, the VM might be in POSTMIGRATE state.
(There also is a conditional for SHUTDOWN state in QEMU's
migration_iteration_finish(), so it's likely possible to end up there
if the VM is shut down at the right time during migration, but no need
to resume then).
Detect the POSTMIGRATE state and resume the VM if it wasn't paused at
the beginning of the migration. There is no direct way to go to
PAUSED, so just print an error if the VM was paused at the beginning
of the migration.
Fabian Ebner [Thu, 17 Mar 2022 11:31:06 +0000 (12:31 +0100)]
api: update vm: print drive string for newly allocated/imported drives
In the spirit of c75bf16 ("qm importdisk: tell user to what VM disk we
actually imported"), and so that the information is not lost once qm
importdisk switches to re-using the API call.
Added for cloudinit too, because a new disk is allocated.
Fabian Ebner [Thu, 17 Mar 2022 11:31:04 +0000 (12:31 +0100)]
schema: drive: use separate schema when disk allocation is possible
via the special syntax <storeid>:<size>.
Not worth it by itself, but this is anticipating a new 'import-from'
parameter which is only used upon import/allocation, but shouldn't be
part of the schema for the config or other API enpoints.
Fabian Ebner [Thu, 17 Mar 2022 11:31:03 +0000 (12:31 +0100)]
api: add endpoint for parsing .ovf files
Co-developed-by: Fabian Grünbichler <f.gruenbichler@proxmox.com> Signed-off-by: Dominic Jäger <d.jaeger@proxmox.com>
[split into its own patch + minor improvements/style fixes] Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
[renamed API handler, since it's not an index] Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Fabian Ebner [Thu, 17 Mar 2022 11:31:01 +0000 (12:31 +0100)]
api: clone vm: check against cloning running TPM state early
Drive keys are sorted when cloning and 'tpmstate0' comes late, so it
was likely that potentially large disks were already copied just to be
removed again, because of the TPM state restriction at the end.
Fabian Ebner [Thu, 17 Mar 2022 11:30:59 +0000 (12:30 +0100)]
clone disk: assert that drive name is the same for drive-mirror on single VM
because when the VM ID of target and source are the same,
qemu_drive_mirror_monitor() switches the QEMU device node over to the
new backing image. The planned import-from functionality makes it
possible to run into this, although for an a bit unusual use case.
Fabian Ebner [Wed, 23 Feb 2022 12:03:59 +0000 (13:03 +0100)]
fix #3424: api: snapshot delete: wait for active replication
A to-be-deleted snapshot might be actively used by replication,
resulting in a not (or only partially) removed snapshot and locked
(snapshot-delete) VM. Simply wait a few seconds for any ongoing
replication.
Fabian Ebner [Wed, 9 Mar 2022 10:09:08 +0000 (11:09 +0100)]
clone disk: pass in efi vars size rather than config
It's confusing that the config associated to the destination is
actually a reference to the source config for both existing callers.
Also, disk import will need to base the calculation on the passed-in
drive parameters and not just the current config, so this change is in
preparation for that too.
Fabian Ebner [Wed, 9 Mar 2022 10:09:05 +0000 (11:09 +0100)]
api: update: pass correct config when creating disks
While the new options should be written to the pending config, the
decisions (currently only one) in create_disks needs to be made for
the current config.
Seems to fix EFI disk creation, but actually, it's only
future-proofing, because, currently, the same OVMF_VARS file is
used independently of $smm.
The correct config is also needed to determine the correct size for
the EFI disk for the upcoming import-from feature.
Fabian Ebner [Wed, 9 Mar 2022 10:09:04 +0000 (11:09 +0100)]
api: create disks: always activate/update size when attaching existing volume
For creation, activation and size update never triggered, because the
passed in $conf is essentially the same as the creation $settings, so
the disk was always detected to be the same as the "existing" one. But
actually, all disks are new, so it makes sense to do it.
For update, activation and size update nearly always triggered,
because only the pending changes are passed in as $conf. The case
where it didn't trigger is when the same pending change was made twice
(there are cases where hotplug isn't done, but makes it even more
unlikely).
Fabian Ebner [Wed, 9 Mar 2022 10:09:03 +0000 (11:09 +0100)]
device unplug: verify that unplugging scsi disk completed
Avoids the error
adding drive failed: Duplicate ID 'drive-scsi1' for drive
that could happen when switching over to a new disk (e.g. via qm set),
if unplugging wasn't fast enough.
Oguz Bektas [Mon, 7 Mar 2022 13:20:25 +0000 (14:20 +0100)]
api: vm_start: 'force-cpu' is for internal migration use only
'force-cpu' parameter was introduced to allow live-migration of VMs with
custom CPU models; it does not need to be allowed for general use on
vm_start for regular users, since they would be able to set arbitrary
cpu types or cpuid parameters that aren't supported.
Fabian Ebner [Wed, 2 Mar 2022 13:21:07 +0000 (14:21 +0100)]
qmp client: increase timeout for thaw
Using a loop of freeze, sleep 5, thaw, sleep 5, an idling Windows 11
VM with 4 cores and 8GiB RAM once took 54 seconds for thawing. It took
less than a second about 90% of the time and maximum of a few seconds
for the majortiy of other cases, but there can be outliers where 10
seconds is not enough.
And there can be hookscripts executed upon thaw, which might also not
complete instantly.
fix #3886: QEMU restore: verify storage allows images before writing
When restoring a backup and the storage the disks would be created on
doesn't allow 'images', the process errors without cleanup.
This is the same behaviour we currently have when the storage is
disabled.