and exit early if they are not met.
The necessary libraries were taken from Thomas' post in our community
forum:
https://forum.proxmox.com/threads/.61801/post-466767 (ff)
The /dev/dri/renderD.* check is based on util/drm.c in the current
qemu source code.
Suggested-by: Thomas Lamprecht <t.lamprecht@proxmox.com> Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
fix #3861: migrate: fix live migration when cloud-init changes storage
Generalizes fd95d780 ("migrate: send updated TPM state volid to target
node") to also handle other offline migrated disks appearing in the
VM config, which currently should only be cloud-init.
Breaks migration new -> old under similar (edge-case-)conditions as fd95d780 did.
Keep sending the 'tpmstate0' STDIN parameter to avoid breaking new ->
old in the scenario fd95d780 fixed.
Keep parsing the vm_start 'tpmstate0' STDIN parameter to avoid
breaking old -> new, and to be able to keep sending it.
In preparation to allow passing along certain parameters together with
'archive'. Moving the parameter checks to after the
conflicts-with-'archive' to ensure that the more telling error will
trigger first.
All check helpers should handle empty params fine, but check first
just to make sure and to avoid all the superfluous function calls.
Stefan Sterz [Thu, 24 Feb 2022 14:21:51 +0000 (15:21 +0100)]
parse vm config: remove "\s*" from multi-line comment regex
To be consistent with PBS's implementation of multi-line comments
remove "\s*" here too. Since the regex isn't lazy .* matches
everything \s* would anyway. (Note that new lines occurs after "$").
migrate: resume initially running VM when failing after convergence
When phase2() is aborted after the migration already converged, then
after migrate_cancel, the VM might be in POSTMIGRATE state.
(There also is a conditional for SHUTDOWN state in QEMU's
migration_iteration_finish(), so it's likely possible to end up there
if the VM is shut down at the right time during migration, but no need
to resume then).
Detect the POSTMIGRATE state and resume the VM if it wasn't paused at
the beginning of the migration. There is no direct way to go to
PAUSED, so just print an error if the VM was paused at the beginning
of the migration.
Fabian Ebner [Thu, 17 Mar 2022 11:31:06 +0000 (12:31 +0100)]
api: update vm: print drive string for newly allocated/imported drives
In the spirit of c75bf16 ("qm importdisk: tell user to what VM disk we
actually imported"), and so that the information is not lost once qm
importdisk switches to re-using the API call.
Added for cloudinit too, because a new disk is allocated.
Fabian Ebner [Thu, 17 Mar 2022 11:31:04 +0000 (12:31 +0100)]
schema: drive: use separate schema when disk allocation is possible
via the special syntax <storeid>:<size>.
Not worth it by itself, but this is anticipating a new 'import-from'
parameter which is only used upon import/allocation, but shouldn't be
part of the schema for the config or other API enpoints.
Fabian Ebner [Thu, 17 Mar 2022 11:31:03 +0000 (12:31 +0100)]
api: add endpoint for parsing .ovf files
Co-developed-by: Fabian Grünbichler <f.gruenbichler@proxmox.com> Signed-off-by: Dominic Jäger <d.jaeger@proxmox.com>
[split into its own patch + minor improvements/style fixes] Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
[renamed API handler, since it's not an index] Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Fabian Ebner [Thu, 17 Mar 2022 11:31:01 +0000 (12:31 +0100)]
api: clone vm: check against cloning running TPM state early
Drive keys are sorted when cloning and 'tpmstate0' comes late, so it
was likely that potentially large disks were already copied just to be
removed again, because of the TPM state restriction at the end.
Fabian Ebner [Thu, 17 Mar 2022 11:30:59 +0000 (12:30 +0100)]
clone disk: assert that drive name is the same for drive-mirror on single VM
because when the VM ID of target and source are the same,
qemu_drive_mirror_monitor() switches the QEMU device node over to the
new backing image. The planned import-from functionality makes it
possible to run into this, although for an a bit unusual use case.
Fabian Ebner [Wed, 23 Feb 2022 12:03:59 +0000 (13:03 +0100)]
fix #3424: api: snapshot delete: wait for active replication
A to-be-deleted snapshot might be actively used by replication,
resulting in a not (or only partially) removed snapshot and locked
(snapshot-delete) VM. Simply wait a few seconds for any ongoing
replication.
Fabian Ebner [Wed, 9 Mar 2022 10:09:08 +0000 (11:09 +0100)]
clone disk: pass in efi vars size rather than config
It's confusing that the config associated to the destination is
actually a reference to the source config for both existing callers.
Also, disk import will need to base the calculation on the passed-in
drive parameters and not just the current config, so this change is in
preparation for that too.
Fabian Ebner [Wed, 9 Mar 2022 10:09:05 +0000 (11:09 +0100)]
api: update: pass correct config when creating disks
While the new options should be written to the pending config, the
decisions (currently only one) in create_disks needs to be made for
the current config.
Seems to fix EFI disk creation, but actually, it's only
future-proofing, because, currently, the same OVMF_VARS file is
used independently of $smm.
The correct config is also needed to determine the correct size for
the EFI disk for the upcoming import-from feature.
Fabian Ebner [Wed, 9 Mar 2022 10:09:04 +0000 (11:09 +0100)]
api: create disks: always activate/update size when attaching existing volume
For creation, activation and size update never triggered, because the
passed in $conf is essentially the same as the creation $settings, so
the disk was always detected to be the same as the "existing" one. But
actually, all disks are new, so it makes sense to do it.
For update, activation and size update nearly always triggered,
because only the pending changes are passed in as $conf. The case
where it didn't trigger is when the same pending change was made twice
(there are cases where hotplug isn't done, but makes it even more
unlikely).
Fabian Ebner [Wed, 9 Mar 2022 10:09:03 +0000 (11:09 +0100)]
device unplug: verify that unplugging scsi disk completed
Avoids the error
adding drive failed: Duplicate ID 'drive-scsi1' for drive
that could happen when switching over to a new disk (e.g. via qm set),
if unplugging wasn't fast enough.
Oguz Bektas [Mon, 7 Mar 2022 13:20:25 +0000 (14:20 +0100)]
api: vm_start: 'force-cpu' is for internal migration use only
'force-cpu' parameter was introduced to allow live-migration of VMs with
custom CPU models; it does not need to be allowed for general use on
vm_start for regular users, since they would be able to set arbitrary
cpu types or cpuid parameters that aren't supported.
Fabian Ebner [Wed, 2 Mar 2022 13:21:07 +0000 (14:21 +0100)]
qmp client: increase timeout for thaw
Using a loop of freeze, sleep 5, thaw, sleep 5, an idling Windows 11
VM with 4 cores and 8GiB RAM once took 54 seconds for thawing. It took
less than a second about 90% of the time and maximum of a few seconds
for the majortiy of other cases, but there can be outliers where 10
seconds is not enough.
And there can be hookscripts executed upon thaw, which might also not
complete instantly.
fix #3886: QEMU restore: verify storage allows images before writing
When restoring a backup and the storage the disks would be created on
doesn't allow 'images', the process errors without cleanup.
This is the same behaviour we currently have when the storage is
disabled.
Thomas Lamprecht [Thu, 10 Feb 2022 15:27:55 +0000 (16:27 +0100)]
api: qga file-write: drop the check for base64
it's potentially expensive to check and the user already needs to
explicitly turn auto-encoding off, besides QEMU/QGA should handle
that and just error out gracefully on bogus base64 values.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Dominik Csapak [Thu, 10 Feb 2022 14:52:10 +0000 (15:52 +0100)]
fix #3683: agent file-write: enable user to encode the content themselves
by adding an optional parameter 'encode' (enabled by default). When it
is disabled, the content must be base64 encoded already. This
way, users can send a binary file to the vm by base64 encoding it
themselves
besides the log calls these don't need any parts of the migration state,
so let's make them generic and re-use them for container migration and
replication in the future.
when passing a config from one cluster to another, we want to be strict
when parsing - it's better to fail the migration early and upgrade the
target node instead of failing the migration later (when significant
work for transferring disks and/or state has already been done) or not
at all, but silently lose config settings that the target doesn't
understand.
this also might be helpful in other cases - e.g. when restoring from a
backup.
Fabian Ebner [Thu, 27 Jan 2022 14:01:53 +0000 (15:01 +0100)]
api: clone: fork before locking
using the familiar early+repeated checks pattern from other API calls.
Only intended functional changes are with regard to locking/forking.
For a full clone of a running VM without guest agent, this also fixes
issuing vm_{resume,suspend} calls for drive mirror completion.
Previously, those just timed out, because of not getting the lock:
> create full clone of drive scsi0 (rbdkvm:vm-104-disk-0)
> Formatting '/var/lib/vz/images/105/vm-105-disk-0.raw', fmt=raw
> size=4294967296 preallocation=off
> drive mirror is starting for drive-scsi0
> drive-scsi0: transferred 2.0 MiB of 4.0 GiB (0.05%) in 0s
> drive-scsi0: transferred 635.0 MiB of 4.0 GiB (15.50%) in 1s
> drive-scsi0: transferred 1.6 GiB of 4.0 GiB (40.50%) in 2s
> drive-scsi0: transferred 3.6 GiB of 4.0 GiB (90.23%) in 3s
> drive-scsi0: transferred 4.0 GiB of 4.0 GiB (100.00%) in 4s, ready
> all 'mirror' jobs are ready
> suspend vm
> trying to acquire lock...
> can't lock file '/var/lock/qemu-server/lock-104.conf' - got timeout
> drive-scsi0: Cancelling block job
> drive-scsi0: Done.
> resume vm
> trying to acquire lock...
> can't lock file '/var/lock/qemu-server/lock-104.conf' - got timeout
This allows mobile- and vGPUs to be presented to the guest as if they
were the original desktop variants of the card. It also allows
device-ID variants that guests don't know about to be renamed to
match compatible sibling devices the guest does have drivers for
(e.g. to remove manufacturer-specific vendor ID variants that prevent
the use of a device which would otherwise have a supported chipset)
e.g. hostpci0: 03:00,vendor-id=0x8086,device-id=0x10f6
Signed-off-by: Nicholas Sherlock <n.sherlock@gmail.com> Reviewed-by: Dominik Csapak <d.csapak@proxmox.com> Tested-by: Dominik Csapak <d.csapak@proxmox.com>
Mira Limbeck [Mon, 20 Dec 2021 14:03:59 +0000 (15:03 +0100)]
fix #3792: cloudinit: use of uninitialized value
With the patch adding vendor-data support to cloud-init, a use of
uninitialized value was introduced. This can be fixed by setting it to
an empty string if no vendor-data is defined.
vendor-data can only be set via --cicustom and is optional.
Signed-off-by: Mira Limbeck <m.limbeck@proxmox.com>
Oguz Bektas [Thu, 2 Dec 2021 11:43:03 +0000 (12:43 +0100)]
avoid writing the config if there are no pending changes to apply
We drop properties which we do not understand and we call
`vmconfig_apply_pending` on stop and before start, so if a user tried
to edit the config or downgraded qemu-server they may get stuff
dropped from the config just by doing a stop/start, which may be a
bit too confusing, also the write is just unnecessary then.
we also have the same skipping logic when starting vms, this way we
avoid calling 'write_config' when there are no present changes to
commit.
migrate: send updated TPM state volid to target node
The volid may change if local-storage migration is involved, we need
to tell the target node the new one and update the in-memory config
for starting the target VM accordingly.
this possibly breaks migration new -> old iff
- spice is not used (else the explicit ticket wins because it comes
later)
- a local TPM state volume is used
- that local TPM state volume has a different volume id on the target
node (switched storage, volname already taken, ..)
because the target node will then mis-interpret the tpmstate0 line as
spice ticket and set it accordingly. if the old tpm state volume ID does
not exist on the target node, migration will fail. if it exists by
chance, it might work albeit with a wrong spice ticket (new because of
this patch) and tpm state volume (pre-existing breakage).