besides the log calls these don't need any parts of the migration state,
so let's make them generic and re-use them for container migration and
replication in the future.
when passing a config from one cluster to another, we want to be strict
when parsing - it's better to fail the migration early and upgrade the
target node instead of failing the migration later (when significant
work for transferring disks and/or state has already been done) or not
at all, but silently lose config settings that the target doesn't
understand.
this also might be helpful in other cases - e.g. when restoring from a
backup.
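
A minimal sketch of the strict/lenient distinction described above, not the actual qemu-server parser. The names `parse_config`, `KNOWN_KEYS` and `UnknownConfigKey` are hypothetical and only serve the illustration: in lenient mode unknown lines are dropped silently, in strict mode they abort parsing early.

```python
class UnknownConfigKey(ValueError):
    """Raised in strict mode for a line the parser does not understand."""


# Hypothetical set of settings this node understands.
KNOWN_KEYS = {"cores", "memory", "name", "scsi0"}


def parse_config(raw, strict=False):
    """Parse 'key: value' lines; unknown keys are fatal only if strict."""
    conf = {}
    for lineno, line in enumerate(raw.splitlines(), start=1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition(":")
        key = key.strip()
        if not sep or key not in KNOWN_KEYS:
            if strict:
                raise UnknownConfigKey(f"line {lineno}: cannot parse '{line}'")
            continue  # lenient mode: silently drop the line
        conf[key] = value.strip()
    return conf
```

A migration target would call `parse_config(raw, strict=True)` so that an unknown setting fails the migration before any disks or state are transferred.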
Fabian Ebner [Thu, 27 Jan 2022 14:01:53 +0000 (15:01 +0100)]
api: clone: fork before locking
using the familiar early+repeated checks pattern from other API calls.
Only intended functional changes are with regard to locking/forking.
For a full clone of a running VM without guest agent, this also fixes
issuing vm_{resume,suspend} calls for drive mirror completion.
Previously, those just timed out, because of not getting the lock:
> create full clone of drive scsi0 (rbdkvm:vm-104-disk-0)
> Formatting '/var/lib/vz/images/105/vm-105-disk-0.raw', fmt=raw
> size=4294967296 preallocation=off
> drive mirror is starting for drive-scsi0
> drive-scsi0: transferred 2.0 MiB of 4.0 GiB (0.05%) in 0s
> drive-scsi0: transferred 635.0 MiB of 4.0 GiB (15.50%) in 1s
> drive-scsi0: transferred 1.6 GiB of 4.0 GiB (40.50%) in 2s
> drive-scsi0: transferred 3.6 GiB of 4.0 GiB (90.23%) in 3s
> drive-scsi0: transferred 4.0 GiB of 4.0 GiB (100.00%) in 4s, ready
> all 'mirror' jobs are ready
> suspend vm
> trying to acquire lock...
> can't lock file '/var/lock/qemu-server/lock-104.conf' - got timeout
> drive-scsi0: Cancelling block job
> drive-scsi0: Done.
> resume vm
> trying to acquire lock...
> can't lock file '/var/lock/qemu-server/lock-104.conf' - got timeout
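
The early+repeated checks pattern mentioned above can be sketched roughly as follows. This is an illustration only: the fork is modelled with a thread, and `clone_vm`, `load_conf`, `check` and `do_clone` are hypothetical names, not the API code.

```python
import threading


def clone_vm(load_conf, check, lock, do_clone):
    """Check early without the lock, then repeat the check in the worker."""
    # Early check: gives the caller fast feedback, no lock is held yet.
    check(load_conf())

    def worker():
        # The lock is only taken inside the worker, i.e. after the "fork".
        with lock:
            conf = load_conf()
            check(conf)  # repeated check: the config may have changed
            do_clone(conf)

    task = threading.Thread(target=worker)
    task.start()
    return task
```

Because the caller never holds the lock itself, the worker is the only holder while the clone runs.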
This allows mobile- and vGPUs to be presented to the guest as if they
were the original desktop variants of the card. It also allows
device-ID variants that guests don't know about to be renamed to
match compatible sibling devices the guest does have drivers for
(e.g. to remove manufacturer-specific vendor ID variants that prevent
the use of a device which would otherwise have a supported chipset)
e.g. hostpci0: 03:00,vendor-id=0x8086,device-id=0x10f6
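
A rough sketch of how such a property string could be split into the host address and the ID overrides. `parse_hostpci` and `id_override` are hypothetical helpers for illustration, not the qemu-server property-string parser.

```python
def parse_hostpci(value):
    """Split a hostpci property string into the host address and options."""
    props = {}
    for part in value.split(","):
        key, sep, val = part.partition("=")
        if sep:
            props[key] = val
        else:
            props["host"] = part  # the bare element is the host address
    return props


def id_override(props, key):
    """Return the override as an int, or None if the option is not set."""
    raw = props.get(key)
    return int(raw, 16) if raw is not None else None
```

For the example line above, `parse_hostpci("03:00,vendor-id=0x8086,device-id=0x10f6")` yields the host address `03:00` plus both overrides as strings.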
Signed-off-by: Nicholas Sherlock <n.sherlock@gmail.com> Reviewed-by: Dominik Csapak <d.csapak@proxmox.com> Tested-by: Dominik Csapak <d.csapak@proxmox.com>
Mira Limbeck [Mon, 20 Dec 2021 14:03:59 +0000 (15:03 +0100)]
fix #3792: cloudinit: use of uninitialized value
With the patch adding vendor-data support to cloud-init, a use of
uninitialized value was introduced. This can be fixed by setting it to
an empty string if no vendor-data is defined.
vendor-data can only be set via --cicustom and is optional.
Signed-off-by: Mira Limbeck <m.limbeck@proxmox.com>
Oguz Bektas [Thu, 2 Dec 2021 11:43:03 +0000 (12:43 +0100)]
avoid writing the config if there are no pending changes to apply
We drop properties which we do not understand and we call
`vmconfig_apply_pending` on stop and before start, so if a user tried
to edit the config or downgraded qemu-server they may get stuff
dropped from the config just by doing a stop/start, which may be a
bit too confusing, also the write is just unnecessary then.
we also have the same skipping logic when starting vms, this way we
avoid calling 'write_config' when there are no present changes to
commit.
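
The skipping logic can be sketched like this. It is a simplified illustration, assuming the config is a dict with a `pending` sub-dict; `apply_pending` and the `write_config` callback are stand-ins, not the real `vmconfig_apply_pending`.

```python
def apply_pending(conf, write_config):
    """Apply pending changes; only write the config if there were any."""
    pending = conf.get("pending") or {}
    if not pending:
        return False  # nothing to commit, leave the config file untouched

    for key, value in pending.items():
        conf[key] = value
    conf["pending"] = {}
    write_config(conf)
    return True
```

With no pending changes the function returns before `write_config` is called, so a plain stop/start cannot rewrite (and thereby trim) the config.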
migrate: send updated TPM state volid to target node
The volid may change if local-storage migration is involved, we need
to tell the target node the new one and update the in-memory config
for starting the target VM accordingly.
this possibly breaks migration new -> old iff
- spice is not used (else the explicit ticket wins because it comes
later)
- a local TPM state volume is used
- that local TPM state volume has a different volume id on the target
node (switched storage, volname already taken, ..)
because the target node will then mis-interpret the tpmstate0 line as
spice ticket and set it accordingly. if the old tpm state volume ID does
not exist on the target node, migration will fail. if it exists by
chance, it might work albeit with a wrong spice ticket (new because of
this patch) and tpm state volume (pre-existing breakage).
This patch fixes the wrong attempt of setting up an NBD server for
the replicated TPM state volume. In contrast to the other volumes, the
TPM state is managed by swtpm and isn't available to QEMU for
block-migration/bitmap tracking.
Note that we do migrate the state volume via a storage migration
anyway if necessary.
This code path was only triggered for replicated VMs with TPM.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com> Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Dominik Csapak [Mon, 15 Nov 2021 13:07:35 +0000 (14:07 +0100)]
pci: do not reserve pci-ids for mediated devices
else a user cannot use more than one mdev per card per host.
We do not need to reserve them at all, since sysfs will error out
on creation/reuse anyway
Thomas Lamprecht [Mon, 15 Nov 2021 08:21:48 +0000 (09:21 +0100)]
api: update: fix missing newline in background-delayed task error
this error path is mostly used for re-attaching disks and the like,
and the "check if task is already done" part uses a method to read
the task status that will never include a trailing newline, so add it
ourselves to avoid "... at /usr/share/perl5/PVE/API2/Qemu.pm line
1480. (500)"
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Fabian Ebner [Fri, 5 Nov 2021 13:06:11 +0000 (14:06 +0100)]
cfg2cmd: turn smm off when SeaBIOS and serial display are used
Since commit 277d33454f77ec1d1e0bc04e37621e4dd2424b67 in pve-qemu,
smm=off is no longer the default, but with SeaBIOS and serial display,
this can lead to a boot loop.
Reported in the community forum [0] and reproduced with a Debian 10
VM.
with `storage` being optional (and not allowed for reassign operations),
the ACL path in the schema can end up as `/storage/-`, which is wrong.
replace it with an explicit check:
- target `storage` for move disk
- storage from source disk for reassign disk (we only rename here, but
it's still a new volume on that storage after all)
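
A sketch of the explicit check described above, under the assumption that a volid has the form `<storage>:<volname>` and that a plain move always names a target storage. `storage_to_check` and `acl_path` are hypothetical helpers, not the API code.

```python
def storage_to_check(source_volid, target_storage=None, reassign=False):
    """Pick the storage whose ACL path must be checked for the operation."""
    if reassign:
        if target_storage is not None:
            raise ValueError("storage is not allowed for reassign operations")
        # Reassign only renames, but the result is still a new volume
        # on the storage of the source disk.
        return source_volid.split(":", 1)[0]
    if target_storage is None:
        # Assumption for this sketch: a move needs an explicit target.
        raise ValueError("move disk needs a target storage")
    return target_storage


def acl_path(storage):
    """Build the ACL path for a concrete storage ID."""
    return f"/storage/{storage}"
```

Resolving the storage first means the ACL path is never built from an unset parameter.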
Aaron Lauterer [Tue, 9 Nov 2021 14:55:35 +0000 (15:55 +0100)]
api: move-disk: add move to other VM
The goal of this is to expand the move-disk API endpoint to make it
possible to move a disk to another VM. Previously this was only possible
with manual intervention, either by renaming the VM disk or by manually
adding the disk's volid to the config of the other VM.
Thomas Lamprecht [Thu, 21 Oct 2021 07:51:22 +0000 (09:51 +0200)]
cfg2cmd: switch off ACPI hotplug on bridges for q35 VMs
See commit 17858a1695 (hw/acpi/ich9: Set ACPI PCI hot-plug as default
on Q35)[0] in upstream QEMU repository for details about why the change
was made.
As that change affects systemd's predictable interface naming[1],
e.g., by going from a previously `ens18` name to `enp6s18`, it may
have rather bad effects for users that did not set up some .link files
to enforce a specific naming by more stable information like the
NIC's MAC address.
The alternative would be making the preferred mode of hotplug an
option like `hotplug-mode=<acpi|pcie>`, but it does not seem like
one would like to change that much in the first place...
Note the changes to the tests and especially the tests with q35
machines that did not change.
Thomas Lamprecht [Thu, 21 Oct 2021 07:19:54 +0000 (09:19 +0200)]
config: meta: also save the QEMU version installed during creation
This is intended to be used to apply some workarounds for the
non-Windows ostyped VMs which we'd still like to not pin on a
specific machine version, as normally Linux et al. can cope with such
changes on fresh boot just fine and until now this was a once every
few years issue (albeit systemd's "predictable" interface naming has
some potential to pick up on churn frequency).
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com> Acked-by: Wolfgang Bumiller <w.bumiller@proxmox.com> Reviewed-by: Dominik Csapak <d.csapak@proxmox.com> Tested-by: Dominik Csapak <d.csapak@proxmox.com> Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Thomas Lamprecht [Thu, 21 Oct 2021 07:10:49 +0000 (09:10 +0200)]
config: add new meta property with the VM creation time
currently we only add the creation time (ctime), which some users
requested from time to time as a low priority wish.
Note that the meta info is not available in the update API endpoints,
and at the moment the code should not change/add/delete it either in
any place.
We may want to update it on actions like clone or backup-restore in
the future, e.g., to also save the time of that event and possibly
the original source VMID, but that can be thought out later.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com> Acked-by: Wolfgang Bumiller <w.bumiller@proxmox.com> Reviewed-by: Dominik Csapak <d.csapak@proxmox.com> Tested-by: Dominik Csapak <d.csapak@proxmox.com> Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Dominik Csapak [Wed, 27 Oct 2021 11:35:27 +0000 (13:35 +0200)]
drives: expose 'readonly' flag of qemu for scsi/virtio
this allows a user to set a drive to 'read-only'. This can be useful
if a disk should not be written to, or if the backing file/source is
not writable (like a mapped pbs backup to /dev/loopX).
the option is named 'ro', to achieve consistency with containers
while this could also be achieved by setting 'snapshot=1', this would
create a temporary file in /var/tmp which can get quite big.
Stefan Reiter [Wed, 27 Oct 2021 11:34:54 +0000 (13:34 +0200)]
vzdump: increase timeout for QMP 'cont' after backup start
Since 'backup' can now work asynchronously, QEMU may not be ready to
receive the next QMP command ('cont') immediately. Thus, increase the
timeout, to avoid aborted backups in slow environments.
There may be a deeper QEMU bug hidden under the covers here too, but at
least one user reported success with simply increasing the timeout:
https://forum.proxmox.com/threads/pve7-pbs2-backup-timeout-qmp-command-cont-failed-got-timeout.95212/page-2#post-426261
See also:
https://bugzilla.proxmox.com/show_bug.cgi?id=3693
https://forum.proxmox.com/threads/problem-seit-update-auf-7-0.97388/
https://forum.proxmox.com/threads/error-with-backup-when-backing-up-qmp-command-query-backup-failed-got-wrong-command-id.88017/page-3#post-416339
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
Stefan Reiter [Thu, 14 Oct 2021 09:28:49 +0000 (11:28 +0200)]
swtpm: wait for pidfile
swtpm may take a little bit to daemonize, so the pidfile might not be
available right after run_command. Causes an ugly warning about using an
undefined value in a match, so wait up to 5s for it to appear.
Note that in testing this loop only ever got to the first or second
iteration, so I believe the timeout duration should be more than enough.
Also add a missing 'usleep' import, 'usleep' was used before but never
imported, apparently the other case never got triggered...
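
The wait loop can be sketched as below. This is an illustration in Python rather than the actual Perl change; `wait_for_pidfile` and its `interval` parameter are hypothetical, while the 5s default follows the text above.

```python
import time


def wait_for_pidfile(path, timeout=5.0, interval=0.05):
    """Poll until the pidfile holds a PID, or give up after the timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with open(path) as handle:
                content = handle.read().strip()
            if content.isdigit():
                return int(content)
        except FileNotFoundError:
            pass  # the daemon has not written the file yet
        if time.monotonic() >= deadline:
            raise TimeoutError(f"pidfile '{path}' did not appear in {timeout}s")
        time.sleep(interval)
```

Polling with a short interval keeps the common case fast, since the file usually shows up within the first iterations.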
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
Stefan Reiter [Thu, 14 Oct 2021 09:28:48 +0000 (11:28 +0200)]
snapshot: fix tpmstate with rbd
QEMU doesn't know about the tpmstate, so 'do_snapshots_with_qemu' should
never return true in that case. Note that inconsistencies related to
snapshot timing do not matter much, as the actual TPM data is exported
together with other device state by QEMU anyway.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
Dominik Csapak [Thu, 7 Oct 2021 13:45:31 +0000 (15:45 +0200)]
fix #3258: block vm start when pci device is already in use
on vm start, we reserve all pciids that we use, and
remove the reservation again in vm_stop_cleanup
first with only a time-based reservation but after the vm is started,
we reserve again but with the pid.
for this, we have to move the start_timeout calculation above the
hostpci handling.
also moved the pci initialization out of the conf parsing loop
so that we can reserve all ids before we actually touch any of them
while touching the lines, fix the indentation
this way, a vm with a pci device that is already configured for a
different running vm will not be started, and the user gets the error
that the device is already in use
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com> Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Thomas Lamprecht [Fri, 15 Oct 2021 16:08:22 +0000 (18:08 +0200)]
pci reservation: rework helpers style and readability wise
both style and readability are naturally subjective to a certain
degree...
Also, this patch mixes a bit much into one thing, but splitting that
up would mean lots of work I just wanted to avoid, sorry about that.
Among other things:
- avoid a level of indentation in the reserve loop
- rename pciids to reservation_list where it was a better fit
- make reserve set either pid or time to avoid suggesting that we
save both
- rename parameters to requested/dropped IDs for easier understanding
what's going on in the code
- avoid old_pid/pid, use running_pid and reserver_pid instead to
clarify what they actually mean
- drop useless returns to avoid suggesting the return value has any
use and save some lines
- use a hash slice to delete all dropped IDs at once, shorter and
faster
- use 5 second timeout for reservation, this does nothing intensive
nor does it wait for anything, so the critical section should be
really short, 5s is really long enough for a wait..
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Thomas Lamprecht [Fri, 15 Oct 2021 15:02:21 +0000 (17:02 +0200)]
pci reservation: move lock/reservation file into /run/qemu-server
lck needs to die, the days of any 8.3 file naming schemes are long
gone (in the server space that is ;)
/var/run is /run so use the shorter, and while /var/lock is an OK
place for the locks we try to keep lock and lock-object together
nowadays. The qemu-server sub-directory avoids overly cluttering the
already crowded top-level /run dir
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Dominik Csapak [Thu, 7 Oct 2021 13:45:30 +0000 (15:45 +0200)]
pci: add helpers to (un)reserve pciids for a vm
saves a list of pciid <-> vmid mappings in /var/run
that we can check when we start a vm
if we're not given a pid but a timeout, we save the time when the
reservation will run out (current time + timeout + 5s) since each
vm start (until we can save the pid) varies from config to config
reserve_pci_usage and remove_pci_reservation always expect a list of ids
so that we can update the reservation for a vm all at once
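
A simplified in-memory sketch of the reservation idea. The function names follow the text above, but the signatures, the dict layout and `PciInUse` are assumptions for illustration; file locking and checking whether a reserving pid is still alive are left out.

```python
import time

GRACE = 5  # extra seconds added on top of the timeout, as described above


class PciInUse(RuntimeError):
    """Raised when a requested ID is reserved by a different VM."""


def reserve_pci_usage(reservations, ids, vmid, timeout=None, pid=None, now=None):
    """Reserve all ids for vmid, either until a deadline or for a pid."""
    if (timeout is None) == (pid is None):
        raise ValueError("pass exactly one of timeout or pid")
    now = time.time() if now is None else now

    # First pass: refuse if any ID is held by another VM.
    for pci_id in ids:
        entry = reservations.get(pci_id)
        if entry is None or entry["vmid"] == vmid:
            continue
        # Assumption: a pid entry counts as held, a time entry until it expires.
        if "pid" in entry or entry["time"] > now:
            raise PciInUse(f"PCI device '{pci_id}' already in use by VMID '{entry['vmid']}'")

    # Second pass: update the reservation for all IDs at once.
    for pci_id in ids:
        if pid is not None:
            reservations[pci_id] = {"vmid": vmid, "pid": pid}
        else:
            reservations[pci_id] = {"vmid": vmid, "time": now + timeout + GRACE}


def remove_pci_reservation(reservations, ids):
    """Drop the reservation for every given ID."""
    for pci_id in ids:
        reservations.pop(pci_id, None)
```

Checking all IDs before writing any of them means a refused start leaves the existing reservations unchanged.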
Stefan Reiter [Tue, 5 Oct 2021 16:02:06 +0000 (18:02 +0200)]
ovmf: support secure boot with 4m and 4m-ms efidisk types
Provide support for secure boot by using the new "4m" and "4m-ms"
variants of the OVMF code/vars templates. This is specified on the
efidisk via the 'efitype' and 'ms-keys' parameters.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>