Stefan Reiter [Mon, 5 Jul 2021 09:14:12 +0000 (11:14 +0200)]
api: always add new CD drives to bootorder
Attaching an ISO image to a VM is usually/often done for two reasons:
* booting an installer image
* supplying additional drivers to an installer (e.g. virtio)
Both of these cases (the latter at least with SeaBIOS and the Windows
installer) require the disk to be marked as bootable.
For this reason, enable the bootable flag for all new CDROM drives
attached to a VM by adding it to the bootorder list. It is appended to
the end, as otherwise it would cause new drives to boot before already
existing boot targets, which would be a more grave (and IMO bad)
behaviour change.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
Thomas Lamprecht [Fri, 23 Jul 2021 08:55:16 +0000 (10:55 +0200)]
lvm: avoid the use of IO uring
there may be a kernel issue or a bug in how QEMU uses io_uring, but
we have users that report crashes which f.ebner could see on some
workloads, not really deterministic though and it seems that in newer
kernel versions (5.12+) the crash becomes a hang
While we're closing in on the actual issue here (which could be the
same as for RBD) let's disable io_uring for LVM.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Stefan Reiter [Thu, 1 Jul 2021 09:37:29 +0000 (11:37 +0200)]
live-restore: preload efidisk before starting VM
The efidisk never got restored correctly before, since we don't use the
generic print_drive_commandline_full for it, and as such it didn't get a
backing image attached. This not only causes the efidisk data to be lost
on restore, but also an error at the end, since we try to remove a
non-existing PBS blockdev.
Since it is attached differently to a regular drive, adding PBS backing
would be more difficult, but not to worry: an efidisk is small enough
that it doesn't hurt performance to just restore it via the regular
mechanism before starting the VM, and simply excluding it from the live
restore entirely.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
Stefan Reiter [Wed, 30 Jun 2021 15:18:17 +0000 (17:18 +0200)]
cfg2cmd/drive: don't use io_uring for krbd with wb/wt cache
As reported here and locally reproduced:
https://forum.proxmox.com/threads/efi-vms-wont-start-under-7-beta-with-writeback-cache.91629/
This configuration is currently broken. Until we figure out how to fix
it properly, we can just have this (luckily very narrow) config pattern
fall back to aio=threads as it used to.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
otherwise it'll produce a whole lot of checksum errors
and while this would be nice as a storage feature check,
it's hard to be 100% accurate there anyway since a directory
storage can point anywhere, like for instance a btrfs
directory, causing the same issue...
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
this allows effectively setting ALL volumes as read-only, even if the
disk controller does not support it. without it, IDE and SATA disks
with (base) volumes which are marked read-only/immutable on the storage
level prevent the template VM from starting for backup purposes.
otherwise backups of templates using UEFI fail with storages like LVM
thin, where the volumes are not writable. disk controllers like IDE and
SATA that don't support being read-only are still broken for UEFI.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
[ drop the readonly=off when not required, resolve merger conflict
from Dominik's EFI disk cache mode fix ] Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
for unprivileged users (and possibly some root setups). reading from
pmxcfs now results in a hard error for unprivileged users, so there
might be some more of these lurking somewhere..
Stefan Reiter [Mon, 21 Jun 2021 16:35:41 +0000 (18:35 +0200)]
use KillMode 'process' for systemd scope
KillMode 'none' is deprecated, and systemd loudly complains about that
in the journal. To avoid the warning, but keep the behaviour the same,
use KillMode 'process'.
This mode does two things differently, which we have to stop it from
doing:
* it sends SIGTERM right when the scope is cancelled (e.g. on shutdown)
-> but only to the "root" process, which in our case is the worker
instance forking QEMU, so it is already dead by the time this happens
* it sends SIGKILL to *all* children after a timeout
-> can be avoided by setting either SendSIGKILL to false, or
TimeoutStopUSec to infinity - for safety, we do both
In my testing, this replicated the previous behaviour exactly, but
without using the deprecated 'none' mode.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
Stefan Reiter [Mon, 21 Jun 2021 15:33:18 +0000 (17:33 +0200)]
cfg2cmd: make io_uring default
The 'aio' setting is not visible to the guest, and so can be changed
during migrations or snapshots without issue. It is thus only
dependendent on the actual QEMU version being >= 6.0, not machine
version.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
Fabian Ebner [Fri, 18 Jun 2021 10:59:34 +0000 (12:59 +0200)]
migrate: enforce that image content type is available
and use it for the vdisk_list call too. This avoids scanning (and picking up
volumes from!) storages that are not even configured to hold images.
Previously, the content type was only enforced when a storage map was present.
Also serves a bit as a preparation to enforce content type on guest startup,
because now migration failure happens early and not only when trying to start
the guest on the remote node.
Fabian Ebner [Fri, 18 Jun 2021 10:59:33 +0000 (12:59 +0200)]
prefer storage_check_enabled over storage_check_node
storage_check_enabled simply checks for the 'disable' option and then calls
storage_check_node.
While not strictly necessary for a second call where only the storage differs,
e.g. in case of clone, it is more future-proof: if support for a target storage
is added at some point, it might be easy to miss adapting the call.
For the migration checks, the situation is improved by now always catching
disabled (target) storages.
Fabian Ebner [Mon, 31 May 2021 14:27:10 +0000 (16:27 +0200)]
test: fix restore config test as unprivileged user
after upgrading to bullseye, the cfs_read_file call within
restore_update_config_line() results in an error:
Is a directory!
when done as an unprivileged user.
Fabian Ebner [Tue, 1 Jun 2021 06:43:06 +0000 (08:43 +0200)]
vm status: force int where appropriate
to avoid potential problems with stringified numbers in Javascript and
elsewehere.
The vmid was not always an integer as the return schema expects, namely
when there was an opt_vmid argument, because the 'ne' comparision coerced the
vmid to be a string then.
avoid setting lun number for drives when pvscsi controller is used
Reported in the community forum[0].
In QEMU's hw/scsi/vmw_pvscsi.c in the SCSIBusInfo struct, the max_lun property
is set to 0. This means that in our stack, one cannot have multiple disks and
use 'scsihw: pvscsi' currently, as kvm would fail with
bad scsi device lun: 1
Instead of increasing the lun number, increase the scsi-id, as we already do for
lsi.* (in hw/scsi/lsi53c895a.c the max_lun property is also 0).
Dominik Csapak [Wed, 16 Jun 2021 13:09:33 +0000 (15:09 +0200)]
fix #3329: turn on cache=writeback for efidisks on rbd
on slower ceph clusters, the write pattern of the ovmf booting process
slows down the boot of the vm, so we turn on caching by default
it seems no other storage (until now) behaves like this. if it does in
the future, we can still add them too, or add a 'cache' property for
the efidisk
Fabian Ebner [Fri, 4 Jun 2021 13:49:29 +0000 (15:49 +0200)]
scan volids: remove superfluous parameter
The only caller that didn't use 'images' was removed as part of the migration
refactoring in commit 62a4c963b824c923a4fc82a48c81d0f63ebaddae, so this is not
even a breaking change as the 'PVE 7' comment might've suggested.
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com> Reviewed-by: Stefan Reiter <s.reiter@proxmox.com>
Stefan Reiter [Thu, 27 May 2021 10:27:51 +0000 (12:27 +0200)]
qm: assume correct VNC setup in 'vncproxy', disallow passwordless
The QMP 'change' command is no longer available since QEMU 6.0, so this
cannot work - instead of replacing it, we can just remove it however.
The 'if' branch would only set the VNC socket path anew and enable
password mode, which is always set and enabled on startup already.
The 'else' branch was intended for certificate login (?), which
according to the FIXME comment is long gone anyway - simply forbid
'vncproxy' without the PVE ticket environment variable set.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
for IDE and SATA, setting the whole drive into readonly mode is not
possible. skip the readonly flag for such drives as a workaround until
we find a better solution.
would return the size of scsi0 even though it would first boot
from cdrom/network.
When editing the bootorder with such a legacy config, we
remove the 'bootdisk' property and replace the legacy notation
with an explicit order, but we only search the first disk
for the size now.
Restore that behaviour by iterating over all disks in the boot
order property string until we get one that is not a cdrom
and has a size.
Thomas Lamprecht [Thu, 29 Apr 2021 12:27:41 +0000 (14:27 +0200)]
Revert "migration: do not set default speed limit"
The default was changed for 5.2, so while it is not 32 MiB/s anymore,
it is still 128 MiB/s which I did not notice on my 1 Gbps (or < 125
MiB/s) setup. For users with links faster than one gigabit it now did
some limiting - so setup a very high limit so than even 100G should
not max this out.
The variable is only ever used for calculating the average speed of memory
migration, but it was set before disk mirroring already. But the disk
sizes are not included in the calculation, resulting in (very) wrong values.
Thomas Lamprecht [Mon, 19 Apr 2021 20:01:05 +0000 (22:01 +0200)]
migration: keep log rate steady if polling gets more frequent
Either we're done in a few seconds anyway, or if the VM dirties lots
of pages we need quite a bit of time, and then it does not help to
output roughly the same status 10 times a second...
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Thomas Lamprecht [Mon, 19 Apr 2021 19:54:33 +0000 (21:54 +0200)]
migration: rework logging to more humand friendly, less spammy
* use render_bytes where possible, to get quick to read and grasp
units printed
* xbzrle is only interesting if actually pages/bytes are send using
it, so only log in that case
* log if VM dirties more than we send
* log current speed we get from QEMU
In general there are less lines logged and huge integers are avoided.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Fabian Ebner [Fri, 29 Jan 2021 15:11:43 +0000 (16:11 +0100)]
migration: move finishing block jobs to phase2 for better/uniform error handling
avoids the possibility to die during phase3_cleanup and instead of needing to
duplicate the cleanup ourselves, benefit from phase2_cleanup doing so.
The duplicate cleanup was also very incomplete: it didn't stop the remote kvm
process (leading to 'VM already running' when trying to migrate again
afterwards), but it removed its disks, and it didn't unlock the config, didn't
close the tunnel and didn't cancel the block-dirty bitmaps.
Since migrate_cancel should do nothing after the (non-storage) migrate process
has completed, even that cleanup step is fine here.
Since phase3 is empty at the moment, the order of operations is still the same.
Also add a test, that would complain about finish_tunnel not being called before
this patch. That test also checks that local disks are not already removed
before finishing the block jobs.
Fabian Ebner [Fri, 29 Jan 2021 15:11:39 +0000 (16:11 +0100)]
migration: cleanup_remotedisks: simplify and include more disks
Namely, those migrated with storage_migrate by using the information from
volume_map. Call cleanup_remotedisks in phase1_cleanup as well, because that's
where we end if sync_offline_local_volumes fails, and some disks might already
have been transfered successfully. Note that the local disks are still here, so
this is fine.
Fabian Ebner [Fri, 29 Jan 2021 15:11:38 +0000 (16:11 +0100)]
migration: simplify removal of local volumes and get rid of self->{volumes}
This also changes the behavior to remove the local copies of offline migrated
volumes only after the migration has finished successfully (this is relevant
for mixed settings, e.g. online migration with unused/vmstate disks).
local_volumes contains both, the volumes previously in $self->{volumes}
and the volumes in $self->{online_local_volumes}, and hence is the place
to look for which volumes we need to remove. Of course, replicated
volumes still need to be skipped.
Fabian Ebner [Fri, 29 Jan 2021 15:11:35 +0000 (16:11 +0100)]
migration: fix calculation of bandwith limit for non-disk migration
The case with:
1. no generic 'migration' limit from the storage plugin
2. a migrate_speed limit in the VM config
was broken. It would assign 0 to migrate_speed when picking the minimum value
and then default to the default value. Fix it by checking if bwlimit is 0
before picking the minimum.
Also, make it a bit more readable by avoiding the trick of //-assigning bwlimit
before the units match up and relying on getting back the original bwlimit value
as the minimum. Instead, only ||-assign after the units match up and don't rely
on other things.
Fabian Ebner [Fri, 29 Jan 2021 15:11:32 +0000 (16:11 +0100)]
migration: split sync_disks into two functions
by making local_volumes class-accessible. One functions is for scanning all local
volumes and one is for actually syncing offline volumes via storage_migrate. The
exception is replicated volumes, this still happens during the scan for now.
Also introduce a filter_local_volumes helper, to makes life easier.