Thomas Lamprecht [Tue, 19 Nov 2019 08:25:54 +0000 (09:25 +0100)]
clone: pre-create cloud-init disk for destination
While we may not want to copy the cloudinit disk/drive, we still need
to create+allocate the volume, else the next start complains about a
missing CI drive..
Matt Dunwoodie [Mon, 18 Nov 2019 06:46:12 +0000 (17:46 +1100)]
Add 'type' to agent_fmt
This adds an extra field to agent_fmt that specifies the type of guest
agent connection to use. Currently there is no choice, and it defaults to
virtio-serial. Since qemu-ga also runs over isa-serial, this allows OSes
such as NetBSD and OpenBSD, which do not have support for virtio-serial,
to run a qemu-ga.
This is an optional field, which leaves the default as virtio-serial. As
it doesn't change the default, it will require no change to older
configuration files.
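As an illustration of why an optional field with an unchanged default is
backward compatible, here is a minimal Python sketch of parsing such a
property string. The key names ("enabled", "type") and the values
"virtio" and "isa" are assumptions made for this sketch, and parse_agent
is a hypothetical helper, not the parser used in qemu-server.

```python
def parse_agent(value):
    """Parse a property string such as '1,type=isa' into a dict.

    Key names and the 'virtio'/'isa' values are assumptions for this
    sketch; they are not taken from the commit itself.
    """
    # The new field is optional: old configs simply keep the default.
    conf = {"enabled": "0", "type": "virtio"}
    for i, part in enumerate(value.split(",")):
        if "=" in part:
            key, val = part.split("=", 1)
        elif i == 0:
            # The first entry may omit its key (default key).
            key, val = "enabled", part
        else:
            raise ValueError(f"cannot parse {part!r}")
        conf[key] = val
    return conf
```

An old-style value such as "1" still parses and yields the virtio
default, while "1,type=isa" selects the other transport.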
Aaron Lauterer [Mon, 18 Nov 2019 14:23:18 +0000 (15:23 +0100)]
api/migration: fix autocomplete for targetstorage
Show storages configured for the target node and not for the current one
because they can be different.
Duplicated the `complete_storage` sub and extended it to extract the
targetnode from the parameters to pass it into the storage_check_enabled
function.
since PVE::Cluster::get_local_migration_ip does not exist anymore. this
is basically an inlined version, since this is the only remaining caller
that we actually want to keep.
Oguz Bektas [Mon, 11 Nov 2019 16:29:23 +0000 (17:29 +0100)]
qmreboot: clear reboot request if reboot fails
the reboot request is only cleaned in the vm_start path, so if reboot
fails for some reason, the request still exists. this causes an
unintentional reboot when a shutdown/stop/hibernate is called.
to mitigate, we can just clear the reboot request in case of an error.
Dominik Csapak [Mon, 11 Nov 2019 15:18:45 +0000 (16:18 +0100)]
fix #2457: ga: set-user-password: increase maxLength of password
SHA-512 crypted passwords are longer than 64 bytes, and it also does
not make sense to limit passwords to such a short length. Increase
to 1024, which should be enough for a while, but still limits the maximal
password payload to avoid DoS or the like.
destroy_vm: allow to pass new config and lock instead
This brings qemu more in line with containers, and it's nicer to
allow passing the replacement config if we want to keep it, instead
of setting a "memory: 128" config.
Use that to lock it on removal before final deletion, and on legacy
tar archive restore, in between old VM destruction and new
restoration.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
destroy_vm: refactor+cleanup and continue on unused disk removal errors
This has some potential semantic change too, i.e., the Storage
vdisk_list call is not wrapped by eval anymore, but as
we did some (unguarded) storage things before that call I'd say that
this does not matter much.
We try to clean all unused disks too, even if one deletion fails.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Dominic Jäger [Thu, 7 Nov 2019 12:00:57 +0000 (13:00 +0100)]
restore_tar_archive: Add skiplock to destroy_vm
When calling qmrestore a config file is created and locked with a lock
property. The following destroy_vm has been impossible as skiplock has not
been set.
Explicitly close leftover connections in the destructor,
otherwise the IO::Multiplex instance can be leaked causing
the qmp connection to never be closed.
This could occur for instance when cancelling vzdump with
ctrl+c with extremely unlucky timing...
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Thomas Lamprecht [Tue, 29 Oct 2019 18:04:01 +0000 (19:04 +0100)]
cleanup do_import, s/optional/params/ and move skiplock into params
mixed with indentation changes are a whole lot of other changes which
should normally not be mixed too much together, but this is all a bit
tangled and I'm not sure if splitting it into two or three parts
would help anybody. Just use "-w" (ignore whitespace changes) when
looking at the diff.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Dominic Jäger [Mon, 28 Oct 2019 11:47:34 +0000 (12:47 +0100)]
Import OVF: Lock config with "lock" property
Previously a VMID conflict was possible when creating a VM on another node
between locking the config with lock_config_full and writing to it for the
first time with write_config.
Using create_and_lock_config eliminates this possibility. This means that now
the "lock" property is set in the config instead of using flock only.
$param was empty when it was assigned the three values "name", "memory" and
"cores" before being assigned to $conf later on. Assigning those values
directly to $conf avoids confusion about what the two variables contain.
Dominic Jäger [Mon, 28 Oct 2019 11:47:32 +0000 (12:47 +0100)]
replace remaining vm_destroy call-sites with destroy_vm
This function has been used in one place only into which we inlined its
functionality. Removing it avoids confusion between vm_destroy and destroy_vm.
The whole $importfn is executed in a lock_config_full.
As a consequence, for the inlined code:
1. lock_config is redundant
2. it is not possible that the VM has been started (check_running) in the
meanwhile
Additionally, it is not possible that the "lock" property has been written into
the VM's config file (check_lock) in the meanwhile
Add warning after eval so that it does not go unnoticed if it ever comes into
action.
Stefan Reiter [Mon, 28 Oct 2019 13:30:41 +0000 (14:30 +0100)]
hugepages: fix memory size checking
The codepath for "any" hugepages did not check if memory size was even,
leading to the code below trying to allocate half a hugepage (e.g. VM
with 2049MiB RAM would lead to 1024.5 2MB hugepages).
Also improve error message for systems with only 1GB hugepages enabled.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
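The arithmetic behind this check can be sketched in a few lines of
Python. This is only an illustration assuming 2 MiB hugepages by
default; hugepage_count is a hypothetical helper and not the function
touched by the commit.

```python
def hugepage_count(memory_mib, page_mib=2):
    """Return the number of hugepages needed to back memory_mib exactly.

    Raises ValueError when the memory size is not a multiple of the
    hugepage size, instead of silently producing half a page.
    """
    if memory_mib % page_mib != 0:
        raise ValueError(
            f"memory size {memory_mib} MiB is not a multiple of "
            f"the {page_mib} MiB hugepage size"
        )
    return memory_mib // page_mib
```

With this check, 2048 MiB maps to 1024 pages, while 2049 MiB is rejected
up front rather than leading to an attempt to allocate 1024.5 pages.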
Dominik Csapak [Fri, 25 Oct 2019 12:36:06 +0000 (14:36 +0200)]
fix #2434: extend machine regex
with qemu 4.0.1, there is now a machine type pc-q35-4.0.1 which does not fit
into our regex
this broke live migration of q35, as we give the machine type (incl version
info) to 'qm start' on the target node, which checks it against the
JSONSchema
to fix this, extend the regex to allow any number of version levels,
for q35, i440fx and virt (to be more future proof)
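To illustrate the difference, the following Python sketch contrasts a
pattern that accepts exactly two version levels with one that accepts
any number. Both patterns are hypothetical simplifications written for
this example and are not the regex from the JSONSchema.

```python
import re

# Hypothetical "before" pattern: exactly major.minor.
OLD_MACHINE_RE = re.compile(r"^(?:pc-i440fx|pc-q35|virt)-\d+\.\d+$")

# Hypothetical "after" pattern: major followed by one or more
# ".<number>" groups, so 4.0, 4.0.1 and 4.0.1.2 are all accepted.
NEW_MACHINE_RE = re.compile(r"^(?:pc-i440fx|pc-q35|virt)-\d+(?:\.\d+)+$")


def is_valid_machine(name, pattern=NEW_MACHINE_RE):
    """Return True if the machine type string matches the pattern."""
    return pattern.match(name) is not None
```

Here "pc-q35-4.0.1" fails against the old pattern but passes the new
one, while malformed strings such as "pc-q35-4." are still rejected.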
Dominik Csapak [Wed, 23 Oct 2019 09:39:53 +0000 (11:39 +0200)]
fix reverting for non-existing configs
reverting a nonexisting option did not work with the latest changes
in pve-guest-common, because we do not delete the pending option
in 'add_to_pending_delete' anymore
this had the effect that we had the following in the config:
[pending]
option: pendingvalue
delete: option
which would do the deletion code and the pending add code
(e.g. delete the pending cloud init drive and creating it again)
to avoid that situation, we need to remove the option from the pending hash
in the 'delete loop'
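A rough Python sketch of the idea, using a plain dict to stand in for
the parsed config. The data layout and the mark_for_deletion helper are
assumptions for illustration only, not the pve-guest-common or
qemu-server implementation.

```python
def mark_for_deletion(conf, options):
    """Queue options for deletion in the pending section.

    Any value already queued for the same option is dropped first, so
    the option never ends up as both "pending add" and "pending delete".
    """
    pending = conf.setdefault("pending", {})
    to_delete = pending.setdefault("delete", [])
    for opt in options:
        # The fix: remove the option from the pending hash here.
        pending.pop(opt, None)
        if opt not in to_delete:
            to_delete.append(opt)
    return conf
```

Without the pop, the pending section would contain both a value and a
delete entry for the same option, which is the state shown above.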
Stefan Reiter [Tue, 22 Oct 2019 15:25:48 +0000 (17:25 +0200)]
fix #2408, #2355, #2380: use scsi-hd backend for iSCSI as well
As mentioned in #2408, live-migrating a VM between storages that use
different scsi backends (scsi-hd, scsi-generic, scsi-block) breaks.
To fix, from QEMU 4.1 machine types onward (to not break current
behaviour any more), only use scsi-hd, as in recent versions, there is
almost no difference between the two anyway.
scsi-block (which potentially also breaks) requires a flag to be
manually set on the disk, so we can assume the user knows what they're
doing.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
Suggested-by: Daniel Berteaud <daniel@firewall-services.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Oguz Bektas [Tue, 22 Oct 2019 10:34:27 +0000 (12:34 +0200)]
pending apply/hotplug: don't hard code force to true
Each pending option has a hash value which has the 'force'
information encoded as an entry. But this can be { force => 1 } or
{ force => 0 }, so we actually need to check the value and not just
set force to the hash directly, as otherwise force is always truthy.
fixes a bug where 'detach' caused disks to be destroyed immediately,
because $force parameter was always true since hash is true.
Signed-off-by: Oguz Bektas <o.bektas@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
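The same pitfall exists in Python, where a non-empty dict is truthy
regardless of its contents. The sketch below is an analogy only: the
original code is Perl, and wants_force is a hypothetical helper.

```python
def wants_force(entry):
    """Return the 'force' flag stored inside a pending-delete entry."""
    return bool(entry.get("force"))


entry = {"force": 0}

# Buggy variant: tests the container, which is truthy whenever it is
# non-empty, so the stored 0 is never looked at.
buggy = bool(entry)

# Fixed variant: looks at the value stored under 'force'.
fixed = wants_force(entry)
```

Here buggy evaluates to True and fixed to False, which mirrors how a
plain 'detach' ended up being treated as a forced removal.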
Mira Limbeck [Fri, 27 Sep 2019 13:13:30 +0000 (15:13 +0200)]
cloudinit: fix vm start hanging with disk on ZFS
With the changes to pve-storage in commit 56362cf the startup hangs for
5 minutes on ZFS if the cloudinit disk does not exist. Instead of
calling activate_volume followed by file_size_info we now call
volume_size_info. This should work reliably on all storages that support
cloudinit disks.
Signed-off-by: Mira Limbeck <m.limbeck@proxmox.com>
Mira Limbeck [Fri, 27 Sep 2019 14:22:01 +0000 (16:22 +0200)]
fix #2344: ignore cloudinit in replication check
When adding a cloudinit disk it does not contain media=cdrom until it is
actually created. This means the check in check_replication fails to
detect cloudinit and it is recognized as normal disk. Then parse_volname
fails because it does not match the vm-$vmid-XYZ format. To fix this we
now check explicitly if the volname matches cloudinit and if so, return
early.
Additionally 2 small cleanups replacing cloudinit regexes with the
same check for volname matches cloudinit.
Signed-off-by: Mira Limbeck <m.limbeck@proxmox.com>
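The early return can be pictured with the following Python sketch. The
volume ID forms and the regex are assumptions made for illustration
(a bare "cloudinit" name before creation, "vm-<vmid>-cloudinit" after),
and both helpers are hypothetical rather than the check_replication
code.

```python
import re

# Assumed forms: "<storage>:cloudinit" before the disk is created and
# "<storage>:[<vmid>/]vm-<vmid>-cloudinit[.<ext>]" once it exists.
CLOUDINIT_RE = re.compile(
    r"(?:^|[:/])(?:cloudinit|vm-\d+-cloudinit(?:\.\w+)?)$"
)


def is_cloudinit(volid):
    """Return True if the volume ID looks like a cloudinit disk."""
    return CLOUDINIT_RE.search(volid) is not None


def volumes_to_check(volids):
    """Skip cloudinit disks early; only real disks are checked further."""
    return [v for v in volids if not is_cloudinit(v)]
```

Matching on the volume name avoids relying on media=cdrom, which per the
commit message is not present until the disk is actually created.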
Christian Ebner [Tue, 15 Oct 2019 11:00:25 +0000 (13:00 +0200)]
fix #1291: add option purge for vm_destroy api call
When destroying a VM, we intentionally did not remove all related
configs such as backup or replication jobs.
The intention of this flag is to allow the removal of references to
the VM being removed from such configs on destroy.
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Thomas Lamprecht [Fri, 18 Oct 2019 09:21:58 +0000 (11:21 +0200)]
destroy_vm: use write_config from our Config module to set an "empty" config
brings us more in line with what we do in pve-container, also it's
good to not use file_set_contents directly if we have all those nice
wrapper interface methods to do things in a safe and guaranteed way.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Dominic Jäger [Tue, 15 Oct 2019 10:17:41 +0000 (12:17 +0200)]
Fix #2412: Missing VMs in pools
Between calling vm_destroy and removing the ID from user.cfg (remove_vm_access)
creating a new VM with this ID was possible. VMs could go missing from pools as
a consequence.
Adding a lock solves this for clones from the same node. Additionally,
unlinking must happen at the very end of the deletion process to avoid that
other nodes use the ID in the meanwhile.
Thomas Lamprecht [Thu, 17 Oct 2019 17:13:01 +0000 (19:13 +0200)]
Fix #2171: vm_start: volid based statefiles were not activated
So, while we could just make this a special case before the
config_to_command call and set the $conf->{vmstate} to the statefile
for the case where it's a valid volumeid, the special case handling
gets much easier when we do this outside of that method.
So it's basically a trade-off, and after looking far too long at all
nice revisions Alwin made for me and Fabian's request, and even trying
out different approaches, it was never perfect.
But having slight code duplication over the movement mess I proposed
(as I did not have the full picture then, sorry Alwin) felt like the
slightly nicer trade-off. As all worked I just use this one now; it
has very clear semantics, is easy to understand, and that now three lines
are duplicated is IMO irrelevant.
Co-developed-by: Alwin Antreich <a.antreich@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Stefan Reiter [Thu, 10 Oct 2019 10:18:41 +0000 (12:18 +0200)]
fix #2402: allow 1GB hugepages if 2MB is unavailable
As reported in bug #2402, a system started with "default_hugepagesz=1G
hugepagesz=1G" does not have a /sys/kernel/mm/hugepages/hugepages-2048kB
directory.
To fix, ignore the missing directory in hugepages_mount (since it might
not be needed anyway), and correctly check if the requested hugepage
size is available in hugepages_size instead.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
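A minimal Python sketch of checking for a hugepage size via sysfs,
based on the directory naming mentioned above. The function name and
the sysfs_root parameter (added so the check can be pointed at a test
directory) are assumptions for this example, not the hugepages_size
code.

```python
import os


def hugepage_size_available(size_kb, sysfs_root="/sys/kernel/mm/hugepages"):
    """Return True if the kernel exposes hugepages of size_kb kB.

    A size counts as available when its hugepages-<size>kB directory
    exists below sysfs_root.
    """
    return os.path.isdir(os.path.join(sysfs_root, f"hugepages-{size_kb}kB"))
```

On a system booted with only 1G hugepages, a call with 2048 would be
expected to return False and a call with 1048576 to return True, so the
caller can decide per requested size instead of failing on the missing
2048kB directory.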
test: cfg2cmd: do NOT sort expected/actual commands
In general it matters where a command line option is positioned
inside a QEMU command, so we want to actually also check the order in
the cfg2cmd test
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
cfg2cmd: sort PCI bridges when adding them for stability
In general it matters where a command line option is positioned
inside a QEMU command, so we want to actually also check the order in
the cfg2cmd test. To do so, we need to avoid false positives caused by
unstable ordering, such as the order in which PCI bridges are added.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
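The effect of sorting can be shown with a small Python sketch that
builds arguments from an unordered collection of bridge numbers. The
argument string is purely illustrative and bridge_args is a hypothetical
helper; it does not reproduce what cfg2cmd emits.

```python
def bridge_args(bridges):
    """Build one illustrative -device argument per PCI bridge.

    Iterating over sorted() makes the output independent of the
    iteration order of the input collection.
    """
    return [
        f"pci-bridge,id=pci.{n},chassis_nr={n}"
        for n in sorted(bridges)
    ]
```

Because the output no longer depends on how the collection happens to
be ordered, a test that compares the full command line, order included,
stays deterministic.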
Aaron Lauterer [Tue, 8 Oct 2019 15:56:15 +0000 (17:56 +0200)]
cfg2cmd: fix serial-bus for spice foldersharing
Thanks to Gilberto Nunes for finding a bug where the VM would not start
with foldersharing enabled and the qemu agent option disabled [0].
The cause was that the device org.spice-space.webdav.0 would not find a
virtio-serial-bus in this situation.
Since we always create a virtio-serial-bus for the spice vdagent it
seems sensible to use that also for the foldersharing device by moving
it in front of the other spice devices.
Mira Limbeck [Wed, 25 Sep 2019 16:12:17 +0000 (18:12 +0200)]
fix #2217: don't copy cloudinit disk on clone
This removes the cloudinit disk from the list of drives to clone. As the
cloudinit disk is recreated on every VM start, it's not necessary to
clone it.
Signed-off-by: Mira Limbeck <m.limbeck@proxmox.com>
Thomas Lamprecht [Thu, 26 Sep 2019 08:54:05 +0000 (10:54 +0200)]
cfg2cmd: support USB 3 SPICE ports with 4.0 machine feature
The reason why we did not do this in the first place was the fact
that the "usb3" flag could be set in older qemu-server versions; we
just ignored it but did not filter it out of the config.
That means there can be VMs out there which would now get a
different HW layout, an issue for migration and live-snapshot
restore.
But, actually, while the "usb3" property could be set, it allowed
starting the VM only if an additional USB device was added to the VM
with USB2, or the VM used a "q35" based machine - as else no "ehci" was
available, and thus the "ignored" USB3 - SPICE could not get attached
anywhere -> QEMU chickened out.
And if a user had a configuration where this could be started we are
still a bit lucky: live-migration was not possible as the "can't
migrate VM which uses local devices:" check still hit, as in
qemu-server older than 6.0-8 we explicitly checked for "spice" when
seeing which usb devices were not local, so a "spice,usb3=X" was always
(luckily) wrongly detected as local device -> migration was blocked.
So we only have one case left: restoring a live-snapshot. Here sadly
there seems to be no way out, it was possible to do with a "spice,usb3=1"
usb device, and thus all Snapshots taken on such VMs after they had a
clean restart on PVE 6 (to have a machine version >= 4.0) are broken
- but can be easily fixed by removing the "usb3=1" from the
problematic snapshot config.
As restoring a snapshot can be repeated more than once even on
failure without rendering the snapshot or VM permanently unusable,
this should be a reasonable compromise.
I strongly believe that the chance is so small that no one is
affected in practice and the property description mentioned that it
was not supported. If anybody is affected on snapshot restore we can
help them on a case-by-case basis.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>