Fiona Ebner [Wed, 14 Dec 2022 14:16:32 +0000 (15:16 +0100)]
update submodule and patches to 7.2.0
User-facing breaking change:
The slirp submodule for user networking got removed. It would be
necessary to add the --enable-slirp option to the build and/or install
the appropriate library to continue building it. Since PVE is not
explicitly supporting it, it would require additionally installing the
libslirp0 package on all installations and there is *very* little
mention on the community forum when searching for "slirp" or
"netdev user", the plan is to only enable it again if there is some
real demand for it.
Notable changes:
* The big change for this release is the rework of job locking, using
a job mutex and introducing _locked() variants of job API functions
moving away from call-side AioContext locking. See (in the qemu
submodule) commit 6f592e5aca ("job.c: enable job lock/unlock and
remove Aiocontext locks") and previous commits for context.
Changes required for the backup patches:
* Use WITH_JOB_LOCK_GUARD() and call the _locked() variant of job
API functions where appropriate (many are only availalbe as
a _locked() variant).
* Remove acquiring/releasing AioContext around functions taking the
job mutex lock internally.
The patch introducing sequential transaction support for jobs needs
to temporarily unlock the job mutex to call job_start() when
starting the next job in the transaction.
* The zeroinit block driver now marks its child as primary.
The documentation in include/block/block-common.h states:
> Filter node has exactly one FILTERED|PRIMARY child, and may have
> other children which must not have these bits
Without this, an assert will trigger when copying to a zeroinit target
with qemu-img convert, because bdrv_child_cb_attach() expects any
non-PRIMARY child to be not FILTERED:
> qemu-img convert -n -p -f raw -O raw input.raw zeroinit:output.raw
> qemu-img: ../block.c:1476: bdrv_child_cb_attach: Assertion
> `!(child->role & BDRV_CHILD_FILTERED)' failed.
Thomas Lamprecht [Tue, 22 Nov 2022 08:18:56 +0000 (09:18 +0100)]
cherry-pick "block/block-backend: blk_set_enable_write_cache is IO_CODE"
albeit I was short from disarming that GLOBAL_STATE_CODE assert
completely, as its just bogus to assert that on runtime for a lot of
call sites, rather it should be verified on compilation (function
coloring with attributes and maybe a compiler plugin).
But, as this is already solved upstream lets take in that patch.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Thomas Lamprecht [Fri, 28 Oct 2022 08:22:21 +0000 (10:22 +0200)]
init: daemonize: defuse PID file resolve error to warning
fixes file restore, where we actively unlink the PID file of the
transient VM ourself after opening it - while we use it only for
tracking when the QEMU process itself has finished start up, it's
easier and cleaner to fix this regression now, than to rework that to
something that doesn't depends on the PID file at all.
Applying Fiona's patch as patch-patch tracked under extra, as I
expect that something similar to this gets accepted upstreamed.
Fiona Ebner [Mon, 17 Oct 2022 07:18:33 +0000 (09:18 +0200)]
savevm async IO channel: channel writev: fix return value in error case
The documentation in include/io/channel.h states that -1 or
QIO_CHANNEL_ERR_BLOCK should be returned upon error. Simply passing
along the return value from the blk-functions has the potential to
confuse the call sides. Non-blocking mode is not implemented
currently, so -1 it is.
The "return ret" was mistakenly left over from the previous
QEMUFileOps based implementation. Also, use error_setg_errno(), since
the blk(_co)_p{readv,writev} functions return errno codes.
to be in-line with what other implementations in QEMU do. Commit 1d39c7098bbfa6862cb96066c4f8f6735ea397c5 mentions the EIO bit and
the function is expected to return 0 upon success (see other
implementations).
Fiona Ebner [Fri, 14 Oct 2022 12:07:13 +0000 (14:07 +0200)]
update submodule and patches to 7.1.0
Notable changes:
* The only big change is the switch to using a custom QIOChannel for
savevm-async, because the previously used QEMUFileOps was dropped.
Changes to the current implementation:
* Switch to vector based methods as required for an IO channel. For
short reads the passed-in IO vector is stuffed with zeroes at the
end, just to be sure.
* For reading: The documentation in include/io/channel.h states that
at least one byte should be read, so also error out when whe are
at the very end instead of returning 0.
* For reading: Fix off-by-one error when request goes beyond end.
The wrong code piece was:
if ((pos + size) > maxlen) {
size = maxlen - pos - 1;
}
Previously, the last byte would not be read. It's actually
possible to get a snapshot .raw file that has content all the way
up the final 512 byte (= BDRV_SECTOR_SIZE) boundary without any
trailing zero bytes (I wrote a script to do it).
Luckily, it didn't cause a real issue, because qemu_loadvm_state()
is not interested in the final (i.e. QEMU_VM_VMDESCRIPTION)
section. The buffer for reading it is simply freed up afterwards
and the function will assume that it read the whole section, even
if that's not the case.
* For writing: Make use of the generated blk_pwritev() wrapper
instead of manually wrapping the coroutine to simplify and save a
few lines.
* Adapt to changed interfaces for blk_{pread,pwrite}:
* a9262f551e ("block: Change blk_{pread,pwrite}() param order")
* 3b35d4542c ("block: Add a 'flags' param to blk_pread()")
* bf5b16fa40 ("block: Make blk_{pread,pwrite}() return 0 on success")
Those changes especially affected the qemu-img dd patches, because
the context also changed, but also some of our block drivers used
the functions.
* Drop qemu-common.h include: it got renamed after essentially
everything was moved to other headers. The only remaining user I
could find for things dropped from the header between 7.0 and 7.1
was qemu_get_vm_name() in the iscsi-initiatorname patch, but it
already includes the header to which the function was moved.
This version string can be queried with $BINARY --version as well as
the query-version QMP command.
Useful for qemu-server to be able to report the running QEMU version
exactly. Could also be used to version guard against features as an
alternative to the query-proxmox-support QMP command.
Fiona Ebner [Thu, 18 Aug 2022 11:44:17 +0000 (13:44 +0200)]
savevm-async: set SAVE_STATE_DONE when closing state file was successful
Without this change, it's necessary to send a second savevm-end QMP
command after aborting a snaphsot, before a new savevm-start QMP
command can succeed.
In process_savevm_finalize(), no longer set an error in the abort
scenario. If there already is another error, there's no need to
override it. If canceling was done intentionally, qmp_savevm_end()
is responsible for setting the state now.
Reported-by: Mira Limbeck <m.limbeck@proxmox.com> Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Fiona Ebner [Thu, 18 Aug 2022 11:44:16 +0000 (13:44 +0200)]
savevm-async: avoid segfault when aborting snapshot
Reported in the community forum[0].
For 6.1.0, there were a few changes to the coroutine-sleep API, but
the adaptations in f376b2b ("update and rebase to QEMU v6.1.0") made
a mistake.
Currently, target_close_wait is NULL when passed to
qemu_co_sleep_ns_wakeable(), which further passes it to
qemu_co_sleep(), but there, it is dereferenced when trying to access
the 'to_wake' member:
> Thread 1 "kvm" received signal SIGSEGV, Segmentation fault.
> qemu_co_sleep (w=0x0) at ../util/qemu-coroutine-sleep.c:57
To fix it, create a proper struct and pass its address instead. Also
call qemu_co_sleep_wake unconditionally, because the NULL check (for
the 'to_wake' member) is done inside the function itself.
This patch is based on what the QEMU commits introducing the changes
to the coroutine-sleep API did to the callers in QEMU: eaee072085 ("coroutine-sleep: allow qemu_co_sleep_wake that wakes nothing") 29a6ea24eb ("coroutine-sleep: replace QemuCoSleepState pointer with struct in the API")
[0]: https://forum.proxmox.com/threads/112130/
Tested-by: Mira Limbeck <m.limbeck@proxmox.com> Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
For the io_uring patch, it's not very clear which configurations can
trigger it, but it should be rather uncommon. See qemu commit be6a166fde652589761cf70471bcde623e9bd72a for a bit more information.
Fabian Ebner [Mon, 27 Jun 2022 11:05:40 +0000 (13:05 +0200)]
update submodule and patches to 7.0.0
Only very minor changes needed:
* Most patches in extra (or some version of them) are part of 7.0.0.
* aio_set_fd_handler got an extra parameter, but can just pass NULL
like we did for the related 'poll' parameter. See QEMU commit 826cc32423db2a99d184dbf4f507c737d7e7a4ae for more.
* Add include for qemu/memalign.h in vma.c and vma-writer.c.
* Add reverts for fixups of already reverted 0347a8fd4c ("block/rbd:
implement bdrv_co_block_status") that came in with 7.0.0. Those
fixups are not enough, see Proxmox bugzilla #4047.
* Two trivial context changes for bitmap-mirror patches.
* block_int.h got split up into multiple headers.
* Some context changes in configure and meson.build.
* Used the oppurtunity to squash fixup of bdrv_backuo_dump_create typo
in a later patch into the patch introducing the function (had to
move code to new header during rebase).
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com> Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Fabian Ebner [Tue, 17 May 2022 08:29:49 +0000 (10:29 +0200)]
add revert to work around performance regression when backing up large RBD disk
resulting in QMP timeouts and very slow backups. The plan is to figure
out (ideally together with upstream) a way to make the implementation
of bdrv_co_block_status for RBD more efficient. But for now, revert
the problematic change as a stop-gap measure.
Thomas Lamprecht [Mon, 25 Apr 2022 08:07:24 +0000 (10:07 +0200)]
vma: allow partial restore
Introduce a new map line for skipping a certain drive, of the form
skip=drive-scsi0
Since in PVE, most archives are compressed and piped to vma for
restore, it's not easily possible to skip reads.
For the reader, a new skip flag for VmaRestoreState is added and the
target is allowed to be NULL if skip is specified when registering.
If
the skip flag is set, no writes will be made as well as no check for
duplicate clusters. Therefore, the flag is not set for verify.
Originally-by: Fabian Ebner <f.ebner@proxmox.com> Signed-off-by: Fabian Ebner <f.ebner@proxmox.com> Acked-by: Wolfgang Bumiller <w.bumiller@proxmox.com> Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Thomas Lamprecht [Fri, 22 Apr 2022 07:09:04 +0000 (09:09 +0200)]
d/control: add suggest dependency-hint for libgl1
It pulls in a lot of stuff via the libglx0 -> libglx-mesa0 dependency
chain, so only suggest it for now to avoid installing it in the
installer or via common "PVE on-top Debian" installations, VirGL
integration is experimental after all and we may drop/replace it with
the vulkan based venus one, once available (Debian 12?).
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Fabian Ebner [Wed, 2 Mar 2022 09:05:16 +0000 (10:05 +0100)]
backup: add patch to initialize bcs bitmap early enough for PBS
This is necessary for multi-disk backups where not all jobs are
immediately started after they are created. QEMU commit 06e0a9c16405c0a4c1eca33cf286cc04c42066a2 did already part of the work,
ensuring that new writes after job creation don't pass through to the
backup, but not yet for the MIRROR_SYNC_MODE_BITMAP case which is used
for PBS.
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com> Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Fabian Ebner [Fri, 11 Feb 2022 09:24:33 +0000 (10:24 +0100)]
update submodule and patches to 6.2.0
Notable changes:
* bdrv_co_p{discard,readv,writev,write_zeroes} function signatures
changed, to using int64_t for offsets/bytes and some still had int
rather than BrdvRequestFlags for the flags.
* job_cancel_sync now has a force parameter. Commit messages in 73895f3838cd7fdaf185cf1dbc47be58844a966f 4cfb3f05627ad82af473e7f7ae113c3884cd04e3
sound like using force=true makes more sense.
* Added 3 patches coming in via qemu-stable tag, most important one is
to work around a librbd issue.
* Added another 3 patches from qemu-devel to fix issue leading to
crash when live migrating with iothread.
* cluster_size calculation helper changed (see patch pve/0026).
* QAPI's if conditionals now use 'CONFIG_FOO' rather than
'defined(CONFIG_FOO)'
Dominik Csapak [Thu, 18 Nov 2021 07:05:38 +0000 (08:05 +0100)]
buildsys: fix build-dependencies on headers for 'vma' and 'pbs_restore'
both of them depend on generated header files, so we have to specify
them as sources. Otherwise, it happens (at least on some machines)
that they will be compiled before the headers are generated, aborting
the build.
libguestfs starts their helper VMs with `-machine accel=..` without a
machine type, and our pve version suffix handling would segfault in that
case. there might be other scripted use cases that are affected as well.
this regression was introduced with the rebase of our patch set on top
of 6.1.0
Stefan Reiter [Mon, 11 Oct 2021 11:55:34 +0000 (13:55 +0200)]
update and rebase to QEMU v6.1.0
Very clean rebase, only the +pve version handling needed manual fixing.
Drops two applied patches from extra/ and adds one new from upstream
(extra/0001*, fixes VNC over unix sockets) as well as 3 of my own for
allowing password changes on custom VNC displays again (as seen and
reviewed upstream, but not yet applied).
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
Stefan Reiter [Wed, 1 Sep 2021 16:10:12 +0000 (18:10 +0200)]
add temporary QMP race fix
same as the initial version sent to qemu-devel, it won't be the final
fix we plan to upstream but it should be enough band-aid to
workaround how PVE uses the QMP.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
[ Thomas: add a bit reasoning to commit message body ] Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
This drops debian/patches/pve/0005-PVE-Config-smm_available-false.patch
(and renumbers the remaining patches)
From what I could gather, this patch was originally added
due to issues with old kernels. Now we have users which
seem to run into issues *with* the patch.
All this does is toggle an option, and it's available via a
qemu CLI option anyway, so if dropping this patch causes
issues for some people we can just add an option to
qemu-server & UI control smm explicitly.
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com> Cc: Alexandre Derumier <aderumier@odiso.com> Tested-by: Stefan Reiter <s.reiter@proxmox.com>
Linux SCSI can throw spurious -EAGAIN in some corner cases in its
completion path, which will end up being the result in the completed
io_uring request.
Resubmitting such requests should allow block jobs to complete, even
if such spurious errors are encountered.
Stefan Reiter [Thu, 27 May 2021 10:43:32 +0000 (12:43 +0200)]
udpate and rebase to QEMU v6.0.0
Mostly minor changes, bigger ones summarized:
* QEMU's internal backup code now uses a new async system, which allows
parallel requests - the default max_workers settings is 64, I chose
less, since 64 put enough stress on QEMU that the guest became
practically unusable during the backup, and 16 still shows quite a
nice measureable performance improvement. Little code changes for us
though.
* 'malformed' QAPI parameters/functions are now a build error (i.e.
using '_' vs '-'), I chose to just whitelist our calls in the name of
backwards compatibility.
* monitor OOB race fix now uses the upstream variant, cherry-picked from
origin/master since it's not in 6.0 by default
* last patch fixes a bug with snapshot rollback related to the new yank
system
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
alloc track: acquire BS AIO context during dropping
ran into this when live-restoring a backup configured for IO-threads,
got the good ol':
> qemu: qemu_mutex_unlock_impl: Operation not permitted
error.
Checking out the history of the related bdrv_backup_top_drop(*bs)
method, we can see that it used to do the AIO context acquiring too,
but in the backup path this was problematic and was changed to be
higher up in the call path in a upstream series from Stefan[0].
That said, this is a completely different code path and it is safe to
do so here. We always run from the main threads's AIO context here
and we call it only indirectly once, guarded by checking for
`s->drop_state == DropNone` and set `s->drop_state = DropRequested`
shortly before we schedule the track_drop() in a bh.
alloc track: keep track_drop() closer to similar block drivers
Reads just nicer with a drain begin *and* end call. Also clearing the
backing link of the alloc track BDS makes it closer to
bdrv_backup_top_drop() with which this driver has a bit in common.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Stefan Reiter [Tue, 30 Mar 2021 15:59:51 +0000 (17:59 +0200)]
add upstream fixes for qmp_block_resize
cherry-picked cleanly from 6.0 development tree, fixes an issue with
resizing RBD drives (and reportedly also on krbd or potentially other
storage backends) with iothreads.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>