git.proxmox.com Git - pve-qemu.git/commit - debian/patches/series
work around stuck guest IO with iothread and VirtIO block/SCSI
author Thomas Lamprecht <t.lamprecht@proxmox.com>
Fri, 2 Feb 2024 18:35:31 +0000 (19:35 +0100)
committer Thomas Lamprecht <t.lamprecht@proxmox.com>
Fri, 2 Feb 2024 18:35:34 +0000 (19:35 +0100)
commit 4ff04bdfa5f7f23cd2466789df1bf71559e84203
tree ad90931c17ab578a706d2b8f79d6673a64824b09
parent 12b69ed9c5d919cbd0805fcf01a82ef094cf3a6c

This essentially repeats commit 6b7c181 ("add patch to work around
stuck guest IO with iothread and VirtIO block/SCSI") with an added
fix for the SCSI event virtqueue, which requires special handling.
This is to avoid the issue [3] that made the revert 2a49e66 ("Revert
"add patch to work around stuck guest IO with iothread and VirtIO
block/SCSI"") necessary the first time around.

When using iothread, after commits
1665d9326f ("virtio-blk: implement BlockDevOps->drained_begin()")
766aa2de0f ("virtio-scsi: implement BlockDevOps->drained_begin()")
it can happen that polling gets stuck when draining. This would cause
IO in the guest to get completely stuck.

A workaround for users is stopping and resuming the vCPUs because that
would also stop and resume the dataplanes which would kick the host
notifiers.

This can happen with block jobs like backup and drive mirror as well
as with hotplug [2].

There are reports in the community forum that might be about this
issue [0][1], and there is also one in the enterprise support channel.

As a workaround in the code, just re-enable notifications and kick the
virt queue after draining. Draining is already costly and rare, so no
need to worry about a performance penalty here.
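
The effect of "re-enable notifications and kick after draining" can be
pictured with a small toy model in C. This is only a sketch, not QEMU
code: ToyQueue and every toy_* function are hypothetical names made up
for illustration, and the model assumes that the drain leaves guest
notifications disabled, which is one way to picture the stuck state
described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a virtqueue; hypothetical, NOT QEMU's VirtQueue. */
typedef struct {
    bool attached;              /* host notifier attached to the iothread */
    bool notifications_enabled; /* guest kicks the host on new requests */
    size_t queued;              /* requests waiting in the ring */
    size_t completed;           /* requests the host has processed */
} ToyQueue;

/* Host side: process everything that is currently queued. */
static void toy_process(ToyQueue *q)
{
    assert(q->attached);
    q->completed += q->queued;
    q->queued = 0;
}

/* Guest side: queue one request; the kick only reaches the host if
 * the notifier is attached and notifications are enabled. */
void toy_guest_submit(ToyQueue *q)
{
    q->queued++;
    if (q->attached && q->notifications_enabled) {
        toy_process(q);
    }
}

/* Assumption of this model: the drain detaches the notifier while
 * notifications are off, and nothing turns them back on. */
void toy_drain_begin(ToyQueue *q)
{
    q->notifications_enabled = false;
    q->attached = false;
}

void toy_drain_end(ToyQueue *q, bool apply_workaround)
{
    q->attached = true;
    if (apply_workaround) {
        q->notifications_enabled = true; /* re-enable notifications */
        toy_process(q);                  /* "kick" the queue once */
    }
}

/* Returns how many requests are left unprocessed after one drain. */
size_t toy_stuck_after_drain(bool apply_workaround)
{
    ToyQueue q = { .attached = true, .notifications_enabled = true };

    toy_guest_submit(&q);                /* processed normally */
    toy_drain_begin(&q);
    toy_guest_submit(&q);                /* arrives while drained */
    toy_drain_end(&q, apply_workaround);
    toy_guest_submit(&q);                /* arrives after the drain */
    return q.queued;
}
```

In this model toy_stuck_after_drain(false) leaves both later requests
queued forever, while toy_stuck_after_drain(true) leaves none. The
extra work is one flag write and one processing pass per drain, which
matches the argument that the cost is negligible next to the drain
itself.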

Take special care to attach the SCSI event virtqueue host notifier
with the _no_poll() variant like in virtio_scsi_dataplane_start().
This avoids the issue from the first attempted fix where the iothread
would suddenly loop with 100% CPU usage whenever some guest IO came in
[3]. This is necessary because of commit 38738f7dbb ("virtio-scsi:
don't waste CPU polling the event virtqueue"). See [4] for the
relevant discussion.
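
Why the event virtqueue needs the _no_poll() variant can be sketched
with another toy model in C. Again, this is not QEMU code:
toy_busy_iterations() and its parameters are hypothetical. The model
assumes that a poll callback reports "ready" whenever buffers are
posted in the queue, and that the event virtqueue's handler does not
empty the queue because the buffers wait there until the device has an
event to report.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy iothread loop (hypothetical). Returns how many of the first
 * max_iterations loop iterations did polling work instead of sleeping.
 *
 * attach_with_poll:      notifier attached with a poll callback
 * handler_empties_queue: true for request queues, false for the
 *                        event queue in this model
 * posted_buffers:        buffers the guest has placed in the queue
 */
size_t toy_busy_iterations(bool attach_with_poll, bool handler_empties_queue,
                           size_t posted_buffers, size_t max_iterations)
{
    size_t busy = 0;

    assert(max_iterations > 0);
    for (size_t i = 0; i < max_iterations; i++) {
        /* Modeled poll callback: ready whenever the queue is non-empty. */
        bool poll_ready = attach_with_poll && posted_buffers > 0;

        if (!poll_ready) {
            break; /* the loop would now sleep until a real notification */
        }
        busy++;
        if (handler_empties_queue) {
            posted_buffers = 0;
        }
    }
    return busy;
}
```

With a request queue (handler empties the queue) the loop does one
pass and goes back to sleep. With the event queue attached with
polling, every iteration looks ready and the loop never sleeps, which
is the 100% CPU pattern from [3]. Attached without polling, the event
queue causes no busy iterations at all.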

[0]: https://forum.proxmox.com/threads/137286/
[1]: https://forum.proxmox.com/threads/137536/
[2]: https://issues.redhat.com/browse/RHEL-3934
[3]: https://forum.proxmox.com/threads/138140/
[4]: https://lore.kernel.org/qemu-devel/bfc7b20c-2144-46e9-acbc-e726276c5a31@proxmox.com/

Link: https://lore.kernel.org/qemu-devel/20240202153158.788922-1-hreitz@redhat.com/
Originally-by: Fiona Ebner <f.ebner@proxmox.com>
 [ TL: Update to v2 and rebase patch series handling to v8.1.5 ]
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
debian/patches/extra/0010-virtio-scsi-Attach-event-vq-notifier-with-no_poll.patch [new file with mode: 0644]
debian/patches/extra/0011-virtio-Re-enable-notifications-after-drain.patch [new file with mode: 0644]
debian/patches/series