add patch to work around stuck guest IO with iothread and VirtIO block/SCSI
author     Fiona Ebner <f.ebner@proxmox.com>
           Mon, 11 Dec 2023 13:28:38 +0000 (14:28 +0100)
committer  Thomas Lamprecht <t.lamprecht@proxmox.com>
           Mon, 11 Dec 2023 15:56:50 +0000 (16:56 +0100)
commit     6b7c1815e1c89cb66ff48fbba6da69fe6d254630
tree       8ae80dbfac760431a897f55a9c13ba112d989196
parent     24d732ac0f4ab613ba6e6f77a34bea0742bdcf3b
add patch to work around stuck guest IO with iothread and VirtIO block/SCSI

When using an iothread, after commits
1665d9326f ("virtio-blk: implement BlockDevOps->drained_begin()")
and
766aa2de0f ("virtio-scsi: implement BlockDevOps->drained_begin()")
it can happen that polling gets stuck when draining. This can cause
IO in the guest to get completely stuck.

A workaround for users is stopping and resuming the vCPUs, because
that also stops and resumes the dataplanes, which kicks the host
notifiers.
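
For illustration only (not part of the patch): on a Proxmox VE host,
one way to stop and resume the vCPUs of an affected VM is via the
QEMU human monitor, e.g. through qm monitor:

    # qm monitor <vmid>
    qm> stop    # pause the vCPUs (and with them the dataplanes)
    qm> cont    # resume; re-attaching the dataplanes kicks the
                # host notifiers, so stuck IO gets picked up again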

This can happen with block jobs like backup and drive mirror, as well
as with hotplug [2].

There are reports in the community forum that might be about this
issue [0][1], and there is also one in the enterprise support channel.

As a workaround in the code, just re-enable notifications and kick the
virtqueue after draining. Draining is already costly and rare, so there
is no need to worry about a performance penalty here. The approach was
taken from the following comment by a QEMU developer [3] (in my
debugging, I had already found that re-enabling notifications works
around the issue, but also kicking the queue is more complete).
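
For illustration, a minimal sketch of what such a drained_end
workaround could look like on the virtio-blk side (not the verbatim
patch; see the file added below for the real change). It assumes
QEMU's existing helpers virtio_get_queue(), virtio_queue_set_notification()
and virtio_queue_notify():

    /*
     * Sketch: once the drained section ends, re-enable guest
     * notifications and kick each virtqueue, so requests that arrived
     * while polling was stuck are processed again.
     */
    static void virtio_blk_drained_end(void *opaque)
    {
        VirtIOBlock *s = opaque;
        VirtIODevice *vdev = VIRTIO_DEVICE(s);

        for (uint16_t i = 0; i < s->conf.num_queues; i++) {
            VirtQueue *vq = virtio_get_queue(vdev, i);

            /* ... re-attach the host notifier as before (elided) ... */

            virtio_queue_set_notification(vq, 1); /* re-enable notifications */
            virtio_queue_notify(vdev, i);         /* kick the virtqueue */
        }
    }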

[0]: https://forum.proxmox.com/threads/137286/
[1]: https://forum.proxmox.com/threads/137536/
[2]: https://issues.redhat.com/browse/RHEL-3934
[3]: https://issues.redhat.com/browse/RHEL-3934?focusedId=23562096&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-23562096

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
debian/patches/pve/0046-virtio-blk-scsi-work-around-iothread-polling-getting.patch [new file with mode: 0644]
debian/patches/series