Copyright 2014 Virtual Open Systems Sarl.
Copyright 2019 Intel Corporation
Licence: This work is licensed under the terms of the GNU GPL,
version 2 or later. See the COPYING file in the top-level

.. contents:: Table of Contents
This protocol aims to complement the ``ioctl`` interface used to
control the vhost implementation in the Linux kernel. It implements
the control plane needed to establish virtqueue sharing with a user
space process on the same host. It uses communication over a Unix
domain socket to share file descriptors in the ancillary data of the

The protocol defines 2 sides of the communication, *master* and
*slave*. *Master* is the application that shares its virtqueues, in
our case QEMU. *Slave* is the consumer of the virtqueues.

In the current implementation QEMU is the *master*, and the *slave* is
the external process consuming the virtio queues, for example a
software Ethernet switch running in user space, such as Snabbswitch,
or a block device backend processing read and write requests to a
virtual disk. To facilitate interoperability between various backend
implementations, it is recommended to follow the :ref:`Backend program
conventions <backend_conventions>`.

*Master* and *slave* can each be either a client (i.e. connecting) or
a server (listening) in the socket communication.
Support for platforms other than Linux
--------------------------------------

While vhost-user was initially developed targeting Linux, nowadays it
is supported on any platform that provides the following features:

- A way for requesting shared memory represented by a file descriptor
  so it can be passed over a UNIX domain socket and then mapped by the

- AF_UNIX sockets with SCM_RIGHTS, so QEMU and the other process can
  exchange messages through them, including ancillary data when needed.

- Either eventfd or pipe/pipe2. On platforms where eventfd is not
  available, QEMU will automatically fall back to pipe2 or, as a last
  resort, pipe. Each file descriptor will be used for receiving or
  sending events by reading or writing (respectively) an 8-byte value
  on the corresponding file descriptor. The 8-byte value itself has no
  meaning and should not be interpreted.
.. Note:: All numbers are in the machine native byte order.

A vhost-user message consists of 3 header fields and a payload.

+---------+-------+------+---------+
| request | flags | size | payload |
+---------+-------+------+---------+

:request: 32-bit type of the request

:flags: 32-bit bit field

- Lower 2 bits are the version (currently 0x01)
- Bit 2 is the reply flag - needs to be sent on each reply from the slave
- Bit 3 is the need_reply flag - see :ref:`REPLY_ACK <reply_ack>` for

:size: 32-bit size of the payload
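
As a rough illustration of the header layout above, the following C
sketch decodes the ``flags`` field. The struct name ``vu_header``, the
``VU_FLAG_*`` macros and the helper functions are assumptions made for
this example; they are not identifiers defined by this document.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mirror of the three header fields (native byte order). */
struct vu_header {
    uint32_t request; /* type of the request */
    uint32_t flags;   /* version and reply bits */
    uint32_t size;    /* size of the payload in bytes */
};

/* Illustrative mask names derived from the bit layout described above. */
#define VU_FLAG_VERSION_MASK 0x3u      /* lower 2 bits: version */
#define VU_FLAG_REPLY        (1u << 2) /* bit 2: reply flag */
#define VU_FLAG_NEED_REPLY   (1u << 3) /* bit 3: need_reply flag */

/* Extract the protocol version from the flags field. */
static inline uint32_t vu_version(const struct vu_header *h)
{
    return h->flags & VU_FLAG_VERSION_MASK;
}

/* True if the message is a reply. */
static inline bool vu_is_reply(const struct vu_header *h)
{
    return (h->flags & VU_FLAG_REPLY) != 0;
}

/* True if the sender asked for an explicit reply. */
static inline bool vu_needs_reply(const struct vu_header *h)
{
    return (h->flags & VU_FLAG_NEED_REPLY) != 0;
}
```

A receiver would typically check ``vu_version()`` first and reject
messages whose version it does not understand.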
Depending on the request type, **payload** can be:

A single 64-bit integer
^^^^^^^^^^^^^^^^^^^^^^^

:u64: a 64-bit unsigned integer

A vring state description
^^^^^^^^^^^^^^^^^^^^^^^^^

:index: a 32-bit index

:num: a 32-bit number

A vring address description
^^^^^^^^^^^^^^^^^^^^^^^^^^^

+-------+-------+------+------------+------+-----------+-----+
| index | flags | size | descriptor | used | available | log |
+-------+-------+------+------------+------+-----------+-----+

:index: a 32-bit vring index

:flags: a 32-bit vring flags

:descriptor: a 64-bit ring address of the vring descriptor table

:used: a 64-bit ring address of the vring used ring

:available: a 64-bit ring address of the vring available ring

:log: a 64-bit guest address for logging

Note that a ring address is an IOVA if ``VIRTIO_F_IOMMU_PLATFORM`` has
been negotiated. Otherwise it is a user address.
Memory regions description
^^^^^^^^^^^^^^^^^^^^^^^^^^

+-------------+---------+---------+-----+---------+
| num regions | padding | region0 | ... | region7 |
+-------------+---------+---------+-----+---------+

:num regions: a 32-bit number of regions

+---------------+------+--------------+-------------+
| guest address | size | user address | mmap offset |
+---------------+------+--------------+-------------+

:guest address: a 64-bit guest address of the region

:user address: a 64-bit user address

:mmap offset: 64-bit offset where region starts in the mapped memory

Single memory region description
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

+---------+---------------+------+--------------+-------------+
| padding | guest address | size | user address | mmap offset |
+---------+---------------+------+--------------+-------------+

:guest address: a 64-bit guest address of the region

:user address: a 64-bit user address

:mmap offset: 64-bit offset where region starts in the mapped memory
+----------+------------+
| log size | log offset |
+----------+------------+

:log size: size of area used for logging

:log offset: offset from start of supplied file descriptor where
             logging starts (i.e. where guest address 0 would be

+------+------+--------------+-------------------+------+
| iova | size | user address | permissions flags | type |
+------+------+--------------+-------------------+------+

:iova: a 64-bit I/O virtual address programmed by the guest

:user address: a 64-bit user address

:permissions flags: an 8-bit value:

  - 3: Read/Write access

:type: an 8-bit IOTLB message type:

  - 3: IOTLB invalidate
  - 4: IOTLB access fail
Virtio device config space
^^^^^^^^^^^^^^^^^^^^^^^^^^

+--------+------+-------+---------+
| offset | size | flags | payload |
+--------+------+-------+---------+

:offset: a 32-bit offset of virtio device's configuration space

:size: a 32-bit configuration space access size in bytes

:flags: a 32-bit value:

  - 0: Vhost master messages used for writeable fields
  - 1: Vhost master messages used for live migration

:payload: an array of ``size`` bytes holding the contents of the virtio
          device's configuration space

Vring area description
^^^^^^^^^^^^^^^^^^^^^^

+-----+------+--------+
| u64 | size | offset |
+-----+------+--------+

:u64: a 64-bit integer containing the vring index and flags

:size: a 64-bit size of this area

:offset: a 64-bit offset of this area from the start of the
         supplied file descriptor

+-----------+-------------+------------+------------+
| mmap size | mmap offset | num queues | queue size |
+-----------+-------------+------------+------------+

:mmap size: a 64-bit size of area to track inflight I/O

:mmap offset: a 64-bit offset of this area from the start
              of the supplied file descriptor

:num queues: a 16-bit number of virtqueues

:queue size: a 16-bit size of virtqueues
In QEMU the vhost-user message is implemented with the following struct::

  typedef struct VhostUserMsg {
      VhostUserRequest request;
      struct vhost_vring_state state;
      struct vhost_vring_addr addr;
      VhostUserMemory memory;
      struct vhost_iotlb_msg iotlb;
      VhostUserConfig config;
      VhostUserVringArea area;
      VhostUserInflight inflight;
  } QEMU_PACKED VhostUserMsg;
The protocol for vhost-user is based on the existing implementation of
vhost for the Linux Kernel. Most messages that can be sent via the
Unix domain socket implementing vhost-user have an equivalent ioctl to
the kernel implementation.

The communication consists of *master* sending message requests and
*slave* sending message replies. Most of the requests don't require
replies. Here is a list of the ones that do:

* ``VHOST_USER_GET_FEATURES``
* ``VHOST_USER_GET_PROTOCOL_FEATURES``
* ``VHOST_USER_GET_VRING_BASE``
* ``VHOST_USER_SET_LOG_BASE`` (if ``VHOST_USER_PROTOCOL_F_LOG_SHMFD``)
* ``VHOST_USER_GET_INFLIGHT_FD`` (if ``VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD``)

:ref:`REPLY_ACK <reply_ack>`
   The section on ``REPLY_ACK`` protocol extension.

There are several messages that the master sends with file descriptors passed
in the ancillary data:

* ``VHOST_USER_SET_MEM_TABLE``
* ``VHOST_USER_SET_LOG_BASE`` (if ``VHOST_USER_PROTOCOL_F_LOG_SHMFD``)
* ``VHOST_USER_SET_LOG_FD``
* ``VHOST_USER_SET_VRING_KICK``
* ``VHOST_USER_SET_VRING_CALL``
* ``VHOST_USER_SET_VRING_ERR``
* ``VHOST_USER_SET_SLAVE_REQ_FD``
* ``VHOST_USER_SET_INFLIGHT_FD`` (if ``VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD``)

If *master* is unable to send the full message or receives a wrong
reply it will close the connection. An optional reconnection mechanism

If *slave* detects some error such as incompatible features, it may also
close the connection. This should only happen in exceptional circumstances.

Any protocol extensions are gated by protocol feature bits, which
allows full backwards compatibility on both master and slave. As
older slaves don't support negotiating protocol features, a feature
bit was dedicated for this purpose::

  #define VHOST_USER_F_PROTOCOL_FEATURES 30
Starting and stopping rings
---------------------------

Client must only process each ring when it is started.

Client must only pass data between the ring and the backend, when the

If ring is started but disabled, client must process the ring without
talking to the backend.

For example, for a networking device, in the disabled state client
must not supply any new RX packets, but must process and discard any

If ``VHOST_USER_F_PROTOCOL_FEATURES`` has not been negotiated, the
ring is initialized in an enabled state.

If ``VHOST_USER_F_PROTOCOL_FEATURES`` has been negotiated, the ring is
initialized in a disabled state. Client must not pass data to/from the
backend until ring is enabled by ``VHOST_USER_SET_VRING_ENABLE`` with
parameter 1, or after it has been disabled by
``VHOST_USER_SET_VRING_ENABLE`` with parameter 0.

Each ring is initialized in a stopped state; client must not process
it until ring is started, or after it has been stopped.

Client must start ring upon receiving a kick (that is, detecting that
file descriptor is readable) on the descriptor specified by
``VHOST_USER_SET_VRING_KICK`` or receiving the in-band message
``VHOST_USER_VRING_KICK`` if negotiated, and stop ring upon receiving
``VHOST_USER_GET_VRING_BASE``.

While processing the rings (whether they are enabled or not), client
must support changing some configuration aspects on the fly.
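
The started/stopped and enabled/disabled rules above can be modelled as
two independent flags per ring. The following C sketch is only an
illustration; ``struct ring_state`` and the helper names are
hypothetical and do not come from any existing implementation.

```c
#include <stdbool.h>

/* Hypothetical per-ring state; the field names are illustrative. */
struct ring_state {
    bool started; /* set on kick, cleared on VHOST_USER_GET_VRING_BASE */
    bool enabled; /* toggled by VHOST_USER_SET_VRING_ENABLE */
};

/*
 * Every ring begins stopped. It begins enabled only when
 * VHOST_USER_F_PROTOCOL_FEATURES has NOT been negotiated.
 */
static inline struct ring_state ring_init(bool protocol_features_negotiated)
{
    struct ring_state r = {
        .started = false,
        .enabled = !protocol_features_negotiated,
    };
    return r;
}

/* A ring may be processed only while it is started. */
static inline bool ring_may_process(const struct ring_state *r)
{
    return r->started;
}

/* Data may reach the backend only while the ring is started and enabled. */
static inline bool ring_may_pass_data(const struct ring_state *r)
{
    return r->started && r->enabled;
}
```

A ring for which ``ring_may_process()`` is true but
``ring_may_pass_data()`` is false corresponds to the "started but
disabled" case, where buffers are consumed without reaching the backend.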
Multiple queue support
----------------------

Many devices have a fixed number of virtqueues. In this case the master
already knows the number of available virtqueues without communicating with the

Some devices do not have a fixed number of virtqueues. Instead the maximum
number of virtqueues is chosen by the slave. The number can depend on host
resource availability or slave implementation details. Such devices are called
multiple queue devices.

Multiple queue support allows the slave to advertise the maximum number of
queues. This is treated as a protocol extension, hence the slave has to
implement protocol features first. The multiple queues feature is supported
only when the protocol feature ``VHOST_USER_PROTOCOL_F_MQ`` (bit 0) is set.

The max number of queues the slave supports can be queried with message
``VHOST_USER_GET_QUEUE_NUM``. Master should stop when the number of requested
queues is bigger than that.

As all queues share one connection, the master uses a unique index for each
queue in the sent message to identify a specified queue.

The master enables queues by sending message ``VHOST_USER_SET_VRING_ENABLE``.
vhost-user-net has historically automatically enabled the first queue pair.

Slaves should always implement the ``VHOST_USER_PROTOCOL_F_MQ`` protocol
feature, even for devices with a fixed number of virtqueues, since it is simple
to implement and offers a degree of introspection.

Masters must not rely on the ``VHOST_USER_PROTOCOL_F_MQ`` protocol feature for
devices with a fixed number of virtqueues. Only true multiqueue devices
require this protocol feature.
During live migration, the master may need to track the modifications
the slave makes to the memory mapped regions. The client should mark
the dirty pages in a log. Once it complies with this logging, it may
declare the ``VHOST_F_LOG_ALL`` vhost feature.

To start/stop logging of data/used ring writes, server may send
messages ``VHOST_USER_SET_FEATURES`` with ``VHOST_F_LOG_ALL`` and
``VHOST_USER_SET_VRING_ADDR`` with ``VHOST_VRING_F_LOG`` in ring's
flags set to 1/0, respectively.

All the modifications to memory pointed to by vring "descriptor" should
be marked. Modifications to "used" vring should be marked if
``VHOST_VRING_F_LOG`` is part of ring's flags.

Dirty pages are of size::

  #define VHOST_LOG_PAGE 0x1000

The log memory fd is provided in the ancillary data of
``VHOST_USER_SET_LOG_BASE`` message when the slave has
``VHOST_USER_PROTOCOL_F_LOG_SHMFD`` protocol feature.

The size of the log is supplied as part of ``VhostUserMsg`` which
should be large enough to cover all known guest addresses. Log starts
at the supplied offset in the supplied file descriptor. The log
covers from address 0 to the maximum of guest regions. In pseudo-code,
to mark page at ``addr`` as dirty::

  page = addr / VHOST_LOG_PAGE
  log[page / 8] |= 1 << (page % 8)

Where ``addr`` is the guest physical address.

Use atomic operations, as the log may be concurrently manipulated.
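
The pseudo-code above, combined with the requirement to use atomic
operations, could be written in C11 roughly as follows. This is a
minimal sketch: the function name ``vhost_log_mark_dirty`` is
hypothetical, and it assumes ``log`` already points at the start of the
mapped log area (that is, the supplied offset has been applied).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define VHOST_LOG_PAGE 0x1000

/*
 * Mark the page containing guest physical address addr as dirty.
 * One bit per page, eight pages per log byte; the OR is atomic because
 * other threads or processes may update the same byte concurrently.
 */
static inline void vhost_log_mark_dirty(_Atomic uint8_t *log, uint64_t addr)
{
    uint64_t page = addr / VHOST_LOG_PAGE;

    assert(log != NULL);
    atomic_fetch_or(&log[page / 8], (uint8_t)(1u << (page % 8)));
}
```

A caller is expected to make sure ``addr`` falls within the range
covered by the log size before calling the helper; the sketch performs
no bounds check of its own.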
Note that when logging modifications to the used ring (when
``VHOST_VRING_F_LOG`` is set for this ring), ``log_guest_addr`` should
be used to calculate the log offset: the write to first byte of the
used ring is logged at this offset from log start. Also note that this
value might be outside the legal guest physical address range
(i.e. does not have to be covered by the ``VhostUserMemory`` table), but
the bit offset of the last byte of the ring must fall within the size
supplied by ``VhostUserLog``.

``VHOST_USER_SET_LOG_FD`` is an optional message with an eventfd in
ancillary data; it may be used to inform the master that the log has

Once the source has finished migration, rings will be stopped by the
source. No further update must be done before rings are restarted.

In postcopy migration the slave is started before all the memory has
been received from the source host, and care must be taken to avoid
accessing pages that have yet to be received. The slave opens a
'userfault'-fd and registers the memory with it; this fd is then
passed back over to the master. The master services requests on the
userfaultfd for pages that are accessed and when the page is available
it performs WAKE ioctls on the userfaultfd to wake the stalled
slave. The client indicates support for this via the
``VHOST_USER_PROTOCOL_F_PAGEFAULT`` feature.
The master sends a list of vhost memory regions to the slave using the
``VHOST_USER_SET_MEM_TABLE`` message. Each region has two base
addresses: a guest address and a user address.

Messages contain guest addresses and/or user addresses to reference locations
within the shared memory. The mapping of these addresses works as follows.

User addresses map to the vhost memory region containing that user address.

When the ``VIRTIO_F_IOMMU_PLATFORM`` feature has not been negotiated:

* Guest addresses map to the vhost memory region containing that guest

When the ``VIRTIO_F_IOMMU_PLATFORM`` feature has been negotiated:

* Guest addresses are also called I/O virtual addresses (IOVAs). They are
  translated to user addresses via the IOTLB.

* The vhost memory region guest address is not used.
When the ``VIRTIO_F_IOMMU_PLATFORM`` feature has been negotiated, the
master sends IOTLB entry updates and invalidations by sending
``VHOST_USER_IOTLB_MSG`` requests to the slave with a ``struct
vhost_iotlb_msg`` as payload. For update events, the ``iotlb`` payload
has to be filled with the update message type (2), the I/O virtual
address, the size, the user virtual address, and the permissions
flags. Addresses and size must be within vhost memory regions set via
the ``VHOST_USER_SET_MEM_TABLE`` request. For invalidation events, the
``iotlb`` payload has to be filled with the invalidation message type
(3), the I/O virtual address and the size. On success, the slave is
expected to reply with a zero payload, non-zero otherwise.
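
To make the update and invalidation payloads concrete, here is a small
C sketch that fills them in. The ``struct iotlb_msg`` below is a local,
illustrative mirror of the fields listed in the IOTLB payload
description; the ``IOTLB_*`` names and both helper functions are
assumptions made for this example. Real code would use the kernel's
``struct vhost_iotlb_msg`` instead.

```c
#include <stdint.h>

/* Illustrative mirror of the IOTLB payload fields. */
struct iotlb_msg {
    uint64_t iova;  /* I/O virtual address programmed by the guest */
    uint64_t size;  /* size of the mapping */
    uint64_t uaddr; /* user address */
    uint8_t  perm;  /* permissions flags */
    uint8_t  type;  /* IOTLB message type */
};

/* Message type values as given in the text. */
enum {
    IOTLB_MISS        = 1,
    IOTLB_UPDATE      = 2,
    IOTLB_INVALIDATE  = 3,
    IOTLB_ACCESS_FAIL = 4,
};

#define IOTLB_PERM_RW 3 /* Read/Write access */

/* Build an update message: type, IOVA, size, user address, permissions. */
static inline struct iotlb_msg iotlb_make_update(uint64_t iova, uint64_t size,
                                                 uint64_t uaddr, uint8_t perm)
{
    struct iotlb_msg m = {
        .iova = iova, .size = size, .uaddr = uaddr,
        .perm = perm, .type = IOTLB_UPDATE,
    };
    return m;
}

/* Build an invalidation message: only type, IOVA and size are filled. */
static inline struct iotlb_msg iotlb_make_invalidate(uint64_t iova,
                                                     uint64_t size)
{
    struct iotlb_msg m = { .iova = iova, .size = size,
                           .type = IOTLB_INVALIDATE };
    return m;
}
```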
The slave relies on the slave communication channel (see :ref:`Slave
communication <slave_communication>` section below) to send IOTLB miss
and access failure events, by sending ``VHOST_USER_SLAVE_IOTLB_MSG``
requests to the master with a ``struct vhost_iotlb_msg`` as
payload. For miss events, the iotlb payload has to be filled with the
miss message type (1), the I/O virtual address and the permissions
flags. For access failure events, the iotlb payload has to be filled
with the access failure message type (4), the I/O virtual address and
the permissions flags. For synchronization purposes, the slave may
rely on the reply-ack feature, so the master may send a reply when
operation is completed if the reply-ack feature is negotiated and
the slave requests a reply. For miss events, completed operation means
either master sent an update message containing the IOTLB entry
containing requested address and permission, or master sent nothing if
the IOTLB miss message is invalid (invalid IOVA or permission).

The master isn't expected to take the initiative to send IOTLB update
messages, as the slave sends IOTLB miss messages for the guest virtual
memory areas it needs to access.
.. _slave_communication:

An optional communication channel is provided if the slave declares
``VHOST_USER_PROTOCOL_F_SLAVE_REQ`` protocol feature, to allow the
slave to make requests to the master.

The fd is provided via ``VHOST_USER_SET_SLAVE_REQ_FD`` ancillary data.

A slave may then send ``VHOST_USER_SLAVE_*`` messages to the master
using this fd communication channel.

If ``VHOST_USER_PROTOCOL_F_SLAVE_SEND_FD`` protocol feature is
negotiated, slave can send file descriptors (at most 8 descriptors in
each message) to master via ancillary data using this fd communication
Inflight I/O tracking
---------------------

To support reconnecting after restart or crash, slave may need to
resubmit inflight I/Os. If virtqueue is processed in order, we can
easily achieve that by getting the inflight descriptors from
descriptor table (split virtqueue) or descriptor ring (packed
virtqueue). However, it can't work when we process descriptors
out-of-order because some entries which store the information of
inflight descriptors in available ring (split virtqueue) or descriptor
ring (packed virtqueue) might be overwritten by new entries. To solve
this problem, slave needs to allocate an extra buffer to store the
information about inflight descriptors and share it with master for
persistence. ``VHOST_USER_GET_INFLIGHT_FD`` and
``VHOST_USER_SET_INFLIGHT_FD`` are used to transfer this buffer
between master and slave. And the format of this buffer is described

+---------------+---------------+-----+---------------+
| queue0 region | queue1 region | ... | queueN region |
+---------------+---------------+-----+---------------+

N is the number of available virtqueues. Slave could get it from num
queues field of ``VhostUserInflight``.
For split virtqueue, queue region can be implemented as::

  typedef struct DescStateSplit {
      /* Indicate whether this descriptor is inflight or not.
       * Only available for head-descriptor. */
      /* Maintain a list for the last batch of used descriptors.
       * Only available when batching is used for submitting */
      /* Used to preserve the order of fetching available descriptors.
       * Only available for head-descriptor. */
  typedef struct QueueRegionSplit {
      /* The feature flags of this region. Now it's initialized to 0. */
      /* The version of this region. It's 1 currently.
       * Zero value indicates an uninitialized buffer */
      /* The size of DescStateSplit array. It's equal to the virtqueue
       * size. Slave could get it from queue size field of VhostUserInflight. */
      /* The head of list that track the last batch of used descriptors. */
      uint16_t last_batch_head;
      /* Store the idx value of used ring */
      /* Used to track the state of each descriptor in descriptor table */
      DescStateSplit desc[];
To track inflight I/O, the queue region should be processed as follows:

When receiving available buffers from the driver:

#. Get the next available head-descriptor index from available ring, ``i``

#. Set ``desc[i].counter`` to the value of global counter

#. Increase global counter by 1

#. Set ``desc[i].inflight`` to 1

When supplying used buffers to the driver:

1. Get corresponding used head-descriptor index, i

2. Set ``desc[i].next`` to ``last_batch_head``

3. Set ``last_batch_head`` to ``i``

#. Steps 1,2,3 may be performed repeatedly if batching is possible

#. Increase the ``idx`` value of used ring by the size of the batch

#. Set the ``inflight`` field of each ``DescStateSplit`` entry in the batch to 0

#. Set ``used_idx`` to the ``idx`` value of used ring
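
The two procedures above can be sketched in C as follows. This is an
illustration under several assumptions: ``struct desc_state`` and
``struct queue_region`` are simplified stand-ins that keep only the
fields named in the steps (their types and the fixed ``QUEUE_SIZE`` are
chosen for the example), ``ring_used_idx`` models the ``idx`` field of
the used ring, and the memory barriers a real device needs between
steps are omitted.

```c
#include <assert.h>
#include <stdint.h>

#define QUEUE_SIZE 8 /* hypothetical virtqueue size for the example */

/* Simplified per-descriptor state with only the fields used below. */
struct desc_state {
    uint8_t  inflight;
    uint16_t next;
    uint64_t counter;
};

/* Simplified queue region. */
struct queue_region {
    uint16_t last_batch_head;
    uint16_t used_idx;
    struct desc_state desc[QUEUE_SIZE];
};

static uint64_t global_counter;

/* Receiving an available buffer whose head-descriptor index is i. */
static void inflight_on_avail(struct queue_region *q, uint16_t i)
{
    assert(i < QUEUE_SIZE);
    q->desc[i].counter = global_counter; /* remember fetch order */
    global_counter++;
    q->desc[i].inflight = 1;
}

/* Supplying a batch of n used buffers identified by heads[]. */
static void inflight_on_used(struct queue_region *q, const uint16_t *heads,
                             uint16_t n, uint16_t *ring_used_idx)
{
    /* Steps 1-3, repeated for each buffer: link it into the batch list. */
    for (uint16_t k = 0; k < n; k++) {
        q->desc[heads[k]].next = q->last_batch_head;
        q->last_batch_head = heads[k];
    }
    /* Increase the idx value of the used ring by the batch size. */
    *ring_used_idx += n;
    /* Clear the inflight flag of every entry in the batch. */
    for (uint16_t k = 0; k < n; k++) {
        q->desc[heads[k]].inflight = 0;
    }
    /* Record the used ring idx last, so recovery can detect a torn batch. */
    q->used_idx = *ring_used_idx;
}
```

Because ``used_idx`` is written last, a crash in the middle of
``inflight_on_used()`` leaves ``used_idx`` different from the used ring
``idx``, which is the condition checked on reconnect.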
#. If the value of ``used_idx`` does not match the ``idx`` value of
   used ring (means the inflight field of ``DescStateSplit`` entries in
   last batch may be incorrect),

   a. Subtract the value of ``used_idx`` from the ``idx`` value of
      used ring to get last batch size of ``DescStateSplit`` entries

   #. Set the ``inflight`` field of each ``DescStateSplit`` entry to 0 in last batch
      list which starts from ``last_batch_head``

   #. Set ``used_idx`` to the ``idx`` value of used ring

#. Resubmit inflight ``DescStateSplit`` entries in order of their
For packed virtqueue, queue region can be implemented as::

  typedef struct DescStatePacked {
      /* Indicate whether this descriptor is inflight or not.
       * Only available for head-descriptor. */
      /* Link to the next free entry */
      /* Link to the last entry of descriptor list.
       * Only available for head-descriptor. */
      /* The length of descriptor list.
       * Only available for head-descriptor. */
      /* Used to preserve the order of fetching available descriptors.
       * Only available for head-descriptor. */
      /* The descriptor flags */
      /* The buffer length */
      /* The buffer address */
  typedef struct QueueRegionPacked {
      /* The feature flags of this region. Now it's initialized to 0. */
      /* The version of this region. It's 1 currently.
       * Zero value indicates an uninitialized buffer */
      /* The size of DescStatePacked array. It's equal to the virtqueue
       * size. Slave could get it from queue size field of VhostUserInflight. */
      /* The head of free DescStatePacked entry list */
      /* The old head of free DescStatePacked entry list */
      uint16_t old_free_head;
      /* The used index of descriptor ring */
      /* The old used index of descriptor ring */
      uint16_t old_used_idx;
      /* Device ring wrap counter */
      uint8_t used_wrap_counter;
      /* The old device ring wrap counter */
      uint8_t old_used_wrap_counter;
      /* Used to track the state of each descriptor fetched from descriptor ring */
      DescStatePacked desc[];
To track inflight I/O, the queue region should be processed as follows:

When receiving available buffers from the driver:

#. Get the next available descriptor entry from descriptor ring, ``d``

#. If ``d`` is head descriptor,

   a. Set ``desc[old_free_head].num`` to 0

   #. Set ``desc[old_free_head].counter`` to the value of global counter

   #. Increase global counter by 1

   #. Set ``desc[old_free_head].inflight`` to 1

#. If ``d`` is last descriptor, set ``desc[old_free_head].last`` to

#. Increase ``desc[old_free_head].num`` by 1

#. Set ``desc[free_head].addr``, ``desc[free_head].len``,
   ``desc[free_head].flags``, ``desc[free_head].id`` to ``d.addr``,
   ``d.len``, ``d.flags``, ``d.id``

#. Set ``free_head`` to ``desc[free_head].next``

#. If ``d`` is last descriptor, set ``old_free_head`` to ``free_head``

When supplying used buffers to the driver:

1. Get corresponding used head-descriptor entry from descriptor ring,

2. Get corresponding ``DescStatePacked`` entry, ``e``

3. Set ``desc[e.last].next`` to ``free_head``

4. Set ``free_head`` to the index of ``e``

#. Steps 1,2,3,4 may be performed repeatedly if batching is possible

#. Increase ``used_idx`` by the size of the batch and update
   ``used_wrap_counter`` if needed

#. Update ``d.flags``

#. Set the ``inflight`` field of each head ``DescStatePacked`` entry

#. Set ``old_free_head``, ``old_used_idx``, ``old_used_wrap_counter``
   to ``free_head``, ``used_idx``, ``used_wrap_counter``

#. If ``used_idx`` does not match ``old_used_idx`` (means the
   ``inflight`` field of ``DescStatePacked`` entries in last batch may

   a. Get the next descriptor ring entry through ``old_used_idx``, ``d``

   #. Use ``old_used_wrap_counter`` to calculate the available flags

   #. If ``d.flags`` is not equal to the calculated flags value (means
      slave has submitted the buffer to guest driver before crash, so
      it has to commit the in-progress update), set ``old_free_head``,
      ``old_used_idx``, ``old_used_wrap_counter`` to ``free_head``,
      ``used_idx``, ``used_wrap_counter``

#. Set ``free_head``, ``used_idx``, ``used_wrap_counter`` to
   ``old_free_head``, ``old_used_idx``, ``old_used_wrap_counter``
   (roll back any in-progress update)

#. Set the ``inflight`` field of each ``DescStatePacked`` entry in

#. Resubmit inflight ``DescStatePacked`` entries in order of their
In-band notifications
---------------------

In some limited situations (e.g. for simulation) it is desirable to
have the kick, call and error (if used) signals done via in-band
messages instead of asynchronous eventfd notifications. This can be
done by negotiating the ``VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS``

Note that because too many messages on the sockets can cause the
sending application(s) to block, it is not advised to use this feature
unless absolutely necessary. It is also considered an error to
negotiate this feature without also negotiating
``VHOST_USER_PROTOCOL_F_SLAVE_REQ`` and ``VHOST_USER_PROTOCOL_F_REPLY_ACK``.
The former is necessary for getting a message channel from the slave
to the master, while the latter needs to be used with the in-band
notification messages to block until they are processed, both to avoid
blocking later and for proper processing (at least in the simulation
use case). As it has no other way of signalling this error, the slave
should close the connection as a response to a
``VHOST_USER_SET_PROTOCOL_FEATURES`` message that sets the in-band
notifications feature flag without the other two.
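
The dependency rule above can be checked with a few bit tests. The
sketch below is illustrative only: the helper names ``has_feature`` and
``inband_features_valid`` are hypothetical, while the bit numbers are
the protocol feature values listed in this document.

```c
#include <stdbool.h>
#include <stdint.h>

#define VHOST_USER_PROTOCOL_F_REPLY_ACK            3
#define VHOST_USER_PROTOCOL_F_SLAVE_REQ            5
#define VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS 14

/* Test a single bit of a 64-bit protocol feature mask. */
static inline bool has_feature(uint64_t features, unsigned bit)
{
    return ((features >> bit) & 1) != 0;
}

/*
 * Validate the mask received in VHOST_USER_SET_PROTOCOL_FEATURES.
 * Returns false when in-band notifications are requested without both
 * SLAVE_REQ and REPLY_ACK, in which case the slave should close the
 * connection.
 */
static inline bool inband_features_valid(uint64_t features)
{
    if (!has_feature(features, VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS)) {
        return true; /* nothing to check */
    }
    return has_feature(features, VHOST_USER_PROTOCOL_F_SLAVE_REQ) &&
           has_feature(features, VHOST_USER_PROTOCOL_F_REPLY_ACK);
}
```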
::

  #define VHOST_USER_PROTOCOL_F_MQ                    0
  #define VHOST_USER_PROTOCOL_F_LOG_SHMFD             1
  #define VHOST_USER_PROTOCOL_F_RARP                  2
  #define VHOST_USER_PROTOCOL_F_REPLY_ACK             3
  #define VHOST_USER_PROTOCOL_F_MTU                   4
  #define VHOST_USER_PROTOCOL_F_SLAVE_REQ             5
  #define VHOST_USER_PROTOCOL_F_CROSS_ENDIAN          6
  #define VHOST_USER_PROTOCOL_F_CRYPTO_SESSION        7
  #define VHOST_USER_PROTOCOL_F_PAGEFAULT             8
  #define VHOST_USER_PROTOCOL_F_CONFIG                9
  #define VHOST_USER_PROTOCOL_F_SLAVE_SEND_FD        10
  #define VHOST_USER_PROTOCOL_F_HOST_NOTIFIER        11
  #define VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD       12
  #define VHOST_USER_PROTOCOL_F_RESET_DEVICE         13
  #define VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS 14
  #define VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS  15
  #define VHOST_USER_PROTOCOL_F_STATUS               16
``VHOST_USER_GET_FEATURES``
  :equivalent ioctl: ``VHOST_GET_FEATURES``
  :slave payload: ``u64``

  Get from the underlying vhost implementation the features bitmask.
  Feature bit ``VHOST_USER_F_PROTOCOL_FEATURES`` signals slave support
  for ``VHOST_USER_GET_PROTOCOL_FEATURES`` and
  ``VHOST_USER_SET_PROTOCOL_FEATURES``.

``VHOST_USER_SET_FEATURES``
  :equivalent ioctl: ``VHOST_SET_FEATURES``
  :master payload: ``u64``

  Enable features in the underlying vhost implementation using a
  bitmask. Feature bit ``VHOST_USER_F_PROTOCOL_FEATURES`` signals
  slave support for ``VHOST_USER_GET_PROTOCOL_FEATURES`` and
  ``VHOST_USER_SET_PROTOCOL_FEATURES``.

``VHOST_USER_GET_PROTOCOL_FEATURES``
  :equivalent ioctl: ``VHOST_GET_FEATURES``
  :slave payload: ``u64``

  Get the protocol feature bitmask from the underlying vhost
  implementation. Only legal if feature bit
  ``VHOST_USER_F_PROTOCOL_FEATURES`` is present in
  ``VHOST_USER_GET_FEATURES``.

  Slave that reported ``VHOST_USER_F_PROTOCOL_FEATURES`` must
  support this message even before ``VHOST_USER_SET_FEATURES`` was

``VHOST_USER_SET_PROTOCOL_FEATURES``
  :equivalent ioctl: ``VHOST_SET_FEATURES``
  :master payload: ``u64``

  Enable protocol features in the underlying vhost implementation.

  Only legal if feature bit ``VHOST_USER_F_PROTOCOL_FEATURES`` is present in
  ``VHOST_USER_GET_FEATURES``.

  Slave that reported ``VHOST_USER_F_PROTOCOL_FEATURES`` must support
  this message even before ``VHOST_USER_SET_FEATURES`` was called.

``VHOST_USER_SET_OWNER``
  :equivalent ioctl: ``VHOST_SET_OWNER``

  Issued when a new connection is established. It sets the current
  *master* as an owner of the session. This can be used on the *slave*
  as a "session start" flag.

``VHOST_USER_RESET_OWNER``

  .. admonition:: Deprecated

     This is no longer used. Used to be sent to request disabling all
     rings, but some clients interpreted it to also discard connection
     state (this interpretation would lead to bugs). It is recommended
     that clients either ignore this message, or use it to disable all
``VHOST_USER_SET_MEM_TABLE``
  :equivalent ioctl: ``VHOST_SET_MEM_TABLE``
  :master payload: memory regions description
  :slave payload: (postcopy only) memory regions description

  Sets the memory map regions on the slave so it can translate the
  vring addresses. In the ancillary data there is an array of file
  descriptors for each memory mapped region. The size and ordering of
  the fds matches the number and ordering of memory regions.

  When ``VHOST_USER_POSTCOPY_LISTEN`` has been received,
  ``SET_MEM_TABLE`` replies with the bases of the memory mapped
  regions to the master. The slave must have mmap'd the regions but
  not yet accessed them and should not yet generate a userfault

  ``NEED_REPLY_MASK`` is not set in this case. QEMU will then
  reply back to the list of mappings with an empty
  ``VHOST_USER_SET_MEM_TABLE`` as an acknowledgement; only upon
  reception of this message may the guest start accessing the memory
  and generating faults.
961 ``VHOST_USER_SET_LOG_BASE``
963 :equivalent ioctl: ``VHOST_SET_LOG_BASE``
967 Sets logging shared memory space.
969 When slave has ``VHOST_USER_PROTOCOL_F_LOG_SHMFD`` protocol feature,
970 the log memory fd is provided in the ancillary data of
971 ``VHOST_USER_SET_LOG_BASE`` message, the size and offset of shared
972 memory area provided in the message.
974 ``VHOST_USER_SET_LOG_FD``
976 :equivalent ioctl: ``VHOST_SET_LOG_FD``
979 Sets the logging file descriptor, which is passed as ancillary data.
981 ``VHOST_USER_SET_VRING_NUM``
983 :equivalent ioctl: ``VHOST_SET_VRING_NUM``
984 :master payload: vring state description
986 Set the size of the queue.
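The vring state description used here (and by several of the following messages) can be sketched as a pair of 32-bit values. The snippet assumes the layout given in the payload section of this document, a 32-bit index followed by a 32-bit num, in the host's native byte order; the helper names are hypothetical.

```python
import struct

# Assumed layout: u32 index, u32 num, native byte order, no padding.
VRING_STATE = struct.Struct("=II")


def pack_vring_state(index: int, num: int) -> bytes:
    """Encode a vring state description payload."""
    return VRING_STATE.pack(index, num)


def unpack_vring_state(payload: bytes):
    """Decode a vring state description into (index, num)."""
    return VRING_STATE.unpack(payload[:VRING_STATE.size])
```

For ``VHOST_USER_SET_VRING_NUM``, ``num`` carries the queue size, for example ``pack_vring_state(0, 256)`` for a 256-entry queue 0.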
988 ``VHOST_USER_SET_VRING_ADDR``
990 :equivalent ioctl: ``VHOST_SET_VRING_ADDR``
991 :master payload: vring address description
994 Sets the addresses of the different aspects of the vring.
996 ``VHOST_USER_SET_VRING_BASE``
998 :equivalent ioctl: ``VHOST_SET_VRING_BASE``
999 :master payload: vring state description
1001 Sets the base offset in the available vring.
1003 ``VHOST_USER_GET_VRING_BASE``
1005 :equivalent ioctl: ``VHOST_GET_VRING_BASE``
1006 :master payload: vring state description
1007 :slave payload: vring state description
1009 Get the available vring base offset.
1011 ``VHOST_USER_SET_VRING_KICK``
1013 :equivalent ioctl: ``VHOST_SET_VRING_KICK``
1014 :master payload: ``u64``
1016 Set the event file descriptor for adding buffers to the vring. It is
1017 passed in the ancillary data.
1019 Bits (0-7) of the payload contain the vring index. Bit 8 is the
1020 invalid FD flag. This flag is set when there is no file descriptor
1021 in the ancillary data. This signals that polling should be used
1022 instead of waiting for the kick. Note that if the protocol feature
1023 ``VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS`` has been negotiated,
1024 this message isn't necessary, as the ring is also started on the
1025 ``VHOST_USER_VRING_KICK`` message. It may, however, still be used to
1026 set an event file descriptor (which will be preferred over the
1027 message) or to enable polling.
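The ``u64`` payload layout described above can be decoded with two masks. This is a minimal sketch; the constant and function names are hypothetical. The same layout is described for ``VHOST_USER_SET_VRING_CALL`` and ``VHOST_USER_SET_VRING_ERR``.

```python
VRING_IDX_MASK = 0xFF        # bits 0-7: vring index
VRING_NOFD_MASK = 0x1 << 8   # bit 8: invalid FD flag


def decode_vring_fd_payload(payload: int):
    """Split the u64 payload into (vring index, fd present?)."""
    index = payload & VRING_IDX_MASK
    has_fd = not (payload & VRING_NOFD_MASK)
    return index, has_fd
```

When ``has_fd`` is false, no descriptor should be expected in the ancillary data, and for the kick and call messages the receiver falls back to polling.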
1029 ``VHOST_USER_SET_VRING_CALL``
1031 :equivalent ioctl: ``VHOST_SET_VRING_CALL``
1032 :master payload: ``u64``
1034 Set the event file descriptor to signal when buffers are used. It is
1035 passed in the ancillary data.
1037 Bits (0-7) of the payload contain the vring index. Bit 8 is the
1038 invalid FD flag. This flag is set when there is no file descriptor
1039 in the ancillary data. This signals that polling will be used
1040 instead of waiting for the call. Note that if the protocol features
1041 ``VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS`` and
1042 ``VHOST_USER_PROTOCOL_F_SLAVE_REQ`` have been negotiated, this message
1043 isn't necessary, as the ``VHOST_USER_SLAVE_VRING_CALL`` message can be
1044 used. It may, however, still be used to set an event file descriptor
1045 or to enable polling.
1047 ``VHOST_USER_SET_VRING_ERR``
1049 :equivalent ioctl: ``VHOST_SET_VRING_ERR``
1050 :master payload: ``u64``
1052 Set the event file descriptor to signal when an error occurs. It is
1053 passed in the ancillary data.
1055 Bits (0-7) of the payload contain the vring index. Bit 8 is the
1056 invalid FD flag. This flag is set when there is no file descriptor
1057 in the ancillary data. Note that if the protocol features
1058 ``VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS`` and
1059 ``VHOST_USER_PROTOCOL_F_SLAVE_REQ`` have been negotiated, this message
1060 isn't necessary, as the ``VHOST_USER_SLAVE_VRING_ERR`` message can be
1061 used. It may, however, still be used to set an event file descriptor
1062 (which will be preferred over the message).
1064 ``VHOST_USER_GET_QUEUE_NUM``
1066 :equivalent ioctl: N/A
1067 :master payload: N/A
1070 Query how many queues the backend supports.
1072 This request should be sent only when ``VHOST_USER_PROTOCOL_F_MQ``
1073 is set in queried protocol features by
1074 ``VHOST_USER_GET_PROTOCOL_FEATURES``.
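Many requests in this section are gated on negotiated feature bits. The sketch below shows one way a master might check the gate for ``VHOST_USER_GET_QUEUE_NUM``. The bit positions (30 for ``VHOST_USER_F_PROTOCOL_FEATURES``, 0 for ``VHOST_USER_PROTOCOL_F_MQ``) are assumed from the feature lists elsewhere in this document and should be checked against them; the function names are hypothetical.

```python
# Assumed bit positions; verify against the feature lists in this document.
VHOST_USER_F_PROTOCOL_FEATURES = 30   # bit in the virtio feature set
VHOST_USER_PROTOCOL_F_MQ = 0          # bit in the protocol feature set


def has_bit(features: int, bit: int) -> bool:
    """Return True if the given bit is set in a feature bitmap."""
    return bool((features >> bit) & 1)


def may_send_get_queue_num(features: int, protocol_features: int) -> bool:
    """Protocol features must be available and MQ must be advertised."""
    return (has_bit(features, VHOST_USER_F_PROTOCOL_FEATURES)
            and has_bit(protocol_features, VHOST_USER_PROTOCOL_F_MQ))
```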
1076 ``VHOST_USER_SET_VRING_ENABLE``
1078 :equivalent ioctl: N/A
1079 :master payload: vring state description
1081 Signal the slave to enable or disable the corresponding vring.
1083 This request should be sent only when
1084 ``VHOST_USER_F_PROTOCOL_FEATURES`` has been negotiated.
1086 ``VHOST_USER_SEND_RARP``
1088 :equivalent ioctl: N/A
1089 :master payload: ``u64``
1091 Ask the vhost-user backend to broadcast a fake RARP to notify that the
1092 migration has terminated, for guests that do not support GUEST_ANNOUNCE.
1094 Only legal if feature bit ``VHOST_USER_F_PROTOCOL_FEATURES`` is
1095 present in ``VHOST_USER_GET_FEATURES`` and protocol feature bit
1096 ``VHOST_USER_PROTOCOL_F_RARP`` is present in
1097 ``VHOST_USER_GET_PROTOCOL_FEATURES``. The first 6 bytes of the
1098 payload contain the MAC address of the guest to allow the vhost user
1099 backend to construct and broadcast the fake RARP.
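Extracting the MAC address from the payload can be sketched as follows. The function takes the raw payload bytes as received and formats the first six; the function name is hypothetical.

```python
def rarp_mac(payload: bytes) -> str:
    """Format the first 6 payload bytes as a colon-separated MAC address."""
    if len(payload) < 6:
        raise ValueError("payload too short to contain a MAC address")
    return ":".join(f"{b:02x}" for b in payload[:6])
```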
1101 ``VHOST_USER_NET_SET_MTU``
1103 :equivalent ioctl: N/A
1104 :master payload: ``u64``
1106 Set host MTU value exposed to the guest.
1108 This request should be sent only when ``VIRTIO_NET_F_MTU`` feature
1109 has been successfully negotiated, ``VHOST_USER_F_PROTOCOL_FEATURES``
1110 is present in ``VHOST_USER_GET_FEATURES`` and protocol feature bit
1111 ``VHOST_USER_PROTOCOL_F_NET_MTU`` is present in
1112 ``VHOST_USER_GET_PROTOCOL_FEATURES``.
1114 If ``VHOST_USER_PROTOCOL_F_REPLY_ACK`` is negotiated, slave must
1115 respond with zero in case the specified MTU is valid, or non-zero otherwise.
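A slave-side handler for this request might look like the sketch below. The accepted MTU range (68 to 65535) is an assumption made for illustration, not something this message definition specifies, and the function name is hypothetical.

```python
# Assumed bounds for illustration only; a real backend applies its own limits.
MIN_MTU = 68
MAX_MTU = 65535


def handle_net_set_mtu(mtu: int, reply_ack_negotiated: bool):
    """Return the u64 reply value, or None when no reply is to be sent."""
    valid = MIN_MTU <= mtu <= MAX_MTU
    if not reply_ack_negotiated:
        return None
    return 0 if valid else 1
```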
1118 ``VHOST_USER_SET_SLAVE_REQ_FD``
1120 :equivalent ioctl: N/A
1121 :master payload: N/A
1123 Set the socket file descriptor for slave initiated requests. It is passed
1124 in the ancillary data.
1126 This request should be sent only when
1127 ``VHOST_USER_F_PROTOCOL_FEATURES`` has been negotiated, and protocol
1128 feature bit ``VHOST_USER_PROTOCOL_F_SLAVE_REQ`` is present in
1129 ``VHOST_USER_GET_PROTOCOL_FEATURES``. If
1130 ``VHOST_USER_PROTOCOL_F_REPLY_ACK`` is negotiated, slave must
1131 respond with zero for success, non-zero otherwise.
1133 ``VHOST_USER_IOTLB_MSG``
1135 :equivalent ioctl: N/A (equivalent to ``VHOST_IOTLB_MSG`` message type)
1136 :master payload: ``struct vhost_iotlb_msg``
1137 :slave payload: ``u64``
1139 Send IOTLB messages with ``struct vhost_iotlb_msg`` as payload.
1141 Master sends such requests to update and invalidate entries in the
1142 device IOTLB. The slave has to acknowledge the request by sending
1143 zero as ``u64`` payload for success, non-zero otherwise.
1145 This request should be sent only when ``VIRTIO_F_IOMMU_PLATFORM``
1146 feature has been successfully negotiated.
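The update and invalidate handling on the slave side can be sketched with a small in-memory table. The type values (2 for update, 3 for invalidate) are assumed to match the kernel's ``VHOST_IOTLB_*`` constants; the class and its methods are hypothetical and ignore permissions checking.

```python
IOTLB_UPDATE = 2       # assumed to match VHOST_IOTLB_UPDATE
IOTLB_INVALIDATE = 3   # assumed to match VHOST_IOTLB_INVALIDATE


class Iotlb:
    """Toy device IOTLB keyed by the starting I/O virtual address."""

    def __init__(self):
        self.entries = {}  # iova -> (size, uaddr, perm)

    def handle(self, msg_type, iova, size, uaddr=0, perm=0) -> int:
        """Apply one message; return the u64 ack (0 means success)."""
        if msg_type == IOTLB_UPDATE:
            self.entries[iova] = (size, uaddr, perm)
            return 0
        if msg_type == IOTLB_INVALIDATE:
            # Drop every entry that overlaps [iova, iova + size).
            for start, (sz, _, _) in list(self.entries.items()):
                if start < iova + size and iova < start + sz:
                    del self.entries[start]
            return 0
        return 1  # unsupported type in this sketch

    def translate(self, iova):
        """Return the mapped user address, or None on an IOTLB miss."""
        for start, (sz, uaddr, _) in self.entries.items():
            if start <= iova < start + sz:
                return uaddr + (iova - start)
        return None
```

A ``None`` result from ``translate`` corresponds to the miss case that the slave reports with ``VHOST_USER_SLAVE_IOTLB_MSG``.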
1148 ``VHOST_USER_SET_VRING_ENDIAN``
1150 :equivalent ioctl: ``VHOST_SET_VRING_ENDIAN``
1151 :master payload: vring state description
1153 Set the endianness of a VQ for legacy devices. Little-endian is
1154 indicated with state.num set to 0 and big-endian is indicated with
1155 state.num set to 1. Other values are invalid.
1157 This request should be sent only when
1158 ``VHOST_USER_PROTOCOL_F_CROSS_ENDIAN`` has been negotiated.
1159 Backends that negotiated this feature should handle both
1160 endiannesses and expect this message once (per VQ) during device
1161 configuration (i.e. before the master starts the VQ).
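The mapping from ``state.num`` to byte order can be sketched as below, with other values rejected as invalid. The function names are hypothetical.

```python
def vring_byteorder(num: int) -> str:
    """Map state.num to a byte order name usable with int.from_bytes."""
    if num == 0:
        return "little"
    if num == 1:
        return "big"
    raise ValueError(f"invalid endianness value: {num}")


def read_u16(buf: bytes, num: int) -> int:
    """Read a 16-bit vring field using the negotiated endianness."""
    return int.from_bytes(buf[:2], vring_byteorder(num))
```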
1163 ``VHOST_USER_GET_CONFIG``
1165 :equivalent ioctl: N/A
1166 :master payload: virtio device config space
1167 :slave payload: virtio device config space
1169 When ``VHOST_USER_PROTOCOL_F_CONFIG`` is negotiated, this message is
1170 submitted by the vhost-user master to fetch the contents of the
1171 virtio device configuration space. The vhost-user slave's payload size
1172 MUST match the master's request; the slave uses a zero-length
1173 payload to indicate an error to the vhost-user master. The vhost-user
1174 master may cache the contents to avoid repeated
1175 ``VHOST_USER_GET_CONFIG`` calls.
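On the master side, the size rule above might be checked as in this sketch. It looks only at the payload length, treating zero length as the slave's error indication; the function name and the choice of exceptions are hypothetical.

```python
def check_get_config_reply(requested_size: int, reply_payload: bytes) -> bytes:
    """Validate a GET_CONFIG reply against the size that was requested."""
    if len(reply_payload) == 0:
        raise RuntimeError("slave reported an error (zero-length payload)")
    if len(reply_payload) != requested_size:
        raise ValueError("reply size does not match the request")
    return reply_payload
```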
1177 ``VHOST_USER_SET_CONFIG``
1179 :equivalent ioctl: N/A
1180 :master payload: virtio device config space
1183 When ``VHOST_USER_PROTOCOL_F_CONFIG`` is negotiated, this message is
1184 submitted by the vhost-user master when the Guest changes the virtio
1185 device configuration space and also can be used for live migration
1186 on the destination host. The vhost-user slave must check the flags
1187 field, and slaves MUST NOT accept SET_CONFIG for read-only
1188 configuration space fields unless the live migration bit is set.
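The flags check could be sketched as follows. The live migration bit is assumed here to be bit 0 of the flags field, and the list of read-only byte ranges is a hypothetical, device-specific input; neither is taken from this section.

```python
CONFIG_FLAG_MIGRATION = 0x1  # assumed position of the live migration bit


def accept_set_config(offset, size, flags, read_only_ranges) -> bool:
    """Decide whether a SET_CONFIG write may be applied.

    read_only_ranges is a list of (start, end) byte ranges, end exclusive.
    """
    if flags & CONFIG_FLAG_MIGRATION:
        return True  # migration may restore read-only fields
    end = offset + size
    for ro_start, ro_end in read_only_ranges:
        if offset < ro_end and ro_start < end:
            return False  # write touches a read-only field
    return True
```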
1190 ``VHOST_USER_CREATE_CRYPTO_SESSION``
1192 :equivalent ioctl: N/A
1193 :master payload: crypto session description
1194 :slave payload: crypto session description
1196 Create a session for crypto operation. The server side must return
1197 the session id, 0 or positive for success, negative for failure.
1198 This request should be sent only when
1199 ``VHOST_USER_PROTOCOL_F_CRYPTO_SESSION`` feature has been
1200 successfully negotiated. It's a required feature for crypto
1203 ``VHOST_USER_CLOSE_CRYPTO_SESSION``
1205 :equivalent ioctl: N/A
1206 :master payload: ``u64``
1208 Close a session for crypto operation which was previously
1209 created by ``VHOST_USER_CREATE_CRYPTO_SESSION``.
1211 This request should be sent only when
1212 ``VHOST_USER_PROTOCOL_F_CRYPTO_SESSION`` feature has been
1213 successfully negotiated. It's a required feature for crypto
1216 ``VHOST_USER_POSTCOPY_ADVISE``
1218 :master payload: N/A
1219 :slave payload: userfault fd
1221 When ``VHOST_USER_PROTOCOL_F_PAGEFAULT`` is supported, the master
1222 advises the slave that a migration with postcopy enabled is underway;
1223 the slave must open a userfaultfd for later use. Note that at this
1224 stage the migration is still in precopy mode.
1226 ``VHOST_USER_POSTCOPY_LISTEN``
1228 :master payload: N/A
1230 Master advises slave that a transition to postcopy mode has
1231 happened. The slave must ensure that shared memory is registered
1232 with userfaultfd to cause faulting of non-present pages.
1234 This is always sent sometime after a ``VHOST_USER_POSTCOPY_ADVISE``,
1235 and thus only when ``VHOST_USER_PROTOCOL_F_PAGEFAULT`` is supported.
1237 ``VHOST_USER_POSTCOPY_END``
1239 :slave payload: ``u64``
1241 Master advises that postcopy migration has now completed. The slave
1242 must disable the userfaultfd. The response is an acknowledgement
1245 When ``VHOST_USER_PROTOCOL_F_PAGEFAULT`` is supported, this message
1246 is sent at the end of the migration, after
1247 ``VHOST_USER_POSTCOPY_LISTEN`` was previously sent.
1249 The value returned is an error indication; 0 is success.
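The ordering of the three postcopy messages (advise, then listen, then end) can be tracked with a small state machine. This is only a sketch: the class name is hypothetical, and returning non-zero for an out-of-order message is an assumption rather than behaviour defined here.

```python
POSTCOPY_ORDER = ("ADVISE", "LISTEN", "END")


class PostcopyTracker:
    """Accept the postcopy messages only in their expected order."""

    def __init__(self):
        self.next_index = 0

    def on_message(self, name: str) -> int:
        """Return 0 if the message arrives in order, non-zero otherwise."""
        if (self.next_index < len(POSTCOPY_ORDER)
                and name == POSTCOPY_ORDER[self.next_index]):
            self.next_index += 1
            return 0
        return 1
```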
1251 ``VHOST_USER_GET_INFLIGHT_FD``
1253 :equivalent ioctl: N/A
1254 :master payload: inflight description
1256 When ``VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD`` protocol feature has
1257 been successfully negotiated, this message is submitted by master to
1258 get a shared buffer from slave. The shared buffer will be used to
1259 track inflight I/O by slave. QEMU should retrieve a new one when vm
1262 ``VHOST_USER_SET_INFLIGHT_FD``
1264 :equivalent ioctl: N/A
1265 :master payload: inflight description
1267 When ``VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD`` protocol feature has
1268 been successfully negotiated, this message is submitted by master to
1269 send the shared inflight buffer back to the slave so that the slave can
1270 recover inflight I/O after a crash or restart.
1272 ``VHOST_USER_GPU_SET_SOCKET``
1274 :equivalent ioctl: N/A
1275 :master payload: N/A
1277 Sets the GPU protocol socket file descriptor, which is passed as
1278 ancillary data. The GPU protocol is used to inform the master of
1279 rendering state and updates. See vhost-user-gpu.rst for details.
1281 ``VHOST_USER_RESET_DEVICE``
1283 :equivalent ioctl: N/A
1284 :master payload: N/A
1287 Ask the vhost user backend to disable all rings and reset all
1288 internal device state to the initial state, ready to be
1289 reinitialized. The backend retains ownership of the device
1290 throughout the reset operation.
1292 Only valid if the ``VHOST_USER_PROTOCOL_F_RESET_DEVICE`` protocol
1293 feature is set by the backend.
1295 ``VHOST_USER_VRING_KICK``
1297 :equivalent ioctl: N/A
1298 :slave payload: vring state description
1299 :master payload: N/A
1301 When the ``VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS`` protocol
1302 feature has been successfully negotiated, this message may be
1303 submitted by the master to indicate that a buffer was added to
1304 the vring instead of signalling it using the vring's kick file
1305 descriptor or having the slave rely on polling.
1307 The state.num field is currently reserved and must be set to 0.
1309 ``VHOST_USER_GET_MAX_MEM_SLOTS``
1311 :equivalent ioctl: N/A
1314 When the ``VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS`` protocol
1315 feature has been successfully negotiated, this message is submitted
1316 by master to the slave. The slave should return the message with a
1317 u64 payload containing the maximum number of memory slots for
1318 QEMU to expose to the guest. The value returned by the backend
1319 will be capped at the maximum number of ram slots which can be
1320 supported by the target platform.
1322 ``VHOST_USER_ADD_MEM_REG``
1324 :equivalent ioctl: N/A
1325 :slave payload: single memory region description
1327 When the ``VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS`` protocol
1328 feature has been successfully negotiated, this message is submitted
1329 by the master to the slave. The message payload contains a memory
1330 region descriptor struct, describing a region of guest memory which
1331 the slave device must map in. When the
1332 ``VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS`` protocol feature has
1333 been successfully negotiated, along with the
1334 ``VHOST_USER_REM_MEM_REG`` message, this message is used to set and
1335 update the memory tables of the slave device.
1337 ``VHOST_USER_REM_MEM_REG``
1339 :equivalent ioctl: N/A
1340 :slave payload: single memory region description
1342 When the ``VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS`` protocol
1343 feature has been successfully negotiated, this message is submitted
1344 by the master to the slave. The message payload contains a memory
1345 region descriptor struct, describing a region of guest memory which
1346 the slave device must unmap. When the
1347 ``VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS`` protocol feature has
1348 been successfully negotiated, along with the
1349 ``VHOST_USER_ADD_MEM_REG`` message, this message is used to set and
1350 update the memory tables of the slave device.
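Together with ``VHOST_USER_GET_MAX_MEM_SLOTS``, the add and remove messages amount to maintaining a bounded table of regions. The sketch below models that bookkeeping only; it does not map any memory, and the class name, key choice and return codes are hypothetical.

```python
class MemTable:
    """Toy region table keyed by the region's guest address."""

    def __init__(self, max_slots: int):
        self.max_slots = max_slots
        self.regions = {}  # guest_addr -> (size, fd)

    def add(self, guest_addr: int, size: int, fd: int) -> int:
        """Record a region; a real slave would also mmap the fd here."""
        if len(self.regions) >= self.max_slots or guest_addr in self.regions:
            return 1
        self.regions[guest_addr] = (size, fd)
        return 0

    def remove(self, guest_addr: int, size: int) -> int:
        """Forget a region; a real slave would also munmap it here."""
        entry = self.regions.get(guest_addr)
        if entry is None or entry[0] != size:
            return 1
        del self.regions[guest_addr]
        return 0
```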
1352 ``VHOST_USER_SET_STATUS``
1354 :equivalent ioctl: ``VHOST_VDPA_SET_STATUS``
1356 :master payload: ``u64``
1358 When the ``VHOST_USER_PROTOCOL_F_STATUS`` protocol feature has been
1359 successfully negotiated, this message is submitted by the master to
1360 notify the backend with updated device status as defined in the Virtio specification.
1363 ``VHOST_USER_GET_STATUS``
1365 :equivalent ioctl: ``VHOST_VDPA_GET_STATUS``
1366 :slave payload: ``u64``
1367 :master payload: N/A
1369 When the ``VHOST_USER_PROTOCOL_F_STATUS`` protocol feature has been
1370 successfully negotiated, this message is submitted by the master to
1371 query the backend for its device status as defined in the Virtio specification.
1378 ``VHOST_USER_SLAVE_IOTLB_MSG``
1380 :equivalent ioctl: N/A (equivalent to ``VHOST_IOTLB_MSG`` message type)
1381 :slave payload: ``struct vhost_iotlb_msg``
1382 :master payload: N/A
1384 Send IOTLB messages with ``struct vhost_iotlb_msg`` as payload.
1385 Slave sends such requests to notify of an IOTLB miss, or an IOTLB
1386 access failure. If ``VHOST_USER_PROTOCOL_F_REPLY_ACK`` is
1387 negotiated, and slave set the ``VHOST_USER_NEED_REPLY`` flag, master
1388 must respond with zero when operation is successfully completed, or
1389 non-zero otherwise. This request should be sent only when
1390 ``VIRTIO_F_IOMMU_PLATFORM`` feature has been successfully negotiated.
1393 ``VHOST_USER_SLAVE_CONFIG_CHANGE_MSG``
1395 :equivalent ioctl: N/A
1397 :master payload: N/A
1399 When ``VHOST_USER_PROTOCOL_F_CONFIG`` is negotiated, the vhost-user
1400 slave sends such messages to notify that the virtio device's
1401 configuration space has changed. For host devices that support this
1402 feature, the host driver can send a ``VHOST_USER_GET_CONFIG``
1403 message to the slave to get the latest content. If
1404 ``VHOST_USER_PROTOCOL_F_REPLY_ACK`` is negotiated, and slave set the
1405 ``VHOST_USER_NEED_REPLY`` flag, master must respond with zero when
1406 operation is successfully completed, or non-zero otherwise.
1408 ``VHOST_USER_SLAVE_VRING_HOST_NOTIFIER_MSG``
1410 :equivalent ioctl: N/A
1411 :slave payload: vring area description
1412 :master payload: N/A
1414 Sets host notifier for a specified queue. The queue index is
1415 contained in the ``u64`` field of the vring area description. The
1416 host notifier is described by the file descriptor (typically it's a
1417 VFIO device fd) which is passed as ancillary data and the size
1418 (which is mmap size and should be the same as host page size) and
1419 offset (which is mmap offset) carried in the vring area
1420 description. QEMU can mmap the file descriptor based on the size and
1421 offset to get a memory range. Registering a host notifier means
1422 mapping this memory range to the VM as the specified queue's notify
1423 MMIO region. Slave sends this request to tell QEMU to de-register
1424 the existing notifier if any and register the new notifier if the
1425 request is sent with a file descriptor.
1427 This request should be sent only when
1428 ``VHOST_USER_PROTOCOL_F_HOST_NOTIFIER`` protocol feature has been
1429 successfully negotiated.
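The mapping step described above, where the master maps the received file descriptor using the size and offset from the vring area description, can be sketched with Python's ``mmap`` module. The function name is hypothetical, and the sketch covers only the mapping, not exposing the range to the VM.

```python
import mmap


def map_notifier(fd: int, size: int, offset: int) -> mmap.mmap:
    """Map `size` bytes of `fd` starting at `offset` (must be page aligned)."""
    return mmap.mmap(fd, size, access=mmap.ACCESS_WRITE, offset=offset)
```

The caller is responsible for closing the returned mapping when the notifier is de-registered.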
1431 ``VHOST_USER_SLAVE_VRING_CALL``
1433 :equivalent ioctl: N/A
1434 :slave payload: vring state description
1435 :master payload: N/A
1437 When the ``VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS`` protocol
1438 feature has been successfully negotiated, this message may be
1439 submitted by the slave to indicate that a buffer was used from
1440 the vring instead of signalling this using the vring's call file
1441 descriptor or having the master rely on polling.
1443 The state.num field is currently reserved and must be set to 0.
1445 ``VHOST_USER_SLAVE_VRING_ERR``
1447 :equivalent ioctl: N/A
1448 :slave payload: vring state description
1449 :master payload: N/A
1451 When the ``VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS`` protocol
1452 feature has been successfully negotiated, this message may be
1453 submitted by the slave to indicate that an error occurred on the
1454 specific vring, instead of signalling the error file descriptor
1455 set by the master via ``VHOST_USER_SET_VRING_ERR``.
1457 The state.num field is currently reserved and must be set to 0.
1461 VHOST_USER_PROTOCOL_F_REPLY_ACK
1462 -------------------------------
1464 The original vhost-user specification only demands replies for certain
1465 commands. This differs from the vhost protocol implementation where
1466 commands are sent over an ``ioctl()`` call and block until the client
1469 With this protocol extension negotiated, the sender (QEMU) can set the
1470 ``need_reply`` [Bit 3] flag to any command. This indicates that the
1471 client MUST respond with a Payload ``VhostUserMsg`` indicating success
1472 or failure. The payload should be set to zero on success or non-zero
1473 on failure, unless the message already has an explicit reply body.
1475 The response payload gives QEMU a deterministic indication of the result
1476 of the command. Today, QEMU is expected to terminate the main vhost-user
1477 loop upon receiving such errors. In the future, QEMU could be taught to be
1478 more resilient for selective requests.
1480 For the message types that already solicit a reply from the client,
1481 the presence of ``VHOST_USER_PROTOCOL_F_REPLY_ACK`` or need_reply bit
1482 being set brings no behavioural change. (See the Communication_
1483 section for details.)
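A sketch of how the ``need_reply`` flag and the acknowledgement fit into the message header follows. It assumes the header layout described in the message format section of this document (32-bit request, 32-bit flags, 32-bit payload size, with the version in the low two flag bits and the reply flag in bit 2) and native byte order; the function names are hypothetical.

```python
import struct

HEADER = struct.Struct("=III")   # request, flags, payload size
FLAG_VERSION = 0x1               # assumed: version 1 in bits 0-1
FLAG_REPLY = 0x1 << 2            # assumed: bit 2 marks a reply
FLAG_NEED_REPLY = 0x1 << 3       # bit 3: need_reply


def build_request(request: int, payload: bytes = b"",
                  need_reply: bool = False) -> bytes:
    """Build a request, optionally asking for an acknowledgement."""
    flags = FLAG_VERSION | (FLAG_NEED_REPLY if need_reply else 0)
    return HEADER.pack(request, flags, len(payload)) + payload


def build_ack(request: int, status: int) -> bytes:
    """Build the u64 acknowledgement: zero on success, non-zero on failure."""
    payload = struct.pack("=Q", status)
    return HEADER.pack(request, FLAG_VERSION | FLAG_REPLY, len(payload)) + payload
```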
1485 .. _backend_conventions:
1487 Backend program conventions
1488 ===========================
1490 vhost-user backends can provide various devices & services and may
1491 need to be configured manually depending on the use case. However, it
1492 is a good idea to follow the conventions listed here when
1493 possible. Users, QEMU or libvirt, can then rely on some common
1494 behaviour to avoid heterogeneous configuration and management of the
1495 backend programs and facilitate interoperability.
1497 Each backend installed on a host system should come with at least one
1498 JSON file that conforms to the vhost-user.json schema. Each file
1499 informs the management applications about the backend type, and binary
1500 location. In addition, it defines rules for management apps for
1501 picking the highest priority backend when multiple match the search
1502 criteria (see ``@VhostUserBackend`` documentation in the schema file).
1504 If the backend is not capable of enabling a requested feature on the
1505 host (such as 3D acceleration with virgl), or the initialization
1506 failed, the backend should fail to start early and exit with a status
1507 != 0. It may also print a message to stderr for further details.
1509 The backend program must not daemonize itself, but it may be
1510 daemonized by the management layer. It may also have restricted
1511 access to the system.
1513 File descriptors 0, 1 and 2 will exist, and have regular
1514 stdin/stdout/stderr usage (they may have been redirected to /dev/null
1515 by the management layer, or to a log handler).
1517 The backend program must end (as quickly and cleanly as possible) when
1518 the SIGTERM signal is received. The management layer may follow up
1519 with SIGKILL after a few seconds.
1521 The following command line options have an expected behaviour. They
1522 are mandatory, unless explicitly stated otherwise:
1526 This option specifies the location of the vhost-user Unix domain socket.
1527 It is incompatible with --fd.
1531 When this argument is given, the backend program is started with the
1532 vhost-user socket as file descriptor FDNUM. It is incompatible with
1535 --print-capabilities
1537 Output to stdout the backend capabilities in JSON format, and then
1538 exit successfully. Other options and arguments should be ignored, and
1539 the backend program should not perform its normal function. The
1540 capabilities can be reported dynamically depending on the host
1543 The JSON output is described in the ``vhost-user.json`` schema, by
1544 ``@VHostUserBackendCapabilities``. Example:
1559 Command line options:
1563 Specify the Linux input device.
1569 Do not request exclusive access to the input device.
1576 Command line options:
1580 Specify the GPU DRM render node.
1586 Enable virgl rendering support.
1593 Command line options:
1597 Specify block device or file path.