Vhost-user Protocol
===================

Copyright (c) 2014 Virtual Open Systems Sarl.

This work is licensed under the terms of the GNU GPL, version 2 or later.
See the COPYING file in the top-level directory.
===================

This protocol aims to complement the ioctl interface used to control the
vhost implementation in the Linux kernel. It implements the control plane needed
to establish virtqueue sharing with a user space process on the same host. It
uses communication over a Unix domain socket to share file descriptors in the
ancillary data of the message.

The protocol defines 2 sides of the communication, master and slave. Master is
the application that shares its virtqueues, in our case QEMU. Slave is the
consumer of the virtqueues.

In the current implementation QEMU is the Master, and the Slave is intended to
be a software Ethernet switch running in user space, such as Snabbswitch.

Master and slave can be either a client (i.e. connecting) or server (listening)
in the socket communication.

Message Specification
---------------------

Note that all numbers are in the machine native byte order. A vhost-user message
consists of 3 header fields and a payload:

------------------------------------
| request | flags | size | payload |
------------------------------------

 * Request: 32-bit type of the request
 * Flags: 32-bit bit field:
   - Lower 2 bits are the version (currently 0x01)
   - Bit 2 is the reply flag - needs to be sent on each reply from the slave
   - Bit 3 is the need_reply flag - see VHOST_USER_PROTOCOL_F_REPLY_ACK for
     details.
 * Size: 32-bit size of the payload

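The header layout and flag bits listed above can be sketched in C as follows.
This is only an illustration: the struct name, the macro names and the helper
functions are assumptions made for this example and are not defined by this
document.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative names; only the bit positions come from the text above. */
#define VHOST_USER_VERSION          0x1u
#define VHOST_USER_VERSION_MASK     0x3u        /* lower 2 bits: version */
#define VHOST_USER_REPLY_MASK       (1u << 2)   /* bit 2: reply flag */
#define VHOST_USER_NEED_REPLY_MASK  (1u << 3)   /* bit 3: need_reply flag */

/* The three 32-bit header fields that precede the payload. */
struct vhost_user_header {
    uint32_t request;
    uint32_t flags;
    uint32_t size;      /* size of the payload in bytes */
};

static uint32_t header_version(const struct vhost_user_header *h)
{
    return h->flags & VHOST_USER_VERSION_MASK;
}

static bool header_is_reply(const struct vhost_user_header *h)
{
    return (h->flags & VHOST_USER_REPLY_MASK) != 0;
}

static bool header_needs_reply(const struct vhost_user_header *h)
{
    return (h->flags & VHOST_USER_NEED_REPLY_MASK) != 0;
}

/* Build the header of a slave reply to `req` carrying payload_size bytes. */
static struct vhost_user_header make_reply(const struct vhost_user_header *req,
                                           uint32_t payload_size)
{
    struct vhost_user_header rep;

    assert(!header_is_reply(req));  /* replies answer requests only */
    rep.request = req->request;
    rep.flags = VHOST_USER_VERSION | VHOST_USER_REPLY_MASK;
    rep.size = payload_size;
    return rep;
}
```

A slave would typically call make_reply() with the request header it has just
read and then append the reply payload.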
Depending on the request type, payload can be:

 * A single 64-bit integer
   -------
   | u64 |
   -------

   u64: a 64-bit unsigned integer

 * A vring state description
   ---------------
   | index | num |
   ---------------

   Index: a 32-bit index
   Num: a 32-bit number

 * A vring address description
   --------------------------------------------------------------
   | index | flags | size | descriptor | used | available | log |
   --------------------------------------------------------------

   Index: a 32-bit vring index
   Flags: a 32-bit vring flags
   Descriptor: a 64-bit ring address of the vring descriptor table
   Used: a 64-bit ring address of the vring used ring
   Available: a 64-bit ring address of the vring available ring
   Log: a 64-bit guest address for logging

   Note that a ring address is an IOVA if VIRTIO_F_IOMMU_PLATFORM has been
   negotiated. Otherwise it is a user address.

 * Memory regions description
   ---------------------------------------------------
   | num regions | padding | region0 | ... | region7 |
   ---------------------------------------------------

   Num regions: a 32-bit number of regions
   Padding: 32-bit

   A region is:
   -----------------------------------------------------
   | guest address | size | user address | mmap offset |
   -----------------------------------------------------

   Guest address: a 64-bit guest address of the region
   Size: a 64-bit size
   User address: a 64-bit user address
   mmap offset: 64-bit offset where region starts in the mapped memory

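As a rough sketch, the memory regions description above maps onto the C layout
below. The type names, field names and the helper are hypothetical and chosen
for this example only. The check that one file descriptor accompanies each
region follows the VHOST_USER_SET_MEM_TABLE description in this document.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define VHOST_MEMORY_MAX_NREGIONS 8     /* region0 .. region7 */

/* One region: four 64-bit fields, in the order of the diagram above. */
struct vhost_user_memory_region {
    uint64_t guest_addr;
    uint64_t size;
    uint64_t user_addr;
    uint64_t mmap_offset;
};

/* The whole payload: region count, padding, then up to 8 regions. */
struct vhost_user_memory {
    uint32_t nregions;
    uint32_t padding;
    struct vhost_user_memory_region regions[VHOST_MEMORY_MAX_NREGIONS];
};

/*
 * Sanity check for a received table: the region count must fit the
 * fixed-size array, and one file descriptor must have arrived in the
 * ancillary data for each region.
 */
static bool memory_table_fds_match(const struct vhost_user_memory *m,
                                   size_t nfds)
{
    assert(m != NULL);
    return m->nregions <= VHOST_MEMORY_MAX_NREGIONS && m->nregions == nfds;
}
```

With this layout a region occupies 32 bytes and the full table 264 bytes, with
no compiler-inserted padding on common ABIs.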
 * Log description
   -------------------------
   | log size | log offset |
   -------------------------

   log size: size of area used for logging
   log offset: offset from start of supplied file descriptor
       where logging starts (i.e. where guest address 0 would be logged)

 * An IOTLB message
   ---------------------------------------------------------
   | iova | size | user address | permissions flags | type |
   ---------------------------------------------------------

   IOVA: a 64-bit I/O virtual address programmed by the guest
   Size: a 64-bit size
   User address: a 64-bit user address
   Permissions: an 8-bit value:
    - 0: No access
    - 1: Read access
    - 2: Write access
    - 3: Read/Write access
   Type: an 8-bit IOTLB message type:
    - 1: IOTLB miss
    - 2: IOTLB update
    - 3: IOTLB invalidate
    - 4: IOTLB access fail

 * Virtio device config space
   -----------------------------------
   | offset | size | flags | payload |
   -----------------------------------

   Offset: a 32-bit offset into the virtio device's configuration space
   Size: a 32-bit configuration space access size in bytes
   Flags: a 32-bit value:
    - 0: Vhost master messages used for writeable fields
    - 1: Vhost master messages used for live migration
   Payload: Size bytes array holding the contents of the virtio
       device's configuration space

 * Vring area description
   -----------------------
   | u64 | size | offset |
   -----------------------

   u64: a 64-bit integer containing the vring index and flags
   Size: a 64-bit size of this area
   Offset: a 64-bit offset of this area from the start of the
       supplied file descriptor

In QEMU the vhost-user message is implemented with the following struct:

typedef struct VhostUserMsg {
    VhostUserRequest request;
    uint32_t flags;
    uint32_t size;
    union {
        uint64_t u64;
        struct vhost_vring_state state;
        struct vhost_vring_addr addr;
        VhostUserMemory memory;
        VhostUserLog log;
        struct vhost_iotlb_msg iotlb;
        VhostUserConfig config;
        VhostUserVringArea area;
    };
} QEMU_PACKED VhostUserMsg;

Communication
-------------

The protocol for vhost-user is based on the existing implementation of vhost
for the Linux Kernel. Most messages that can be sent via the Unix domain socket
implementing vhost-user have an equivalent ioctl to the kernel implementation.

The communication consists of master sending message requests and slave sending
message replies. Most of the requests don't require replies. Here is a list of
the ones that do:

 * VHOST_USER_GET_FEATURES
 * VHOST_USER_GET_PROTOCOL_FEATURES
 * VHOST_USER_GET_VRING_BASE
 * VHOST_USER_SET_LOG_BASE (if VHOST_USER_PROTOCOL_F_LOG_SHMFD)

[ Also see the section on REPLY_ACK protocol extension. ]

There are several messages that the master sends with file descriptors passed
in the ancillary data:

 * VHOST_USER_SET_MEM_TABLE
 * VHOST_USER_SET_LOG_BASE (if VHOST_USER_PROTOCOL_F_LOG_SHMFD)
 * VHOST_USER_SET_LOG_FD
 * VHOST_USER_SET_VRING_KICK
 * VHOST_USER_SET_VRING_CALL
 * VHOST_USER_SET_VRING_ERR
 * VHOST_USER_SET_SLAVE_REQ_FD

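Passing file descriptors in the ancillary data of a Unix domain socket message
is done with SCM_RIGHTS control messages. The sketch below shows one possible
pair of helpers built on the standard sendmsg()/recvmsg() calls. The helper
names, the fd_control union and the MAX_FDS limit are assumptions made for this
example, not part of the protocol text.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

#define MAX_FDS 8   /* assumption: upper bound on descriptors per message */

/* Control buffer sized and aligned for up to MAX_FDS descriptors. */
union fd_control {
    struct cmsghdr align;
    char buf[CMSG_SPACE(sizeof(int) * MAX_FDS)];
};

/* Send `len` bytes plus `nfds` descriptors in a single sendmsg() call. */
static ssize_t send_with_fds(int sock, const void *buf, size_t len,
                             const int *fds, size_t nfds)
{
    union fd_control control;
    struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };
    struct msghdr msg;

    assert(nfds <= MAX_FDS);
    memset(&msg, 0, sizeof(msg));
    memset(&control, 0, sizeof(control));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;

    if (nfds > 0) {
        struct cmsghdr *cmsg;

        msg.msg_control = control.buf;
        msg.msg_controllen = CMSG_SPACE(sizeof(int) * nfds);
        cmsg = CMSG_FIRSTHDR(&msg);
        cmsg->cmsg_level = SOL_SOCKET;
        cmsg->cmsg_type = SCM_RIGHTS;
        cmsg->cmsg_len = CMSG_LEN(sizeof(int) * nfds);
        memcpy(CMSG_DATA(cmsg), fds, sizeof(int) * nfds);
    }
    return sendmsg(sock, &msg, 0);
}

/* Receive up to `len` bytes; `fds` must have room for MAX_FDS entries. */
static ssize_t recv_with_fds(int sock, void *buf, size_t len,
                             int *fds, size_t *nfds)
{
    union fd_control control;
    struct iovec iov = { .iov_base = buf, .iov_len = len };
    struct msghdr msg;
    struct cmsghdr *cmsg;
    ssize_t n;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = control.buf;
    msg.msg_controllen = sizeof(control.buf);
    *nfds = 0;

    n = recvmsg(sock, &msg, 0);
    if (n < 0) {
        return n;
    }
    for (cmsg = CMSG_FIRSTHDR(&msg); cmsg; cmsg = CMSG_NXTHDR(&msg, cmsg)) {
        if (cmsg->cmsg_level == SOL_SOCKET && cmsg->cmsg_type == SCM_RIGHTS) {
            size_t bytes = cmsg->cmsg_len - CMSG_LEN(0);

            memcpy(fds, CMSG_DATA(cmsg), bytes);
            *nfds = bytes / sizeof(int);
        }
    }
    return n;
}
```

A real implementation would also loop on short reads and writes and close any
received descriptors it does not use; both are left out here for brevity.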
If Master is unable to send the full message or receives a wrong reply it will
close the connection. An optional reconnection mechanism can be implemented.

Any protocol extensions are gated by protocol feature bits,
which allows full backwards compatibility on both master
and slave.
As older slaves don't support negotiating protocol features,
a feature bit was dedicated for this purpose:
#define VHOST_USER_F_PROTOCOL_FEATURES 30

Starting and stopping rings
---------------------------

Client must only process each ring when it is started.

Client must only pass data between the ring and the
backend when the ring is enabled.

If ring is started but disabled, client must process the
ring without talking to the backend.

For example, for a networking device, in the disabled state
client must not supply any new RX packets, but must process
and discard any TX packets.

If VHOST_USER_F_PROTOCOL_FEATURES has not been negotiated, the ring is
initialized in an enabled state.

If VHOST_USER_F_PROTOCOL_FEATURES has been negotiated, the ring is initialized
in a disabled state. Client must not pass data to/from the backend until ring
is enabled by VHOST_USER_SET_VRING_ENABLE with parameter 1, or after it has
been disabled by VHOST_USER_SET_VRING_ENABLE with parameter 0.

Each ring is initialized in a stopped state; client must not process it until
ring is started, or after it has been stopped.

Client must start ring upon receiving a kick (that is, detecting that file
descriptor is readable) on the descriptor specified by
VHOST_USER_SET_VRING_KICK, and stop ring upon receiving
VHOST_USER_GET_VRING_BASE.

While processing the rings (whether they are enabled or not), client must
support changing some configuration aspects on the fly.

Multiple queue support
----------------------

Multiple queue is treated as a protocol extension, hence the slave has to
implement protocol features first. The multiple queues feature is supported
only when the protocol feature VHOST_USER_PROTOCOL_F_MQ (bit 0) is set.

The max number of queue pairs the slave supports can be queried with message
VHOST_USER_GET_QUEUE_NUM. Master should stop when the number of
requested queues is bigger than that.

As all queues share one connection, the master uses a unique index for each
queue in the sent message to identify a specified queue. One queue pair
is enabled initially. More queues are enabled dynamically, by sending
message VHOST_USER_SET_VRING_ENABLE.

Migration
---------

During live migration, the master may need to track the modifications
the slave makes to the memory mapped regions. The client should mark
the dirty pages in a log. Once it complies with this logging, it may
declare the VHOST_F_LOG_ALL vhost feature.

To start/stop logging of data/used ring writes, server may send messages
VHOST_USER_SET_FEATURES with VHOST_F_LOG_ALL and VHOST_USER_SET_VRING_ADDR with
VHOST_VRING_F_LOG in ring's flags set to 1/0, respectively.

All the modifications to memory pointed by vring "descriptor" should
be marked. Modifications to "used" vring should be marked if
VHOST_VRING_F_LOG is part of ring's flags.

Dirty pages are of size:
#define VHOST_LOG_PAGE 0x1000

The log memory fd is provided in the ancillary data of
VHOST_USER_SET_LOG_BASE message when the slave has
VHOST_USER_PROTOCOL_F_LOG_SHMFD protocol feature.

The size of the log is supplied as part of VhostUserMsg
which should be large enough to cover all known guest
addresses. Log starts at the supplied offset in the
supplied file descriptor.
The log covers from address 0 to the maximum of guest
regions. In pseudo-code, to mark page at "addr" as dirty:

page = addr / VHOST_LOG_PAGE
log[page / 8] |= 1 << page % 8

Where addr is the guest physical address.

Use atomic operations, as the log may be concurrently manipulated.

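The pseudo-code above can be turned into C along the following lines. This is a
sketch only: the function names are made up for this example, and it assumes a
compiler that provides the GCC/Clang __atomic_fetch_or builtin for the atomic
update.

```c
#include <assert.h>
#include <stdint.h>

#define VHOST_LOG_PAGE 0x1000

/* Atomically set the dirty bit of the page containing guest address addr. */
static void log_mark_page(uint8_t *log, uint64_t log_size, uint64_t addr)
{
    uint64_t page = addr / VHOST_LOG_PAGE;

    assert(page / 8 < log_size);    /* the log must cover this address */
    __atomic_fetch_or(&log[page / 8], (uint8_t)(1u << (page % 8)),
                      __ATOMIC_SEQ_CST);
}

/* Mark every page touched by a write of len bytes starting at addr. */
static void log_mark_range(uint8_t *log, uint64_t log_size,
                           uint64_t addr, uint64_t len)
{
    uint64_t first, last, page;

    if (len == 0) {
        return;
    }
    first = addr / VHOST_LOG_PAGE;
    last = (addr + len - 1) / VHOST_LOG_PAGE;
    for (page = first; page <= last; page++) {
        log_mark_page(log, log_size, page * VHOST_LOG_PAGE);
    }
}
```

Here log points at the start of the logging area, that is, the mapped log file
descriptor plus the supplied log offset, and log_size is the supplied log size
in bytes.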
Note that when logging modifications to the used ring (when VHOST_VRING_F_LOG
is set for this ring), log_guest_addr should be used to calculate the log
offset: the write to first byte of the used ring is logged at this offset from
log start. Also note that this value might be outside the legal guest physical
address range (i.e. does not have to be covered by the VhostUserMemory table),
but the bit offset of the last byte of the ring must fall within
the size supplied by VhostUserLog.

VHOST_USER_SET_LOG_FD is an optional message with an eventfd in
ancillary data; it may be used to inform the master that the log has
been modified.

Once the source has finished migration, rings will be stopped by
the source. No further update must be done before rings are
restarted.

In postcopy migration the slave is started before all the memory has been
received from the source host, and care must be taken to avoid accessing pages
that have yet to be received. The slave opens a 'userfault'-fd and registers
the memory with it; this fd is then passed back over to the master.
The master services requests on the userfaultfd for pages that are accessed
and when the page is available it performs WAKE ioctls on the userfaultfd
to wake the stalled slave. The client indicates support for this via the
VHOST_USER_PROTOCOL_F_PAGEFAULT feature.

Memory access
-------------

The master sends a list of vhost memory regions to the slave using the
VHOST_USER_SET_MEM_TABLE message. Each region has two base addresses: a guest
address and a user address.

Messages contain guest addresses and/or user addresses to reference locations
within the shared memory. The mapping of these addresses works as follows.

User addresses map to the vhost memory region containing that user address.

When the VIRTIO_F_IOMMU_PLATFORM feature has not been negotiated:

 * Guest addresses map to the vhost memory region containing that guest
   address.

When the VIRTIO_F_IOMMU_PLATFORM feature has been negotiated:

 * Guest addresses are also called I/O virtual addresses (IOVAs). They are
   translated to user addresses via the IOTLB.

 * The vhost memory region guest address is not used.

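The region lookup described above could be sketched as below for the case where
VIRTIO_F_IOMMU_PLATFORM has not been negotiated. The mapped_region struct, its
mmap_base field and the function names are hypothetical. The example also
assumes that the slave has mapped each region's file descriptor from offset 0,
so that the region's data begins mmap offset bytes into that mapping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A received region plus the slave's own mapping of its fd (hypothetical). */
struct mapped_region {
    uint64_t guest_addr;
    uint64_t size;
    uint64_t user_addr;
    uint64_t mmap_offset;
    uint8_t *mmap_base;     /* where this process mapped the region's fd */
};

/* Local pointer for byte `off` of region r. */
static uint8_t *region_ptr(const struct mapped_region *r, uint64_t off)
{
    return r->mmap_base + r->mmap_offset + off;
}

/* Resolve a guest address to a local pointer, or NULL if unmapped. */
static uint8_t *guest_to_local(const struct mapped_region *regions, size_t n,
                               uint64_t guest_addr)
{
    size_t i;

    assert(regions != NULL || n == 0);
    for (i = 0; i < n; i++) {
        const struct mapped_region *r = &regions[i];

        if (guest_addr >= r->guest_addr &&
            guest_addr - r->guest_addr < r->size) {
            return region_ptr(r, guest_addr - r->guest_addr);
        }
    }
    return NULL;
}

/* Resolve a master user address to a local pointer, or NULL if unmapped. */
static uint8_t *user_to_local(const struct mapped_region *regions, size_t n,
                              uint64_t user_addr)
{
    size_t i;

    assert(regions != NULL || n == 0);
    for (i = 0; i < n; i++) {
        const struct mapped_region *r = &regions[i];

        if (user_addr >= r->user_addr &&
            user_addr - r->user_addr < r->size) {
            return region_ptr(r, user_addr - r->user_addr);
        }
    }
    return NULL;
}
```

When VIRTIO_F_IOMMU_PLATFORM has been negotiated, an IOVA would first be
translated to a user address through the IOTLB and then resolved with
user_to_local().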
IOMMU support
-------------

When the VIRTIO_F_IOMMU_PLATFORM feature has been negotiated, the master
sends IOTLB entry updates and invalidations by sending VHOST_USER_IOTLB_MSG
requests to the slave with a struct vhost_iotlb_msg as payload. For update
events, the iotlb payload has to be filled with the update message type (2),
the I/O virtual address, the size, the user virtual address, and the
permissions flags. Addresses and size must be within vhost memory regions set
via the VHOST_USER_SET_MEM_TABLE request. For invalidation events, the iotlb
payload has to be filled with the invalidation message type (3), the I/O virtual
address and the size. On success, the slave is expected to reply with a zero
payload, non-zero otherwise.

The slave relies on the slave communication channel (see "Slave communication"
section below) to send IOTLB miss and access failure events, by sending
VHOST_USER_SLAVE_IOTLB_MSG requests to the master with a struct vhost_iotlb_msg
as payload. For miss events, the iotlb payload has to be filled with the miss
message type (1), the I/O virtual address and the permissions flags. For access
failure events, the iotlb payload has to be filled with the access failure
message type (4), the I/O virtual address and the permissions flags.
For synchronization purposes, the slave may rely on the reply-ack feature,
so the master may send a reply when the operation is completed if the reply-ack
feature is negotiated and the slave requests a reply. For miss events, completed
operation means either master sent an update message containing the IOTLB entry
containing requested address and permission, or master sent nothing if the IOTLB
miss message is invalid (invalid IOVA or permission).

The master isn't expected to take the initiative to send IOTLB update messages,
as the slave sends IOTLB miss messages for the guest virtual memory areas it
needs to access.

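To make the payload rules above concrete, the sketch below fills in IOTLB miss
and update messages. The iotlb_msg struct is a stand-in written for this
example; it follows the field order of the IOTLB message described earlier but
is not the kernel's struct vhost_iotlb_msg definition. The enum names and
helpers are likewise assumptions. Treating the permission values as a
read/write bit mask is an inference from the values 1, 2 and 3.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Permission values from the IOTLB message description. */
enum {
    IOTLB_PERM_NONE = 0,
    IOTLB_PERM_RO   = 1,
    IOTLB_PERM_WO   = 2,
    IOTLB_PERM_RW   = 3
};

/* Message types from the IOTLB message description. */
enum {
    IOTLB_MISS        = 1,
    IOTLB_UPDATE      = 2,
    IOTLB_INVALIDATE  = 3,
    IOTLB_ACCESS_FAIL = 4
};

/* Stand-in payload layout for this example. */
struct iotlb_msg {
    uint64_t iova;
    uint64_t size;
    uint64_t uaddr;
    uint8_t perm;
    uint8_t type;
};

/* Does an entry granting `granted` allow an access needing `wanted`? */
static bool iotlb_perm_allows(uint8_t granted, uint8_t wanted)
{
    return (granted & wanted) == wanted;
}

/* Slave side: a miss carries the type, the IOVA and the permissions. */
static struct iotlb_msg iotlb_make_miss(uint64_t iova, uint8_t perm)
{
    struct iotlb_msg m = { 0 };

    assert(perm <= IOTLB_PERM_RW);
    m.iova = iova;
    m.perm = perm;
    m.type = IOTLB_MISS;
    return m;
}

/* Master side: an update also carries the size and the user address. */
static struct iotlb_msg iotlb_make_update(uint64_t iova, uint64_t size,
                                          uint64_t uaddr, uint8_t perm)
{
    struct iotlb_msg m = { 0 };

    assert(perm <= IOTLB_PERM_RW);
    m.iova = iova;
    m.size = size;
    m.uaddr = uaddr;
    m.perm = perm;
    m.type = IOTLB_UPDATE;
    return m;
}
```

A slave that needs write access to an IOVA it has no entry for would send
iotlb_make_miss(iova, IOTLB_PERM_WO) and wait for the matching update.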
Slave communication
-------------------

An optional communication channel is provided if the slave declares
VHOST_USER_PROTOCOL_F_SLAVE_REQ protocol feature, to allow the slave to make
requests to the master.

The fd is provided via VHOST_USER_SET_SLAVE_REQ_FD ancillary data.

A slave may then send VHOST_USER_SLAVE_* messages to the master
using this fd communication channel.

If VHOST_USER_PROTOCOL_F_SLAVE_SEND_FD protocol feature is negotiated,
slave can send file descriptors (at most 8 descriptors in each message)
to master via ancillary data using this fd communication channel.

Protocol features
-----------------

#define VHOST_USER_PROTOCOL_F_MQ             0
#define VHOST_USER_PROTOCOL_F_LOG_SHMFD      1
#define VHOST_USER_PROTOCOL_F_RARP           2
#define VHOST_USER_PROTOCOL_F_REPLY_ACK      3
#define VHOST_USER_PROTOCOL_F_MTU            4
#define VHOST_USER_PROTOCOL_F_SLAVE_REQ      5
#define VHOST_USER_PROTOCOL_F_CROSS_ENDIAN   6
#define VHOST_USER_PROTOCOL_F_CRYPTO_SESSION 7
#define VHOST_USER_PROTOCOL_F_PAGEFAULT      8
#define VHOST_USER_PROTOCOL_F_CONFIG         9
#define VHOST_USER_PROTOCOL_F_SLAVE_SEND_FD  10
#define VHOST_USER_PROTOCOL_F_HOST_NOTIFIER  11

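The values above are bit positions in the 64-bit protocol features mask, and
protocol features may only be used when feature bit
VHOST_USER_F_PROTOCOL_FEATURES (30) is present. One way a master might combine
the two checks is sketched below. The function names are hypothetical, and
intersecting the slave's mask with the master's own supported mask is an
assumption about how a master would choose what to enable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VHOST_USER_F_PROTOCOL_FEATURES   30

#define VHOST_USER_PROTOCOL_F_MQ         0
#define VHOST_USER_PROTOCOL_F_LOG_SHMFD  1
#define VHOST_USER_PROTOCOL_F_REPLY_ACK  3

/* Did the slave advertise support for protocol feature negotiation? */
static bool has_protocol_features(uint64_t features)
{
    return ((features >> VHOST_USER_F_PROTOCOL_FEATURES) & 1) != 0;
}

/*
 * Mask to send with VHOST_USER_SET_PROTOCOL_FEATURES: the bits both sides
 * support, or 0 when the slave cannot negotiate protocol features at all.
 */
static uint64_t negotiate_protocol_features(uint64_t slave_features,
                                            uint64_t slave_protocol_features,
                                            uint64_t master_supported)
{
    if (!has_protocol_features(slave_features)) {
        return 0;
    }
    return slave_protocol_features & master_supported;
}

/* Is protocol feature `bit` part of the negotiated mask? */
static bool protocol_feature_enabled(uint64_t negotiated, unsigned int bit)
{
    assert(bit < 64);
    return ((negotiated >> bit) & 1) != 0;
}
```

Here slave_features would come from the VHOST_USER_GET_FEATURES reply and
slave_protocol_features from the VHOST_USER_GET_PROTOCOL_FEATURES reply.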
Master message types
--------------------

 * VHOST_USER_GET_FEATURES

      Id: 1
      Equivalent ioctl: VHOST_GET_FEATURES
      Master payload: N/A
      Slave payload: u64

      Get from the underlying vhost implementation the features bitmask.
      Feature bit VHOST_USER_F_PROTOCOL_FEATURES signals slave support for
      VHOST_USER_GET_PROTOCOL_FEATURES and VHOST_USER_SET_PROTOCOL_FEATURES.

 * VHOST_USER_SET_FEATURES

      Id: 2
      Equivalent ioctl: VHOST_SET_FEATURES
      Master payload: u64

      Enable features in the underlying vhost implementation using a bitmask.
      Feature bit VHOST_USER_F_PROTOCOL_FEATURES signals slave support for
      VHOST_USER_GET_PROTOCOL_FEATURES and VHOST_USER_SET_PROTOCOL_FEATURES.

 * VHOST_USER_GET_PROTOCOL_FEATURES

      Id: 15
      Equivalent ioctl: VHOST_GET_FEATURES
      Master payload: N/A
      Slave payload: u64

      Get the protocol feature bitmask from the underlying vhost implementation.
      Only legal if feature bit VHOST_USER_F_PROTOCOL_FEATURES is present in
      VHOST_USER_GET_FEATURES.
      Note: slave that reported VHOST_USER_F_PROTOCOL_FEATURES must support
      this message even before VHOST_USER_SET_FEATURES was called.

 * VHOST_USER_SET_PROTOCOL_FEATURES

      Id: 16
      Equivalent ioctl: VHOST_SET_FEATURES
      Master payload: u64

      Enable protocol features in the underlying vhost implementation.
      Only legal if feature bit VHOST_USER_F_PROTOCOL_FEATURES is present in
      VHOST_USER_GET_FEATURES.
      Note: slave that reported VHOST_USER_F_PROTOCOL_FEATURES must support
      this message even before VHOST_USER_SET_FEATURES was called.

 * VHOST_USER_SET_OWNER

      Id: 3
      Equivalent ioctl: VHOST_SET_OWNER
      Master payload: N/A

      Issued when a new connection is established. It sets the current Master
      as an owner of the session. This can be used on the Slave as a
      "session start" flag.

 * VHOST_USER_RESET_OWNER

      Id: 4
      Master payload: N/A

      This is no longer used. Used to be sent to request disabling
      all rings, but some clients interpreted it to also discard
      connection state (this interpretation would lead to bugs).
      It is recommended that clients either ignore this message,
      or use it to disable all rings.

 * VHOST_USER_SET_MEM_TABLE

      Id: 5
      Equivalent ioctl: VHOST_SET_MEM_TABLE
      Master payload: memory regions description
      Slave payload: (postcopy only) memory regions description

      Sets the memory map regions on the slave so it can translate the vring
      addresses. In the ancillary data there is an array of file descriptors
      for each memory mapped region. The size and ordering of the fds matches
      the number and ordering of memory regions.

      When VHOST_USER_POSTCOPY_LISTEN has been received, SET_MEM_TABLE replies with
      the bases of the memory mapped regions to the master. The slave must
      have mmap'd the regions but not yet accessed them and should not yet generate
      a userfault event. Note NEED_REPLY_MASK is not set in this case.
      QEMU will then reply back to the list of mappings with an empty
      VHOST_USER_SET_MEM_TABLE as an acknowledgment; only upon reception of this
      message may the guest start accessing the memory and generating faults.

 * VHOST_USER_SET_LOG_BASE

      Id: 6
      Equivalent ioctl: VHOST_SET_LOG_BASE
      Master payload: u64
      Slave payload: N/A

      Sets logging shared memory space.
      When slave has VHOST_USER_PROTOCOL_F_LOG_SHMFD protocol
      feature, the log memory fd is provided in the ancillary data of
      VHOST_USER_SET_LOG_BASE message, and the size and offset of the shared
      memory area are provided in the message.

 * VHOST_USER_SET_LOG_FD

      Id: 7
      Equivalent ioctl: VHOST_SET_LOG_FD
      Master payload: N/A

      Sets the logging file descriptor, which is passed as ancillary data.

 * VHOST_USER_SET_VRING_NUM

      Id: 8
      Equivalent ioctl: VHOST_SET_VRING_NUM
      Master payload: vring state description

      Set the size of the queue.

 * VHOST_USER_SET_VRING_ADDR

      Id: 9
      Equivalent ioctl: VHOST_SET_VRING_ADDR
      Master payload: vring address description
      Slave payload: N/A

      Sets the addresses of the different aspects of the vring.

 * VHOST_USER_SET_VRING_BASE

      Id: 10
      Equivalent ioctl: VHOST_SET_VRING_BASE
      Master payload: vring state description

      Sets the base offset in the available vring.

 * VHOST_USER_GET_VRING_BASE

      Id: 11
      Equivalent ioctl: VHOST_GET_VRING_BASE
      Master payload: vring state description
      Slave payload: vring state description

      Get the available vring base offset.

 * VHOST_USER_SET_VRING_KICK

      Id: 12
      Equivalent ioctl: VHOST_SET_VRING_KICK
      Master payload: u64

      Set the event file descriptor for adding buffers to the vring. It
      is passed in the ancillary data.
      Bits (0-7) of the payload contain the vring index. Bit 8 is the
      invalid FD flag. This flag is set when there is no file descriptor
      in the ancillary data. This signals that polling should be used
      instead of waiting for a kick.

 * VHOST_USER_SET_VRING_CALL

      Id: 13
      Equivalent ioctl: VHOST_SET_VRING_CALL
      Master payload: u64

      Set the event file descriptor to signal when buffers are used. It
      is passed in the ancillary data.
      Bits (0-7) of the payload contain the vring index. Bit 8 is the
      invalid FD flag. This flag is set when there is no file descriptor
      in the ancillary data. This signals that polling will be used
      instead of waiting for the call.

 * VHOST_USER_SET_VRING_ERR

      Id: 14
      Equivalent ioctl: VHOST_SET_VRING_ERR
      Master payload: u64

      Set the event file descriptor to signal when error occurs. It
      is passed in the ancillary data.
      Bits (0-7) of the payload contain the vring index. Bit 8 is the
      invalid FD flag. This flag is set when there is no file descriptor
      in the ancillary data.

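VHOST_USER_SET_VRING_KICK, VHOST_USER_SET_VRING_CALL and
VHOST_USER_SET_VRING_ERR share the same u64 payload encoding. A small sketch of
packing and unpacking it is shown below; the macro, struct and function names
are invented for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VRING_IDX_MASK   0xffull        /* bits 0-7: vring index */
#define VRING_NOFD_MASK  (1ull << 8)    /* bit 8: invalid FD flag */

/* Decoded form of the u64 payload. */
struct vring_fd_payload {
    unsigned int index;
    bool has_fd;        /* false when the invalid FD flag is set */
};

/* Master side: build the u64 payload. */
static uint64_t vring_fd_encode(unsigned int index, bool has_fd)
{
    assert(index <= VRING_IDX_MASK);
    return (uint64_t)index | (has_fd ? 0 : VRING_NOFD_MASK);
}

/* Slave side: split the u64 payload into index and flag. */
static struct vring_fd_payload vring_fd_decode(uint64_t payload)
{
    struct vring_fd_payload p;

    p.index = (unsigned int)(payload & VRING_IDX_MASK);
    p.has_fd = (payload & VRING_NOFD_MASK) == 0;
    return p;
}
```

When has_fd is false, no descriptor should be expected in the ancillary data;
for kick and call messages this is the case where polling is used.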
 * VHOST_USER_GET_QUEUE_NUM

      Id: 17
      Equivalent ioctl: N/A
      Master payload: N/A
      Slave payload: u64

      Query how many queues the backend supports. This request should be
      sent only when VHOST_USER_PROTOCOL_F_MQ is set in queried protocol
      features by VHOST_USER_GET_PROTOCOL_FEATURES.

 * VHOST_USER_SET_VRING_ENABLE

      Id: 18
      Equivalent ioctl: N/A
      Master payload: vring state description

      Signal slave to enable or disable corresponding vring.
      This request should be sent only when VHOST_USER_F_PROTOCOL_FEATURES
      has been negotiated.

 * VHOST_USER_SEND_RARP

      Id: 19
      Equivalent ioctl: N/A
      Master payload: u64

      Ask vhost user backend to broadcast a fake RARP to notify that the
      migration has terminated, for guests that do not support GUEST_ANNOUNCE.
      Only legal if feature bit VHOST_USER_F_PROTOCOL_FEATURES is present in
      VHOST_USER_GET_FEATURES and protocol feature bit VHOST_USER_PROTOCOL_F_RARP
      is present in VHOST_USER_GET_PROTOCOL_FEATURES.
      The first 6 bytes of the payload contain the mac address of the guest to
      allow the vhost user backend to construct and broadcast the fake RARP.

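For VHOST_USER_SEND_RARP, the MAC address occupies the first 6 bytes of the u64
payload. The sketch below reads "first 6 bytes" as the first 6 bytes in memory
of the payload field, which is an interpretation made for this example; the
function names are hypothetical as well.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAC_LEN 6

/* Master side: place the guest MAC in the first 6 bytes of the payload. */
static uint64_t mac_to_rarp_payload(const uint8_t mac[MAC_LEN])
{
    uint64_t payload = 0;   /* the remaining 2 bytes stay zero */

    assert(mac != NULL);
    memcpy(&payload, mac, MAC_LEN);
    return payload;
}

/* Slave side: copy the MAC out of the first 6 bytes of the payload. */
static void rarp_payload_to_mac(uint64_t payload, uint8_t mac[MAC_LEN])
{
    assert(mac != NULL);
    memcpy(mac, &payload, MAC_LEN);
}
```

Because the bytes are copied rather than shifted, the same code works on both
sides regardless of the host byte order.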
 * VHOST_USER_NET_SET_MTU

      Id: 20
      Equivalent ioctl: N/A
      Master payload: u64

      Set host MTU value exposed to the guest.
      This request should be sent only when VIRTIO_NET_F_MTU feature has been
      successfully negotiated, VHOST_USER_F_PROTOCOL_FEATURES is present in
      VHOST_USER_GET_FEATURES and protocol feature bit
      VHOST_USER_PROTOCOL_F_NET_MTU is present in
      VHOST_USER_GET_PROTOCOL_FEATURES.
      If VHOST_USER_PROTOCOL_F_REPLY_ACK is negotiated, slave must respond
      with zero in case the specified MTU is valid, or non-zero otherwise.

 * VHOST_USER_SET_SLAVE_REQ_FD

      Id: 21
      Equivalent ioctl: N/A
      Master payload: N/A

      Set the socket file descriptor for slave initiated requests. It is passed
      in the ancillary data.
      This request should be sent only when VHOST_USER_F_PROTOCOL_FEATURES
      has been negotiated, and protocol feature bit
      VHOST_USER_PROTOCOL_F_SLAVE_REQ is present in
      VHOST_USER_GET_PROTOCOL_FEATURES.
      If VHOST_USER_PROTOCOL_F_REPLY_ACK is negotiated, slave must respond
      with zero for success, non-zero otherwise.

 * VHOST_USER_IOTLB_MSG

      Id: 22
      Equivalent ioctl: N/A (equivalent to VHOST_IOTLB_MSG message type)
      Master payload: struct vhost_iotlb_msg
      Slave payload: u64

      Send IOTLB messages with struct vhost_iotlb_msg as payload.
      Master sends such requests to update and invalidate entries in the device
      IOTLB. The slave has to acknowledge the request by sending zero as u64
      payload for success, non-zero otherwise.
      This request should be sent only when VIRTIO_F_IOMMU_PLATFORM feature
      has been successfully negotiated.

5df04f17 FF |
663 | * VHOST_USER_SET_VRING_ENDIAN |
664 | ||
665 | Id: 23 | |
666 | Equivalent ioctl: VHOST_SET_VRING_ENDIAN | |
667 | Master payload: vring state description | |
668 | ||
669 | Set the endianess of a VQ for legacy devices. Little-endian is indicated | |
670 | with state.num set to 0 and big-endian is indicated with state.num set | |
671 | to 1. Other values are invalid. | |
672 | This request should be sent only when VHOST_USER_PROTOCOL_F_CROSS_ENDIAN | |
673 | has been negotiated. | |
674 | Backends that negotiated this feature should handle both endianesses | |
675 | and expect this message once (per VQ) during device configuration | |
676 | (ie. before the master starts the VQ). | |
677 | ||
4c3e257b CL |
678 | * VHOST_USER_GET_CONFIG |
679 | ||
680 | Id: 24 | |
681 | Equivalent ioctl: N/A | |
682 | Master payload: virtio device config space | |
683 | Slave payload: virtio device config space | |
684 | ||
1c3e5a26 MC |
685 | When VHOST_USER_PROTOCOL_F_CONFIG is negotiated, this message is |
686 | submitted by the vhost-user master to fetch the contents of the virtio | |
4c3e257b CL |
687 | device configuration space. The vhost-user slave's payload size MUST match |
688 | the master's request; the slave uses a zero-length payload to | |
689 | indicate an error to the vhost-user master. The vhost-user master may | |
690 | cache the contents to avoid repeated VHOST_USER_GET_CONFIG calls. | |
691 | ||
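The size rule for VHOST_USER_GET_CONFIG replies can be sketched as below: the slave either returns exactly the number of bytes the master asked for, or a zero-length payload to signal an error. The function name `get_config_reply`, its parameters and the choice to treat a zero-sized request as an error are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copies `size` bytes of config space starting at `offset` into `out` and
 * returns the reply payload length. A return value of 0 means "error":
 * the slave replies with a zero-length payload. */
size_t get_config_reply(const uint8_t *cfg, size_t cfg_len,
                        uint32_t offset, uint32_t size,
                        uint8_t *out, size_t out_len)
{
    if (cfg == NULL || out == NULL || size == 0) {
        return 0;
    }
    /* Written this way so the range check cannot overflow. */
    if (offset > cfg_len || size > cfg_len - offset || size > out_len) {
        return 0;
    }
    memcpy(out, cfg + offset, size);
    return size; /* must match the size the master requested */
}
```

A master that caches the result only needs to repeat the request after the slave signals a configuration change.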
692 | * VHOST_USER_SET_CONFIG | |
693 | ||
694 | Id: 25 | |
695 | Equivalent ioctl: N/A | |
696 | Master payload: virtio device config space | |
697 | Slave payload: N/A | |
698 | ||
1c3e5a26 MC |
699 | When VHOST_USER_PROTOCOL_F_CONFIG is negotiated, this message is |
700 | submitted by the vhost-user master when the Guest changes the virtio | |
4c3e257b CL |
701 | device configuration space; it can also be used for live migration |
702 | on the destination host. The vhost-user slave must check the flags | |
703 | field, and slaves MUST NOT accept SET_CONFIG for read-only | |
704 | configuration space fields unless the live migration bit is set. | |
705 | ||
efbfeb81 GA |
706 | * VHOST_USER_CREATE_CRYPTO_SESSION |
707 | ||
708 | Id: 26 | |
709 | Equivalent ioctl: N/A | |
710 | Master payload: crypto session description | |
711 | Slave payload: crypto session description | |
712 | ||
713 | Create a session for crypto operation. The server side must return the | |
714 | session id, 0 or positive for success, negative for failure. | |
715 | This request should be sent only when VHOST_USER_PROTOCOL_F_CRYPTO_SESSION | |
716 | feature has been successfully negotiated. | |
717 | It's a required feature for crypto devices. | |
718 | ||
719 | * VHOST_USER_CLOSE_CRYPTO_SESSION | |
720 | ||
721 | Id: 27 | |
722 | Equivalent ioctl: N/A | |
723 | Master payload: u64 | |
724 | ||
725 | Close a session for crypto operation which was previously | |
726 | created by VHOST_USER_CREATE_CRYPTO_SESSION. | |
727 | This request should be sent only when VHOST_USER_PROTOCOL_F_CRYPTO_SESSION | |
728 | feature has been successfully negotiated. | |
729 | It's a required feature for crypto devices. | |
730 | ||
d3dff7a5 DDAG |
731 | * VHOST_USER_POSTCOPY_ADVISE |
732 | Id: 28 | |
733 | Master payload: N/A | |
734 | Slave payload: userfault fd | |
735 | ||
736 | When VHOST_USER_PROTOCOL_F_PAGEFAULT is supported, the | |
737 | master advises the slave that a migration with postcopy enabled is underway; | |
738 | the slave must open a userfaultfd for later use. | |
739 | Note that at this stage the migration is still in precopy mode. | |
740 | ||
6864a7b5 DDAG |
741 | * VHOST_USER_POSTCOPY_LISTEN |
742 | Id: 29 | |
743 | Master payload: N/A | |
744 | ||
745 | Master advises slave that a transition to postcopy mode has happened. | |
746 | The slave must ensure that shared memory is registered with userfaultfd | |
747 | to cause faulting of non-present pages. | |
748 | ||
749 | This is always sent sometime after a VHOST_USER_POSTCOPY_ADVISE, and | |
750 | thus only when VHOST_USER_PROTOCOL_F_PAGEFAULT is supported. | |
751 | ||
c639187e DDAG |
752 | * VHOST_USER_POSTCOPY_END |
753 | Id: 30 | |
754 | Slave payload: u64 | |
755 | ||
756 | Master advises that postcopy migration has now completed. The | |
757 | slave must disable the userfaultfd. The response is an acknowledgement | |
758 | only. | |
759 | When VHOST_USER_PROTOCOL_F_PAGEFAULT is supported, this message | |
760 | is sent at the end of the migration, after VHOST_USER_POSTCOPY_LISTEN | |
761 | was previously sent. | |
762 | The value returned is an error indication; 0 is success. | |
763 | ||
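The ordering of the three postcopy messages above (ADVISE, then LISTEN, then END) can be tracked with a small state machine. The following is a sketch only: the request ids come from the text, while `postcopy_state`, `postcopy_step` and the decision to reject out-of-order requests with -1 are assumptions for this example.

```c
#include <stdbool.h>
#include <stddef.h>

/* Request ids as listed above. */
enum {
    POSTCOPY_ADVISE = 28,
    POSTCOPY_LISTEN = 29,
    POSTCOPY_END = 30
};

enum postcopy_state { PC_IDLE, PC_ADVISED, PC_LISTENING };

/* Advances the state for one incoming request. Returns 0 if the request is
 * valid in the current state, -1 otherwise (state left unchanged). */
int postcopy_step(enum postcopy_state *st, int request, bool pagefault_negotiated)
{
    if (st == NULL || !pagefault_negotiated) {
        return -1;
    }
    switch (request) {
    case POSTCOPY_ADVISE:
        if (*st != PC_IDLE) {
            return -1;
        }
        *st = PC_ADVISED; /* slave opens a userfaultfd here */
        return 0;
    case POSTCOPY_LISTEN:
        if (*st != PC_ADVISED) {
            return -1;
        }
        *st = PC_LISTENING; /* shared memory registered with userfaultfd */
        return 0;
    case POSTCOPY_END:
        if (*st != PC_LISTENING) {
            return -1;
        }
        *st = PC_IDLE; /* userfaultfd disabled, reply with u64 0 */
        return 0;
    default:
        return -1;
    }
}
```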
6dcdd06e MC |
764 | Slave message types |
765 | ------------------- | |
766 | ||
767 | * VHOST_USER_SLAVE_IOTLB_MSG | |
768 | ||
769 | Id: 1 | |
770 | Equivalent ioctl: N/A (equivalent to VHOST_IOTLB_MSG message type) | |
771 | Slave payload: struct vhost_iotlb_msg | |
772 | Master payload: N/A | |
773 | ||
774 | Send IOTLB messages with struct vhost_iotlb_msg as payload. | |
775 | Slave sends such requests to notify of an IOTLB miss, or an IOTLB | |
776 | access failure. If VHOST_USER_PROTOCOL_F_REPLY_ACK is negotiated, | |
777 | and the slave sets the VHOST_USER_NEED_REPLY flag, the master must respond with | |
778 | zero when the operation is successfully completed, or non-zero otherwise. | |
779 | This request should be sent only when the VIRTIO_F_IOMMU_PLATFORM feature | |
780 | has been successfully negotiated. | |
781 | ||
4c3e257b CL |
782 | * VHOST_USER_SLAVE_CONFIG_CHANGE_MSG |
783 | ||
784 | Id: 2 | |
785 | Equivalent ioctl: N/A | |
786 | Slave payload: N/A | |
787 | Master payload: N/A | |
788 | ||
1c3e5a26 MC |
789 | When VHOST_USER_PROTOCOL_F_CONFIG is negotiated, vhost-user slave sends |
790 | such messages to notify that the virtio device's configuration space has | |
791 | changed. For host devices that support this feature, the host | |
792 | driver can send a VHOST_USER_GET_CONFIG message to the slave to get the latest | |
793 | content. If VHOST_USER_PROTOCOL_F_REPLY_ACK is negotiated, and the slave sets | |
794 | the VHOST_USER_NEED_REPLY flag, the master must respond with zero when the | |
795 | operation is successfully completed, or non-zero otherwise. | |
4c3e257b | 796 | |
44866521 TB |
797 | * VHOST_USER_SLAVE_VRING_HOST_NOTIFIER_MSG |
798 | ||
799 | Id: 3 | |
800 | Equivalent ioctl: N/A | |
801 | Slave payload: vring area description | |
802 | Master payload: N/A | |
803 | ||
804 | Sets host notifier for a specified queue. The queue index is contained | |
805 | in the u64 field of the vring area description. The host notifier is | |
806 | described by the file descriptor (typically it's a VFIO device fd) which | |
807 | is passed as ancillary data and the size (which is mmap size and should | |
808 | be the same as host page size) and offset (which is mmap offset) carried | |
809 | in the vring area description. QEMU can mmap the file descriptor based | |
810 | on the size and offset to get a memory range. Registering a host notifier | |
811 | means mapping this memory range to the VM as the specified queue's notify | |
812 | MMIO region. Slave sends this request to tell QEMU to de-register the | |
813 | existing notifier if any and register the new notifier if the request is | |
814 | sent with a file descriptor. | |
815 | This request should be sent only when VHOST_USER_PROTOCOL_F_HOST_NOTIFIER | |
816 | protocol feature has been successfully negotiated. | |
817 | ||
ca525ce5 PS |
818 | VHOST_USER_PROTOCOL_F_REPLY_ACK: |
819 | ------------------------------- | |
820 | The original vhost-user specification only demands replies for certain | |
821 | commands. This differs from the vhost protocol implementation where commands | |
822 | are sent over an ioctl() call and block until the client has completed. | |
823 | ||
824 | With this protocol extension negotiated, the sender (QEMU) can set the | |
825 | "need_reply" [Bit 3] flag on any command. This indicates that | |
826 | the client MUST respond with a VhostUserMsg whose payload indicates success or | |
827 | failure. The payload should be set to zero on success or non-zero on failure, | |
828 | unless the message already has an explicit reply body. | |
829 | ||
830 | The response payload gives QEMU a deterministic indication of the result | |
831 | of the command. Today, QEMU is expected to terminate the main vhost-user | |
832 | loop upon receiving such errors. In the future, QEMU could be taught to be more | |
833 | resilient for selective requests. | |
834 | ||
835 | For the message types that already solicit a reply from the client, the | |
836 | presence of VHOST_USER_PROTOCOL_F_REPLY_ACK or need_reply bit being set brings | |
837 | no behavioural change. (See the 'Communication' section for details.) |
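To make the flag handling concrete, here is a minimal sketch of how a receiver might decide whether a zero/non-zero acknowledgement is owed, and how it could build the flags word of that reply. The bit positions follow the header description (version in the lower 2 bits, reply flag in bit 2, need_reply in bit 3); the macro and function names are illustrative assumptions rather than a definitive API.

```c
#include <stdbool.h>
#include <stdint.h>

#define VU_VERSION          0x1u
#define VU_VERSION_MASK     0x3u        /* lower 2 bits */
#define VU_REPLY_MASK       (1u << 2)   /* set on every reply */
#define VU_NEED_REPLY_MASK  (1u << 3)   /* sender requests an acknowledgement */

/* True when the receiver must answer with a u64 success/failure payload.
 * Messages that already define their own reply body are answered with that
 * body instead, so no extra acknowledgement is generated for them. */
bool must_send_ack(uint32_t flags, bool reply_ack_negotiated,
                   bool has_explicit_reply)
{
    if (!reply_ack_negotiated || has_explicit_reply) {
        return false;
    }
    return (flags & VU_NEED_REPLY_MASK) != 0;
}

/* Flags word for a reply: current version plus the reply bit. */
uint32_t make_reply_flags(void)
{
    return (VU_VERSION & VU_VERSION_MASK) | VU_REPLY_MASK;
}

/* Acknowledgement payload: zero on success, non-zero on failure. */
uint64_t make_ack_payload(bool success)
{
    return success ? 0 : 1;
}
```

Keeping the decision in one helper makes it harder to acknowledge a request twice when a message type already solicits a reply.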