= Device Specification for Inter-VM shared memory device =

The Inter-VM shared memory device (ivshmem) is designed to share a
memory region between multiple QEMU processes running different guests
and the host. In order for all guests to be able to pick up the
shared memory area, it is modeled by QEMU as a PCI device exposing
said memory to the guest as a PCI BAR.

The device can use a shared memory object on the host directly, or it
can obtain one from an ivshmem server.

In the latter case, the device can additionally interrupt its peers, and
get interrupted by its peers.


== Configuring the ivshmem PCI device ==

There are two basic configurations:

- Just shared memory:

      -device ivshmem-plain,memdev=HMB,...

  This uses host memory backend HMB. It should have option "share"
  set.

- Shared memory plus interrupts:

      -device ivshmem-doorbell,chardev=CHR,vectors=N,...

  An ivshmem server must already be running on the host. The device
  connects to the server's UNIX domain socket via character device
  CHR.

  Each peer gets assigned a unique ID by the server. IDs must be
  between 0 and 65535.

  Interrupts are message-signaled (MSI-X). vectors=N configures the
  number of vectors to use.

For more details on ivshmem device properties, see The QEMU Emulator
User Documentation (qemu-doc.*).


== The ivshmem PCI device's guest interface ==

The device has vendor ID 1af4, device ID 1110, revision 1. Before
QEMU 2.6.0, it had revision 0.
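
As an illustration of the IDs above, the following sketch checks the
first bytes of a PCI configuration header for an ivshmem device and
returns its revision. It relies on the standard PCI header layout
(vendor ID at offset 0, device ID at offset 2, revision ID at offset
8). The function name identify_ivshmem and the way the header bytes
are obtained are assumptions made for this example only.

```python
import struct

IVSHMEM_VENDOR_ID = 0x1AF4
IVSHMEM_DEVICE_ID = 0x1110


def identify_ivshmem(config_header):
    """Return the PCI revision if the header belongs to an ivshmem
    device, or None if the vendor/device IDs do not match.

    config_header: at least the first 9 bytes of PCI config space.
    (Hypothetical helper, not part of QEMU.)
    """
    if len(config_header) < 9:
        raise ValueError("need at least 9 bytes of PCI config space")
    # Vendor and device ID are little-endian 16-bit fields at 0 and 2.
    vendor, device = struct.unpack_from("<HH", config_header, 0)
    if (vendor, device) != (IVSHMEM_VENDOR_ID, IVSHMEM_DEVICE_ID):
        return None
    # The revision ID is a single byte at offset 8.
    return config_header[8]
```

A returned revision of 0 would indicate a device from before QEMU
2.6.0, as described above.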

=== PCI BARs ===

The ivshmem PCI device has two or three BARs:

- BAR0 holds device registers (256 Byte MMIO)
- BAR1 holds MSI-X table and PBA (only ivshmem-doorbell)
- BAR2 maps the shared memory object

There are two ways to use this device:

- If you only need the shared memory part, BAR2 suffices. This way,
  you have access to the shared memory in the guest and can use it as
  you see fit. Memnic, for example, uses ivshmem this way from guest
  user space (see http://dpdk.org/browse/memnic).

- If you additionally need the capability for peers to interrupt each
  other, you need BAR0 and BAR1. You will most likely want to write a
  kernel driver to handle interrupts. This requires the device to be
  configured for interrupts.

Before QEMU 2.6.0, BAR2 can initially be invalid if the device is
configured for interrupts. It becomes safely accessible only after
the ivshmem server has provided the shared memory. These devices have
PCI revision 0 rather than 1. Guest software should wait for the
IVPosition register (described below) to become non-negative before
accessing BAR2.
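
The wait condition for revision 0 devices can be sketched as follows.
The raw 32-bit IVPosition value is interpreted as a signed number, and
BAR2 is treated as usable once it is non-negative. The helper names
are hypothetical, and treating BAR2 as immediately usable on revision
1 is an assumption read from the paragraph above, not a statement
about QEMU's implementation.

```python
def ivposition_signed(raw):
    """Interpret a raw 32-bit IVPosition register value as signed."""
    raw &= 0xFFFFFFFF
    return raw - (1 << 32) if raw & 0x80000000 else raw


def bar2_ready(raw_ivposition, revision):
    """Return True if guest software may access BAR2.

    Assumption: only revision 0 devices need the IVPosition check.
    """
    if revision >= 1:
        return True
    return ivposition_signed(raw_ivposition) >= 0
```

A guest driver for a revision 0 device would poll the register and
call something like bar2_ready() before mapping the shared memory.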

Revision 0 of the device cannot tell guest software whether it is
configured for interrupts.

=== PCI device registers ===

BAR0 contains the following registers:

    Offset  Size  Access      On reset  Function
        0     4   read/write        0   Interrupt Mask
                                        bit 0: peer interrupt (rev 0)
                                               reserved (rev 1)
                                        bit 1..31: reserved
        4     4   read/write        0   Interrupt Status
                                        bit 0: peer interrupt (rev 0)
                                               reserved (rev 1)
                                        bit 1..31: reserved
        8     4   read-only   0 or ID   IVPosition
       12     4   write-only      N/A   Doorbell
                                        bit 0..15: vector
                                        bit 16..31: peer ID
       16   240   none            N/A   reserved

Software should only access the registers as specified in column
"Access". Reserved bits should be ignored on read, and preserved on
write.

In revision 0 of the device, the Interrupt Status and Interrupt Mask
registers together control the legacy INTx interrupt: INTx is asserted
when the bit-wise AND of Status and Mask is non-zero and the device
has no MSI-X capability. Interrupt Status Register bit 0 becomes 1
when an interrupt request from a peer is received. Reading the
register clears it.
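
The revision 0 INTx rule can be modeled in a few lines. This is only a
toy model of the behavior described above, written to make the
Status/Mask interaction concrete; the class Rev0InterruptRegs and its
method names are invented for this sketch and do not correspond to
QEMU code.

```python
class Rev0InterruptRegs:
    """Toy model of the revision 0 Interrupt Mask/Status registers."""

    def __init__(self, has_msix=False):
        self.mask = 0      # Interrupt Mask, 0 on reset
        self.status = 0    # Interrupt Status, 0 on reset
        self.has_msix = has_msix

    def peer_interrupt(self):
        # A peer's interrupt request sets Status bit 0.
        self.status |= 1

    def intx_asserted(self):
        # INTx is asserted when (Status AND Mask) is non-zero and the
        # device has no MSI-X capability.
        return (not self.has_msix) and (self.status & self.mask) != 0

    def read_status(self):
        # Reading the Status register clears it.
        value = self.status
        self.status = 0
        return value
```

With mask set to 1, a call to peer_interrupt() makes intx_asserted()
true until read_status() is called.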

IVPosition Register: if the device is not configured for interrupts,
this is zero. Otherwise, it is the device's ID (between 0 and 65535).

Before QEMU 2.6.0, the register may read -1 for a short while after
reset. These devices have PCI revision 0 rather than 1.

There is no good way for software to find out whether the device is
configured for interrupts. A positive IVPosition means interrupts,
but zero could be either.

Doorbell Register: writing to this register requests an interrupt for
a peer. The written value's high 16 bits are the ID of the peer to
interrupt, and its low 16 bits select an interrupt vector.
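
A minimal sketch of how a Doorbell value is composed from a peer ID
and a vector, following the bit layout given above. The function names
are hypothetical; how the value is actually written to BAR0 offset 12
depends on the guest environment and is not shown.

```python
def doorbell_value(peer_id, vector):
    """Compose the 32-bit value to write to the Doorbell register."""
    if not 0 <= peer_id <= 0xFFFF:
        raise ValueError("peer ID must fit in 16 bits")
    if not 0 <= vector <= 0xFFFF:
        raise ValueError("vector must fit in 16 bits")
    # Bits 16..31 carry the peer ID, bits 0..15 the vector.
    return (peer_id << 16) | vector


def split_doorbell(value):
    """Split a Doorbell value back into (peer_id, vector)."""
    return (value >> 16) & 0xFFFF, value & 0xFFFF
```

For example, interrupting peer 3 on vector 1 corresponds to the value
0x00030001.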

If the device is not configured for interrupts, the write is ignored.

If interrupt setup hasn't completed, the write is ignored. The device
cannot tell guest software whether setup is complete. Interrupts can
regress to this state on migration.

If the peer with the requested ID isn't connected, or it has fewer
interrupt vectors connected, the write is ignored. The device cannot
tell guest software what peers are connected, or how many interrupt
vectors are connected.

Otherwise, the peer's interrupt for this vector becomes pending. There
is no way for software to clear the pending bit, and a polling mode of
operation is therefore impossible.

If the peer is a revision 0 device without MSI-X capability, its
Interrupt Status register is set to 1. This asserts INTx unless
masked by the Interrupt Mask register. The device cannot communicate
the interrupt vector to guest software in this case.

With multiple MSI-X vectors, different vectors can be used to indicate
that different events have occurred. The semantics of interrupt
vectors are left to the application.


== Interrupt infrastructure ==

When configured for interrupts, the peers share eventfd objects in
addition to shared memory. The shared resources are managed by an
ivshmem server.

=== The ivshmem server ===

The server listens on a UNIX domain socket.

For each new client that connects to the server, the server
- picks an ID,
- creates eventfd file descriptors for the interrupt vectors,
- sends the ID and the file descriptor for the shared memory to the
  new client,
- sends connect notifications for the new client to the other clients
  (these contain file descriptors for sending interrupts),
- sends connect notifications for the other clients to the new client,
  and
- sends interrupt setup messages to the new client (these contain file
  descriptors for receiving interrupts).

The first client to connect to the server receives ID zero.

When a client disconnects from the server, the server sends disconnect
notifications to the other clients.

The next section describes the protocol in detail.

If the server terminates without sending disconnect notifications for
its connected clients, the clients can elect to continue. They can
communicate with each other normally, but won't receive disconnect
notifications when a peer disconnects, and no new clients can connect.
There is no way for the clients to connect to a restarted server. The
device cannot tell guest software whether the server is still up.

Example server code is in contrib/ivshmem-server/. It is not to be
used in production. It assumes all clients use the same number of
interrupt vectors.

A standalone client is in contrib/ivshmem-client/. It can be useful
for debugging.

=== The ivshmem Client-Server Protocol ===

An ivshmem device configured for interrupts connects to an ivshmem
server. This section details the protocol between the two.

The connection is one-way: the server sends messages to the client.
Each message consists of a single 8 byte little-endian signed number,
and may be accompanied by a file descriptor via SCM_RIGHTS. Both
client and server close the connection on error.
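
Receiving one such message could look roughly like the sketch below.
It reads 8 bytes from a connected UNIX domain socket, decodes them as
a little-endian signed 64-bit number, and picks up at most one file
descriptor from the SCM_RIGHTS ancillary data. The function name
recv_message is hypothetical, and the sketch simply treats a short
read as an error rather than retrying, which a robust client would
have to handle.

```python
import array
import socket
import struct


def recv_message(sock):
    """Receive one server message as (value, fd), fd may be None.

    Sketch only: assumes the 8 message bytes arrive in one read.
    """
    fds = array.array("i")
    data, ancdata, _flags, _addr = sock.recvmsg(
        8, socket.CMSG_LEN(fds.itemsize))
    if len(data) != 8:
        raise ConnectionError("short read or connection closed")
    for level, ctype, cdata in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS:
            # Ignore any trailing partial integer in the control data.
            usable = len(cdata) - (len(cdata) % fds.itemsize)
            fds.frombytes(cdata[:usable])
    # "<q" is a little-endian signed 64-bit integer.
    (value,) = struct.unpack("<q", data)
    return value, (fds[0] if len(fds) else None)
```

The caller owns any returned file descriptor and is responsible for
closing it when it is no longer needed.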

Note: QEMU currently doesn't close the connection immediately on
error, but only when the character device is destroyed.

On connect, the server sends the following messages in order:

1. The protocol version number, currently zero. The client should
   close the connection on receipt of versions it can't handle.

2. The client's ID. This is unique among all clients of this server.
   IDs must be between 0 and 65535, because the Doorbell register
   provides only 16 bits for them.

3. The number -1, accompanied by the file descriptor for the shared
   memory.

4. Connect notifications for existing other clients, if any. This is
   a peer ID (number between 0 and 65535 other than the client's ID),
   repeated N times. Each repetition is accompanied by one file
   descriptor. These are for interrupting the peer with that ID using
   vector 0,..,N-1, in order. If the client is configured for fewer
   vectors, it closes the extra file descriptors. If it is configured
   for more, the extra vectors remain unconnected.

5. Interrupt setup. This is the client's own ID, repeated N times.
   Each repetition is accompanied by one file descriptor. These are
   for receiving interrupts from peers using vector 0,..,N-1, in
   order. If the client is configured for fewer vectors, it closes
   the extra file descriptors. If it is configured for more, the
   extra vectors remain unconnected.

From then on, the server sends these kinds of messages:

6. Connection / disconnection notification. This is a peer ID.

  - If the number comes with a file descriptor, it's a connection
    notification, exactly like in step 4.

  - Else, it's a disconnection notification for the peer with that ID.
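
The message sequence above can be sketched as a small classifier. It
takes already-decoded messages as (value, has_fd) pairs and labels
each one, assigning vector numbers in arrival order. The function name
interpret_messages and the event labels are invented for this sketch.
It does not enforce that the shared memory message arrives third, and
it ignores the client's own configured vector count, so it is a
reading aid rather than a full client.

```python
def interpret_messages(messages):
    """Label a server message stream.

    messages: iterable of (value, has_fd) pairs, starting with the
    version message. Returns a list of (kind, peer_id, vector).
    """
    msgs = list(messages)
    if len(msgs) < 2:
        raise ValueError("need at least the version and ID messages")
    version, _ = msgs[0]
    if version != 0:
        raise ValueError("unsupported protocol version")
    own_id, _ = msgs[1]
    if not 0 <= own_id <= 65535:
        raise ValueError("client ID out of range")
    events = [("id", own_id, None)]
    next_vector = {}  # per-ID count of file descriptors seen so far
    for value, has_fd in msgs[2:]:
        if value == -1 and has_fd:
            events.append(("shared-memory", None, None))
        elif 0 <= value <= 65535 and has_fd:
            # Repetitions of one ID map to vectors 0, 1, 2, ...
            vector = next_vector.get(value, 0)
            next_vector[value] = vector + 1
            kind = "interrupt-setup" if value == own_id else "peer-connect"
            events.append((kind, value, vector))
        elif 0 <= value <= 65535:
            # An ID without a file descriptor means the peer is gone.
            next_vector.pop(value, None)
            events.append(("peer-disconnect", value, None))
        else:
            raise ValueError("unexpected message")
    return events
```

Feeding it the pairs produced by a receive loop shows at a glance
which peers are connected and how many vectors each one offers.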

Known bugs:

* The protocol changed incompatibly in QEMU 2.5. Before, messages
  were native endian long, and there was no version number.

* The protocol is poorly designed.

=== The ivshmem Client-Client Protocol ===

An ivshmem device configured for interrupts receives eventfd file
descriptors for interrupting peers and getting interrupted by peers
from the server, as explained in the previous section.

To interrupt a peer, the device writes the 8-byte integer 1 in native
byte order to the respective file descriptor.

To receive an interrupt, the device reads and discards as many 8-byte
integers as it can.
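
Both operations can be sketched on plain file descriptors. The helper
names notify_peer and drain_interrupts are hypothetical, and the drain
loop assumes the descriptor has been put into non-blocking mode, which
the text above does not specify.

```python
import os
import struct


def notify_peer(fd):
    """Interrupt a peer: write the 8-byte integer 1, native order."""
    os.write(fd, struct.pack("=Q", 1))


def drain_interrupts(fd):
    """Read and discard as many 8-byte integers as possible.

    Assumes fd is non-blocking. Returns how many were discarded.
    """
    count = 0
    while True:
        try:
            data = os.read(fd, 8)
        except BlockingIOError:
            return count  # nothing left to read
        if len(data) < 8:
            return count  # end of file or truncated value
        count += 1
```

On a real eventfd, a single read returns the accumulated counter and
resets it, so the loop normally discards one integer per wakeup.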