## @file
#
# Technical notes for the virtio-net driver.
#
# Copyright (C) 2013, Red Hat, Inc.
#
# This program and the accompanying materials are licensed and made available
# under the terms and conditions of the BSD License which accompanies this
# distribution. The full text of the license may be found at
# http://opensource.org/licenses/bsd-license.php
#
# THE PROGRAM IS DISTRIBUTED UNDER THE BSD LICENSE ON AN "AS IS" BASIS, WITHOUT
# WARRANTIES OR REPRESENTATIONS OF ANY KIND, EITHER EXPRESS OR IMPLIED.
#
##

Disclaimer
----------

All statements concerning standards and specifications are informative and not
normative. They are made in good faith. Corrections are most welcome on the
edk2-devel mailing list.

The following documents have been perused while writing the driver and this
document:
- Unified Extensible Firmware Interface Specification, Version 2.3.1, Errata C;
  June 27, 2012;
- Driver Writer's Guide for UEFI 2.3.1, 03/08/2012, Version 1.01;
- Virtio PCI Card Specification, v0.9.5 DRAFT, 2012 May 7.


Summary
-------

The VirtioNetDxe UEFI_DRIVER implements the Simple Network Protocol for
virtio-net devices. Higher level protocols are automatically installed on top
of it by the DXE Core / the ConnectController() boot service. For virtio-net
devices this enables, for example, DHCP configuration, TCP transfers with edk2
StdLib applications, and PXE booting in OVMF.


UEFI driver structure
---------------------

A driver instance, belonging to a given virtio-net device, can be in one of
four states at any time. The states stack up as shown below. Each state
transition is labeled with the primary function (and its important callees,
faithfully indented) that implements the transition.

                             |    ^
                             |    |
[DriverBinding.c]            |    | [DriverBinding.c]
VirtioNetDriverBindingStart  |    | VirtioNetDriverBindingStop
  VirtioNetSnpPopulate       |    |   VirtioNetSnpEvacuate
    VirtioNetGetFeatures     |    |
                             v    |
                  +-------------------------+
                  | EfiSimpleNetworkStopped |
                  +-------------------------+
                             |    ^
[SnpStart.c]                 |    | [SnpStop.c]
VirtioNetStart               |    | VirtioNetStop
                             |    |
                             v    |
                  +-------------------------+
                  | EfiSimpleNetworkStarted |
                  +-------------------------+
                             |    ^
[SnpInitialize.c]            |    | [SnpShutdown.c]
VirtioNetInitialize          |    | VirtioNetShutdown
  VirtioNetInitRing {Rx, Tx} |    |   VirtioNetShutdownRx [SnpSharedHelpers.c]
    VirtioRingInit           |    |   VirtioNetShutdownTx [SnpSharedHelpers.c]
    VirtioRingMap            |    |   VirtioNetUninitRing [SnpSharedHelpers.c]
  VirtioNetInitTx            |    |     {Tx, Rx}
  VirtioNetInitRx            |    |       VirtIo->UnmapSharedBuffer
                             |    |       VirtioRingUninit
                             v    |
                +-----------------------------+
                | EfiSimpleNetworkInitialized |
                +-----------------------------+

The state at the top means "nonexistent" and is hence unnamed on the diagram --
a driver instance actually doesn't exist at that point. The transition
functions out of and into that state implement the Driver Binding Protocol.

The lower three states characterize an existent driver instance and are all
states defined by the Simple Network Protocol. The transition functions between
them are member functions of the Simple Network Protocol.

Each transition function validates its expected source state and its
parameters. For example, VirtioNetDriverBindingStop will refuse to disconnect
from the controller unless it's in EfiSimpleNetworkStopped.
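
The source-state validation described above can be sketched in plain C as
follows. This is a minimal illustration, not the driver's code: the SNP_STATE
enum, the status codes and all function names are hypothetical stand-ins, and
the real member functions return EFI_STATUS values and also validate their
parameters.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the three SNP states of an existent instance. */
typedef enum { SnpStopped, SnpStarted, SnpInitialized } SNP_STATE;

/* Illustrative status codes; the real driver returns EFI_STATUS values. */
enum { SNP_OK = 0, SNP_WRONG_STATE = -1 };

/* Perform one transition, but only from the expected source state. */
static int snp_transition(SNP_STATE *state, SNP_STATE from, SNP_STATE to)
{
  assert(state != NULL);
  if (*state != from) {
    return SNP_WRONG_STATE;       /* refuse; the state is left untouched */
  }
  *state = to;
  return SNP_OK;
}

/* The four SNP transitions from the diagram, each with a fixed source. */
int snp_start(SNP_STATE *s)
{
  return snp_transition(s, SnpStopped, SnpStarted);
}

int snp_stop(SNP_STATE *s)
{
  return snp_transition(s, SnpStarted, SnpStopped);
}

int snp_initialize(SNP_STATE *s)
{
  return snp_transition(s, SnpStarted, SnpInitialized);
}

int snp_shutdown(SNP_STATE *s)
{
  return snp_transition(s, SnpInitialized, SnpStarted);
}

/* Driver Binding Stop may only proceed from the Stopped state. */
int binding_stop_allowed(SNP_STATE s)
{
  return s == SnpStopped;
}
```

Funneling every transition through one helper keeps the "expected source
state" check in a single place; a rejected call leaves the instance unchanged.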


Driver instance states (Simple Network Protocol)
------------------------------------------------

In the EfiSimpleNetworkStopped state, the virtio-net device has been reset. No
resources are allocated for networking / traffic purposes. The MAC address and
other device attributes have been retrieved from the device (this is necessary
for completing the VirtioNetDriverBindingStart transition).

The EfiSimpleNetworkStarted state is completely identical to the
EfiSimpleNetworkStopped state for virtio-net, in the functional and
resource-usage sense. This state is mandated / provided by the Simple Network
Protocol for flexibility that the virtio-net driver doesn't exploit.

In particular, the EfiSimpleNetworkStarted state is the target of the Shutdown
SNP member function, and must therefore correspond to a hardware configuration
where "[it] is safe for another driver to initialize". (Clearly another UEFI
driver could not do that due to the exclusivity of the driver binding that
VirtioNetDriverBindingStart() installs, but a later OS driver might qualify.)

The EfiSimpleNetworkInitialized state is the live state of the virtio NIC / the
driver instance. Virtio and other resources required for network traffic have
been allocated, and the following SNP member functions are available (in
addition to VirtioNetShutdown, which leaves the state):

- VirtioNetReceive [SnpReceive.c]: poll the virtio NIC for an Rx packet that
  may have arrived asynchronously;

- VirtioNetTransmit [SnpTransmit.c]: queue a Tx packet for asynchronous
  transmission (meant to be used together with VirtioNetGetStatus);

- VirtioNetGetStatus [SnpGetStatus.c]: query link status and status of pending
  Tx packets;

- VirtioNetMcastIpToMac [SnpMcastIpToMac.c]: transform a multicast IPv4/IPv6
  address into a multicast MAC address;

- VirtioNetReceiveFilters [SnpReceiveFilters.c]: emulate unicast / multicast /
  broadcast filter configuration (not their actual effect -- a more liberal
  filter setting than requested is allowed by the UEFI specification).
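
As an illustration of the multicast IP-to-MAC transformation listed above, the
sketch below applies the conventional mappings (RFC 1112 for IPv4, RFC 2464
for IPv6). The function names are hypothetical, and this text does not state
that VirtioNetMcastIpToMac is written this way; the block only shows the
standard mapping.

```c
#include <stdint.h>
#include <string.h>

/*
 * IPv4 (RFC 1112): prefix 01:00:5E, followed by the low 23 bits of the
 * multicast IPv4 address. The address is given in network byte order.
 */
void mcast_ipv4_to_mac(const uint8_t ip[4], uint8_t mac[6])
{
  mac[0] = 0x01;
  mac[1] = 0x00;
  mac[2] = 0x5E;
  mac[3] = ip[1] & 0x7F;          /* top bit of the second octet is dropped */
  mac[4] = ip[2];
  mac[5] = ip[3];
}

/*
 * IPv6 (RFC 2464): prefix 33:33, followed by the last four bytes of the
 * multicast IPv6 address.
 */
void mcast_ipv6_to_mac(const uint8_t ip[16], uint8_t mac[6])
{
  mac[0] = 0x33;
  mac[1] = 0x33;
  memcpy(&mac[2], &ip[12], 4);
}
```

For example, 224.0.0.251 maps to 01:00:5E:00:00:FB, and ff02::1 maps to
33:33:00:00:00:01.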

The following SNP member functions are not supported [SnpUnsupported.c]:

- VirtioNetReset: reinitialize the virtio NIC without shutting it down (a loop
  from/to EfiSimpleNetworkInitialized);

- VirtioNetStationAddress: assign a new MAC address to the virtio NIC;

- VirtioNetStatistics: collect statistics;

- VirtioNetNvData: access non-volatile data on the virtio NIC.

Missing support for these functions is allowed by the UEFI specification and
doesn't seem to trip up higher level protocols.


Events and task priority levels
-------------------------------

The UEFI specification defines a sophisticated mechanism for asynchronous
events / callbacks (see "6.1 Event, Timer, and Task Priority Services" for
details). Such callbacks work like software interrupts, and some notion of
locking / masking is needed to implement critical sections (atomic or
exclusive access to data or a device). This notion is defined as Task Priority
Levels.

The virtio-net driver for OVMF must concern itself with events for two reasons:

- The Simple Network Protocol provides its clients with a (non-optional) WAIT
  type event called WaitForPacket: it allows them to check or wait for Rx
  packets by polling or blocking on this event. (This functionality overlaps
  with the Receive member function.) The event is available to clients starting
  with EfiSimpleNetworkStopped (inclusive).

  The virtio-net driver is informed about such client polling or blocking by
  receiving an asynchronous callback (a software interrupt). In the callback
  function the driver must check the driver instance state, and if it is
  EfiSimpleNetworkInitialized, access the Rx queue and see if any packets are
  available for consumption. If so, it must signal the WaitForPacket WAIT type
  event, waking the client.

  For simplicity and safety, all parts of the virtio-net driver that access any
  bit of the driver instance (data or device) run at the TPL_CALLBACK level.
  This is the highest level allowed for an SNP implementation, and all code
  protected in this manner satisfies even stricter non-blocking requirements
  than what's documented for TPL_CALLBACK.

  The task priority level for the WaitForPacket callback is also set by the
  driver; the choice is TPL_CALLBACK again. This in effect serializes the
  WaitForPacket callback (VirtioNetIsPacketAvailable [Events.c]) with "normal"
  parts of the driver.

- According to the Driver Writer's Guide, a network driver should install a
  callback function for the global EXIT_BOOT_SERVICES event (a special NOTIFY
  type event). When the ExitBootServices() boot service has cleaned up internal
  firmware state and is about to pass control to the OS, any network driver has
  to stop any in-flight DMA transfers, lest they corrupt OS memory. For this
  reason EXIT_BOOT_SERVICES is emitted and the network driver must abort
  in-flight DMA transfers.

  This callback (VirtioNetExitBoot) is synchronized with the rest of the driver
  code in the same way as explained for WaitForPacket. In the
  EfiSimpleNetworkInitialized state it resets the virtio NIC, halting all data
  transfer. After the callback returns, no further driver code is expected to
  be scheduled.
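
The decision made inside the WaitForPacket callback can be sketched as below.
This is a simplified stand-alone model: VNET_DEV and rx_packet_pending are
hypothetical names, and the TPL_CALLBACK serialization itself is not modeled.
In the firmware, a true result would be followed by signaling the
WaitForPacket event.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { SnpStopped, SnpStarted, SnpInitialized } SNP_STATE;

/* Hypothetical, trimmed-down view of a driver instance. */
typedef struct {
  SNP_STATE         state;
  volatile uint16_t *used_idx;      /* Rx Used Index, written by the host */
  uint16_t          last_used_idx;  /* last Used Index the guest processed */
} VNET_DEV;

/* Returns true if the WaitForPacket event should be signaled. */
bool rx_packet_pending(const VNET_DEV *dev)
{
  if (dev->state != SnpInitialized) {
    return false;   /* the Rx queue only exists in the Initialized state */
  }
  /* Both indices are free-running 16-bit counters; they differ exactly */
  /* when the host has answered requests the guest has not consumed yet. */
  return *dev->used_idx != dev->last_used_idx;
}
```

Because the event exists from EfiSimpleNetworkStopped onwards, the state check
has to come first: in the two lower-traffic states there is no ring to look at.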


Virtio internals -- Rx
----------------------

Requests (Rx and Tx alike) are always submitted by the guest and processed by
the host. For Tx, processing means transmission. For Rx, processing means
filling in the request with an incoming packet. Submitted requests exist on the
"Available Ring", and answered (processed) requests show up on the "Used Ring".

Packet data includes the media (Ethernet) header: destination MAC, source MAC,
and Ethertype (14 bytes total).

The following structures implement packet reception. Most of them are defined
in the Virtio specification; the only driver-specific trait here is the static
pre-configuration of the two-part descriptor chains, in VirtioNetInitRx. The
diagram is simplified.

                   Available Index Available Index
                   last processed    incremented
                     by the host    by the guest
                          v   ------->    v
Available +-------+-------+-------+-------+-------+
Ring      |DescIdx|DescIdx|DescIdx|DescIdx|DescIdx|
          +-------+-------+-------+-------+-------+
                             =D6     =D2

        D2         D3          D4         D5          D6         D7
Descr. +----------+----------++----------+----------++----------+----------+
Table  |Adr:Len:Nx|Adr:Len:Nx||Adr:Len:Nx|Adr:Len:Nx||Adr:Len:Nx|Adr:Len:Nx|
       +----------+----------++----------+----------++----------+----------+
        =A2    =D3 =A3         =A4    =D5 =A5         =A6    =D7 =A7


             A2       A3     A4       A5     A6       A7
Receive     +---------------+---------------+---------------+
Destination |vnet hdr:packet|vnet hdr:packet|vnet hdr:packet|
Area        +---------------+---------------+---------------+

                       Used Index     Used Index incremented
      last processed by the guest      by the host
                                v  -------> v
Used    +-----------+-----------+-----------+-----------+-----------+
Ring    |DescIdx:Len|DescIdx:Len|DescIdx:Len|DescIdx:Len|DescIdx:Len|
        +-----------+-----------+-----------+-----------+-----------+
                                     =D4

In VirtioNetInitRx, the guest allocates the fixed size Receive Destination
Area, which accommodates all packets delivered asynchronously by the host. A
slice of this area is dedicated to each packet; each slice is further
subdivided into virtio-net request header and network packet data. The
(guest-physical) addresses of these sub-slices are denoted with A2, A3, A4 and
so on. Importantly, an even-subscript "A" always belongs to a virtio-net
request header, while an odd-subscript "A" always belongs to a packet
sub-slice.

Furthermore, the guest lays out a static pattern in the Descriptor Table. For
each packet that can be in-flight or already arrived from the host,
VirtioNetInitRx sets up a separate, two-part descriptor chain. For packet N,
the Nth descriptor chain is set up as follows:

- the first (=head) descriptor, with even index, points to the fixed-size
  sub-slice receiving the virtio-net request header,

- the second descriptor (with odd index) points to the fixed (1514 byte) size
  sub-slice receiving the packet data,

- a link from the first (head) descriptor in the chain is established to the
  second (tail) descriptor in the chain.

Finally, the guest populates the Available Ring with the indices of the head
descriptors. All descriptor indices on both the Available Ring and the Used
Ring are even.
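
The static Rx layout described above can be sketched as follows. This is an
illustration under stated assumptions, not the body of VirtioNetInitRx: the
function name rx_init is hypothetical, the 10-byte request header size is
assumed (the header without optional extensions), and chain numbering starts
at N = 0. The descriptor field layout and the two flag values follow the
Virtio ring format.

```c
#include <stdint.h>

#define VRING_DESC_F_NEXT   1u      /* chain continues via the next field */
#define VRING_DESC_F_WRITE  2u      /* buffer is written by the host */
#define VNET_HDR_SIZE       10u     /* assumed virtio-net request header size */
#define RX_PKT_SIZE         1514u   /* packet sub-slice size from the text */

typedef struct {
  uint64_t addr;                    /* guest-physical buffer address */
  uint32_t len;
  uint16_t flags;
  uint16_t next;
} VRING_DESC;

/*
 * Lay out num_chains two-part chains over a contiguous receive area that
 * starts at area_base, and publish every head index on the Available Ring.
 * table must hold 2 * num_chains descriptors.
 */
void rx_init(VRING_DESC *table, uint16_t *avail_ring, uint16_t num_chains,
             uint64_t area_base)
{
  uint16_t n;

  for (n = 0; n < num_chains; n++) {
    uint64_t slice = area_base + (uint64_t)n * (VNET_HDR_SIZE + RX_PKT_SIZE);
    uint16_t head  = (uint16_t)(2 * n);

    /* head descriptor (even index): virtio-net request header */
    table[head].addr  = slice;
    table[head].len   = VNET_HDR_SIZE;
    table[head].flags = VRING_DESC_F_WRITE | VRING_DESC_F_NEXT;
    table[head].next  = (uint16_t)(head + 1);

    /* tail descriptor (odd index): packet data */
    table[head + 1].addr  = slice + VNET_HDR_SIZE;
    table[head + 1].len   = RX_PKT_SIZE;
    table[head + 1].flags = VRING_DESC_F_WRITE;
    table[head + 1].next  = 0;

    avail_ring[n] = head;           /* only even (head) indices are queued */
  }
}
```

Both descriptors carry the WRITE flag because, for Rx, the host fills in the
header as well as the packet data.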

Packet reception occurs as follows:

- The host consumes a descriptor index off the Available Ring. This index is
  even (=2*N), and identifies the head descriptor of the chain belonging to
  packet N.

- The host reads the descriptors D(2*N) and -- following the Next link there
  -- D(2*N+1), and stores the virtio-net request header at A(2*N), and the
  packet data at A(2*N+1).

- The host places the index of the head descriptor, 2*N, onto the Used Ring,
  and sets the Len field in the same Used Ring Element to the total number of
  bytes transferred for the entire descriptor chain. This enables the guest to
  identify the length of Rx packets.

- VirtioNetReceive polls the Used Ring. If a new Used Ring Element shows up, it
  copies the data out to the caller, and recycles the index of the head
  descriptor (i.e. 2*N) to the Available Ring.

- Because the host can theoretically process (answer) Rx requests in any
  order, the order of head descriptor indices on each of the Available Ring and
  the Used Ring is virtually random. (Except right after the initial population
  in VirtioNetInitRx, when the Available Ring is full and increasing, and the
  Used Ring is empty.)

- If the Available Ring is empty, the host is forced to drop packets. If the
  Used Ring is empty, VirtioNetReceive returns EFI_NOT_READY (no packet
  available).
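
The guest side of the reception steps above can be sketched as follows. The
RX_QUEUE structure, the rx_poll name, the ring size and the 10-byte header
size are assumptions made for the example; copying the data out to the caller
is only indicated by a comment.

```c
#include <stdint.h>

#define QUEUE_SIZE     8u    /* hypothetical ring size */
#define VNET_HDR_SIZE  10u   /* assumed virtio-net request header size */

typedef struct {
  uint32_t id;               /* index of the chain's head descriptor */
  uint32_t len;              /* bytes written for the entire chain */
} USED_ELEM;

typedef struct {
  uint16_t  avail_ring[QUEUE_SIZE];
  uint16_t  avail_idx;       /* incremented by the guest */
  USED_ELEM used_ring[QUEUE_SIZE];
  uint16_t  used_idx;        /* incremented by the host */
  uint16_t  last_used_idx;   /* last Used Index processed by the guest */
} RX_QUEUE;

/*
 * Poll the Used Ring once. Returns the packet length (without the request
 * header) and stores the head descriptor index in *head_out, or returns -1
 * if the Used Ring is empty (the EFI_NOT_READY case in the text).
 */
int32_t rx_poll(RX_QUEUE *q, uint16_t *head_out)
{
  const USED_ELEM *elem;
  uint32_t        payload;

  if (q->last_used_idx == q->used_idx) {
    return -1;
  }
  elem = &q->used_ring[q->last_used_idx % QUEUE_SIZE];
  q->last_used_idx++;

  /* Len covers header plus packet data; guard against a short report. */
  payload = (elem->len < VNET_HDR_SIZE) ? 0 : elem->len - VNET_HDR_SIZE;
  *head_out = (uint16_t)elem->id;

  /* A real driver copies the packet out of the chain's slice here. */

  /* Recycle the head descriptor index to the Available Ring. */
  q->avail_ring[q->avail_idx % QUEUE_SIZE] = *head_out;
  q->avail_idx++;

  return (int32_t)payload;
}
```

Since the chain layout never changes, recycling is just a matter of putting
the same even index back on the Available Ring.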


Virtio internals -- Tx
----------------------

The transmission structure erected by VirtioNetInitTx is similar; it differs
in the following:

- There is no Receive Destination Area.

- Each head descriptor, D(2*N), points to a read-only virtio-net request header
  that is shared by all of the head descriptors. This virtio-net request header
  is never modified by the host.

- Each tail descriptor is re-pointed to the caller-supplied packet buffer
  whenever VirtioNetTransmit places the corresponding head descriptor on the
  Available Ring. The caller is responsible for holding on to the unmodified
  buffer until it is reported transmitted by VirtioNetGetStatus.

Steps of packet transmission:

- Client code calls VirtioNetTransmit. VirtioNetTransmit tracks free descriptor
  chains by keeping the indices of their head descriptors in a stack that is
  private to the driver instance. All elements of the stack are even.

  - If the stack is empty (that is, each descriptor chain, in isolation, is
    either pending transmission, or has been processed by the host but not
    yet recycled by a VirtioNetGetStatus call), then VirtioNetTransmit returns
    EFI_NOT_READY.

  - Otherwise the index of a free chain's head descriptor is popped from the
    stack. The linked tail descriptor is re-pointed as discussed above. The
    head descriptor's index is pushed on the Available Ring.

- The host moves the head descriptor index from the Available Ring to the Used
  Ring when it transmits the packet.

- Client code calls VirtioNetGetStatus. In case the Used Ring is empty, the
  function reports no Tx completion. Otherwise, a head descriptor's index is
  consumed from the Used Ring and recycled to the private stack. The client
  code's original packet buffer address is fetched from the tail descriptor
  (where it has been stored at VirtioNetTransmit time) and returned to the
  caller.

  - The Len field of the Used Ring Element is not checked. The host is assumed
    to have transmitted the entire packet -- VirtioNetTransmit has limited it
    to at most 1514 bytes. The Virtio specification suggests this packet size
    is always accepted (and a lower MTU could be encountered on any later hop
    as well). Additionally, there's no good way to report a short transmit via
    VirtioNetGetStatus; judging from the specification, EFI_DEVICE_ERROR seems
    too serious, and higher level protocols could interpret it as a fatal
    condition.

- The host can theoretically reorder head descriptor indices when moving them
  from the Available Ring to the Used Ring (out of order transmission). Because
  of this (and the choice of a stack over a list for free descriptor chain
  tracking) the order of head descriptor indices on either Ring is
  unpredictable.
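
The free-chain stack and its interplay with the two rings can be sketched as
below. All names (TX_QUEUE, tx_init, tx_transmit, tx_get_status) and the ring
size are hypothetical; setting up the head descriptors and the shared request
header is omitted, and the -1 / NULL results stand in for EFI_NOT_READY and
"no Tx completion" respectively.

```c
#include <stddef.h>
#include <stdint.h>

#define QUEUE_SIZE  8u                  /* hypothetical ring size */
#define NUM_CHAINS  (QUEUE_SIZE / 2u)   /* two descriptors per chain */

typedef struct {
  uint64_t addr;
  uint32_t len;
  uint16_t flags;
  uint16_t next;
} VRING_DESC;

typedef struct { uint32_t id; uint32_t len; } USED_ELEM;

typedef struct {
  VRING_DESC desc[QUEUE_SIZE];
  uint16_t   avail_ring[QUEUE_SIZE];
  uint16_t   avail_idx;                 /* incremented by the guest */
  USED_ELEM  used_ring[QUEUE_SIZE];
  uint16_t   used_idx;                  /* incremented by the host */
  uint16_t   last_used_idx;
  uint16_t   free_stack[NUM_CHAINS];    /* head indices of free chains */
  uint16_t   free_count;
} TX_QUEUE;

/* Initially every chain is free; all stack elements are even. */
void tx_init(TX_QUEUE *q)
{
  uint16_t n;

  q->avail_idx = q->used_idx = q->last_used_idx = 0;
  for (n = 0; n < NUM_CHAINS; n++) {
    q->free_stack[n] = (uint16_t)(2 * n);
  }
  q->free_count = NUM_CHAINS;
}

/* Queue one packet. Returns 0, or -1 if no chain is free. */
int tx_transmit(TX_QUEUE *q, const void *buf, uint32_t len)
{
  uint16_t head;

  if (q->free_count == 0) {
    return -1;
  }
  head = q->free_stack[--q->free_count];

  /* Re-point the tail descriptor to the caller's buffer. */
  q->desc[head + 1].addr = (uint64_t)(uintptr_t)buf;
  q->desc[head + 1].len  = len;

  q->avail_ring[q->avail_idx % QUEUE_SIZE] = head;
  q->avail_idx++;
  return 0;
}

/* Recycle one completed chain. Returns its buffer, or NULL if none. */
const void *tx_get_status(TX_QUEUE *q)
{
  uint16_t head;

  if (q->last_used_idx == q->used_idx) {
    return NULL;
  }
  head = (uint16_t)q->used_ring[q->last_used_idx % QUEUE_SIZE].id;
  q->last_used_idx++;                   /* Len is deliberately not checked */

  q->free_stack[q->free_count++] = head;
  return (const void *)(uintptr_t)q->desc[head + 1].addr;
}
```

A stack is enough here because any free chain is as good as any other; the
tail descriptor doubles as the record of which caller buffer is in flight.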