## @file
#
# Technical notes for the virtio-net driver.
#
# Copyright (C) 2013, Red Hat, Inc.
#
# This program and the accompanying materials are licensed and made available
# under the terms and conditions of the BSD License which accompanies this
# distribution. The full text of the license may be found at
# http://opensource.org/licenses/bsd-license.php
#
# THE PROGRAM IS DISTRIBUTED UNDER THE BSD LICENSE ON AN "AS IS" BASIS, WITHOUT
# WARRANTIES OR REPRESENTATIONS OF ANY KIND, EITHER EXPRESS OR IMPLIED.
#
##

Disclaimer
----------

All statements concerning standards and specifications are informative and not
normative. They are made in good faith. Corrections are most welcome on the
edk2-devel mailing list.

The following documents have been perused while writing the driver and this
document:
- Unified Extensible Firmware Interface Specification, Version 2.3.1, Errata C;
  June 27, 2012;
- Driver Writer's Guide for UEFI 2.3.1, 03/08/2012, Version 1.01;
- Virtio PCI Card Specification, v0.9.5 DRAFT, 2012 May 7.


Summary
-------

The VirtioNetDxe UEFI_DRIVER implements the Simple Network Protocol for
virtio-net devices. Higher level protocols are automatically installed on top
of it by the DXE Core / the ConnectController() boot service. For virtio-net
devices this enables, e.g., DHCP configuration, TCP transfers with edk2 StdLib
applications, and PXE booting in OVMF.


UEFI driver structure
---------------------

A driver instance, belonging to a given virtio-net device, can be in one of
four states at any time. The states stack up as shown below. Each state
transition is labeled with the primary function that implements it (and that
function's important callees, indented to reflect the call hierarchy).

                                |                  ^
                                |                  |
  [DriverBinding.c]             |                  | [DriverBinding.c]
  VirtioNetDriverBindingStart   |                  | VirtioNetDriverBindingStop
    VirtioNetSnpPopulate        |                  |   VirtioNetSnpEvacuate
      VirtioNetGetFeatures      |                  |
                                v                  |
                            +-------------------------+
                            | EfiSimpleNetworkStopped |
                            +-------------------------+
                                |                  ^
  [SnpStart.c]                  |                  | [SnpStop.c]
  VirtioNetStart                |                  | VirtioNetStop
                                |                  |
                                v                  |
                            +-------------------------+
                            | EfiSimpleNetworkStarted |
                            +-------------------------+
                                |                  ^
  [SnpInitialize.c]             |                  | [SnpShutdown.c]
  VirtioNetInitialize           |                  | VirtioNetShutdown
    VirtioNetInitRing {Rx, Tx}  |                  |   VirtioNetShutdownRx [SnpSharedHelpers.c]
      VirtioRingInit            |                  |   VirtioNetShutdownTx [SnpSharedHelpers.c]
      VirtioRingMap             |                  |   VirtioNetUninitRing [SnpSharedHelpers.c]
    VirtioNetInitTx             |                  |                       {Tx, Rx}
    VirtioNetInitRx             |                  |     VirtIo->UnmapSharedBuffer
                                |                  |     VirtioRingUninit
                                v                  |
                          +-----------------------------+
                          | EfiSimpleNetworkInitialized |
                          +-----------------------------+

The state at the top means "nonexistent" and is hence unnamed on the diagram --
a driver instance actually doesn't exist at that point. The transition
functions out of and into that state implement the Driver Binding Protocol.

The lower three states characterize an existent driver instance and are all
states defined by the Simple Network Protocol. The transition functions between
them are member functions of the Simple Network Protocol.

Each transition function validates its expected source state and its
parameters. For example, VirtioNetDriverBindingStop will refuse to disconnect
from the controller unless the driver instance is in EfiSimpleNetworkStopped.
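
The source-state check can be pictured as a small transition table. The sketch
below is only a model of the diagram, not the driver's code: the DRV_STATE and
DRV_OP names, DRV_NONEXISTENT (standing in for the unnamed top state) and the
DrvTransition helper are all hypothetical, and the real functions return
EFI_STATUS codes and validate their parameters as well.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names; the lower three mirror the SNP states on the diagram. */
typedef enum {
  DRV_NONEXISTENT,   /* the unnamed state at the top of the diagram */
  DRV_STOPPED,       /* EfiSimpleNetworkStopped                     */
  DRV_STARTED,       /* EfiSimpleNetworkStarted                     */
  DRV_INITIALIZED    /* EfiSimpleNetworkInitialized                 */
} DRV_STATE;

typedef enum {
  OP_BINDING_START,  /* VirtioNetDriverBindingStart */
  OP_BINDING_STOP,   /* VirtioNetDriverBindingStop  */
  OP_SNP_START,      /* VirtioNetStart              */
  OP_SNP_STOP,       /* VirtioNetStop               */
  OP_SNP_INITIALIZE, /* VirtioNetInitialize         */
  OP_SNP_SHUTDOWN    /* VirtioNetShutdown           */
} DRV_OP;

/* Perform Op if the current state is its expected source state.
   Returns false and leaves *State unchanged otherwise. */
bool
DrvTransition (DRV_STATE *State, DRV_OP Op)
{
  static const struct {
    DRV_OP    Op;
    DRV_STATE From;
    DRV_STATE To;
  } Table[] = {
    { OP_BINDING_START,  DRV_NONEXISTENT, DRV_STOPPED     },
    { OP_BINDING_STOP,   DRV_STOPPED,     DRV_NONEXISTENT },
    { OP_SNP_START,      DRV_STOPPED,     DRV_STARTED     },
    { OP_SNP_STOP,       DRV_STARTED,     DRV_STOPPED     },
    { OP_SNP_INITIALIZE, DRV_STARTED,     DRV_INITIALIZED },
    { OP_SNP_SHUTDOWN,   DRV_INITIALIZED, DRV_STARTED     }
  };
  size_t Idx;

  for (Idx = 0; Idx < sizeof Table / sizeof Table[0]; Idx++) {
    if (Table[Idx].Op == Op && Table[Idx].From == *State) {
      *State = Table[Idx].To;
      return true;
    }
  }
  return false;
}
```

In this model, OP_BINDING_STOP attempted from DRV_STARTED is rejected, which
corresponds to the VirtioNetDriverBindingStop example above.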


Driver instance states (Simple Network Protocol)
------------------------------------------------

In the EfiSimpleNetworkStopped state, the virtio-net device has been reset. No
resources are allocated for networking / traffic purposes. The MAC address and
other device attributes have been retrieved from the device (this is necessary
for completing the VirtioNetDriverBindingStart transition).

For virtio-net, the EfiSimpleNetworkStarted state is identical to the
EfiSimpleNetworkStopped state, both functionally and in terms of resource
usage. This state is mandated / provided by the Simple Network Protocol for
flexibility that the virtio-net driver doesn't exploit.

In particular, the EfiSimpleNetworkStarted state is the target of the Shutdown
SNP member function, and must therefore correspond to a hardware configuration
where "[it] is safe for another driver to initialize". (Clearly another UEFI
driver could not do that due to the exclusivity of the driver binding that
VirtioNetDriverBindingStart() installs, but a later OS driver might qualify.)

The EfiSimpleNetworkInitialized state is the live state of the virtio NIC / the
driver instance. Virtio and other resources required for network traffic have
been allocated, and the following SNP member functions are available (in
addition to VirtioNetShutdown which leaves the state):

- VirtioNetReceive [SnpReceive.c]: poll the virtio NIC for an Rx packet that
  may have arrived asynchronously;

- VirtioNetTransmit [SnpTransmit.c]: queue a Tx packet for asynchronous
  transmission (meant to be used together with VirtioNetGetStatus);

- VirtioNetGetStatus [SnpGetStatus.c]: query link status and status of pending
  Tx packets;

- VirtioNetMcastIpToMac [SnpMcastIpToMac.c]: transform a multicast IPv4/IPv6
  address into a multicast MAC address;

- VirtioNetReceiveFilters [SnpReceiveFilters.c]: emulate unicast / multicast /
  broadcast filter configuration (not their actual effect -- a more liberal
  filter setting than requested is allowed by the UEFI specification).
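
As an illustration of the multicast IP to MAC transformation, the sketch below
applies the standard Ethernet mappings (RFC 1112 for IPv4, RFC 2464 for IPv6).
It is an assumption that VirtioNetMcastIpToMac follows these mappings exactly;
the function names and the plain byte-array parameters are hypothetical and do
not match the SNP member function's signature.

```c
#include <stdint.h>

/* IPv4 multicast (RFC 1112): 01:00:5E, then the low 23 bits of the address. */
void
McastIp4ToMac (const uint8_t Ip[4], uint8_t Mac[6])
{
  Mac[0] = 0x01;
  Mac[1] = 0x00;
  Mac[2] = 0x5E;
  Mac[3] = Ip[1] & 0x7F;  /* the top bit of this octet is dropped */
  Mac[4] = Ip[2];
  Mac[5] = Ip[3];
}

/* IPv6 multicast (RFC 2464): 33:33, then the low 32 bits of the address. */
void
McastIp6ToMac (const uint8_t Ip[16], uint8_t Mac[6])
{
  Mac[0] = 0x33;
  Mac[1] = 0x33;
  Mac[2] = Ip[12];
  Mac[3] = Ip[13];
  Mac[4] = Ip[14];
  Mac[5] = Ip[15];
}
```

For example, 239.255.255.250 maps to 01:00:5E:7F:FF:FA, and ff02::1 maps to
33:33:00:00:00:01.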

The following SNP member functions are not supported [SnpUnsupported.c]:

- VirtioNetReset: reinitialize the virtio NIC without shutting it down (a loop
  from/to EfiSimpleNetworkInitialized);

- VirtioNetStationAddress: assign a new MAC address to the virtio NIC;

- VirtioNetStatistics: collect statistics;

- VirtioNetNvData: access non-volatile data on the virtio NIC.

Missing support for these functions is allowed by the UEFI specification and
doesn't seem to trip up higher level protocols.


Events and task priority levels
-------------------------------

The UEFI specification defines a sophisticated mechanism for asynchronous
events / callbacks (see "6.1 Event, Timer, and Task Priority Services" for
details). Such callbacks work like software interrupts, and some notion of
locking / masking is important to implement critical sections (atomic or
exclusive access to data or a device). This notion is defined as Task Priority
Levels.

The virtio-net driver for OVMF must concern itself with events for two reasons:

- The Simple Network Protocol provides its clients with a (non-optional) WAIT
  type event called WaitForPacket: it allows them to check or wait for Rx
  packets by polling or blocking on this event. (This functionality overlaps
  with the Receive member function.) The event is available to clients starting
  with EfiSimpleNetworkStopped (inclusive).

  The virtio-net driver is informed about such client polling or blocking by
  receiving an asynchronous callback (a software interrupt). In the callback
  function the driver must interrogate the driver instance state, and if it is
  EfiSimpleNetworkInitialized, access the Rx queue and see if any packets are
  available for consumption. If so, it must signal the WaitForPacket WAIT type
  event, waking the client.

  For simplicity and safety, all parts of the virtio-net driver that access any
  bit of the driver instance (data or device) run at the TPL_CALLBACK level.
  This is the highest level allowed for an SNP implementation, and all code
  protected in this manner satisfies even stricter non-blocking requirements
  than what's documented for TPL_CALLBACK.

  The task priority level of the WaitForPacket callback is also set by the
  driver; the choice is TPL_CALLBACK again. This in effect serializes the
  WaitForPacket callback (VirtioNetIsPacketAvailable [Events.c]) with "normal"
  parts of the driver.

- According to the Driver Writer's Guide, a network driver should install a
  callback function for the global EXIT_BOOT_SERVICES event (a special NOTIFY
  type event). When the ExitBootServices() boot service has cleaned up internal
  firmware state and is about to pass control to the OS, any network driver has
  to stop any in-flight DMA transfers, lest they corrupt OS memory. For this
  reason EXIT_BOOT_SERVICES is emitted and the network driver must abort
  in-flight DMA transfers.

  This callback (VirtioNetExitBoot) is synchronized with the rest of the driver
  code just the same as explained for WaitForPacket. In
  EfiSimpleNetworkInitialized state it resets the virtio NIC, halting all data
  transfer. After the callback returns, no further driver code is expected to
  be scheduled.
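
The decisions taken by the two callbacks can be modeled as follows. This is a
sketch under stated assumptions: DEV_MODEL and its fields are hypothetical
stand-ins for the driver instance, the boolean return value replaces the call
that would signal the WaitForPacket event, and TPL handling is left out
because both callbacks are taken to run at TPL_CALLBACK already.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { SNP_STOPPED, SNP_STARTED, SNP_INITIALIZED } SNP_STATE;

/* Hypothetical, heavily reduced view of a driver instance. */
typedef struct {
  SNP_STATE State;
  uint16_t  RxUsedIdx;    /* Used Index as last published by the host   */
  uint16_t  RxLastUsed;   /* Used Index last processed by the guest     */
  bool      DeviceReset;  /* set once the virtio NIC has been reset     */
} DEV_MODEL;

/* WaitForPacket notification: returns true if the event should be signaled.
   The Rx queue exists only in the Initialized state, so it is consulted
   only then. */
bool
IsPacketAvailable (const DEV_MODEL *Dev)
{
  if (Dev->State != SNP_INITIALIZED) {
    return false;
  }
  return Dev->RxUsedIdx != Dev->RxLastUsed;
}

/* EXIT_BOOT_SERVICES notification: a device reset halts all transfers.
   In the other states the device has been reset before, nothing to do. */
void
ExitBoot (DEV_MODEL *Dev)
{
  if (Dev->State == SNP_INITIALIZED) {
    Dev->DeviceReset = true;
  }
}
```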


Virtio internals -- Rx
----------------------

Requests (Rx and Tx alike) are always submitted by the guest and processed by
the host. For Tx, processing means transmission. For Rx, processing means
filling in the request with an incoming packet. Submitted requests exist on the
"Available Ring", and answered (processed) requests show up on the "Used Ring".

Packet data includes the media (Ethernet) header: destination MAC, source MAC,
and Ethertype (14 bytes total).

The following structures implement packet reception. Most of them are defined
in the Virtio specification; the only driver-specific trait here is the static
pre-configuration of the two-part descriptor chains, in VirtioNetInitRx. The
diagram is simplified.

                    Available Index         Available Index
                    last processed            incremented
                      by the host            by the guest
                           v       ------->        v
Available  +-------+-------+-------+-------+-------+
Ring       |DescIdx|DescIdx|DescIdx|DescIdx|DescIdx|
           +-------+-------+-------+-------+-------+
                              =D6     =D2

             D2         D3          D4         D5          D6         D7
Descr.  +----------+----------++----------+----------++----------+----------+
Table   |Adr:Len:Nx|Adr:Len:Nx||Adr:Len:Nx|Adr:Len:Nx||Adr:Len:Nx|Adr:Len:Nx|
        +----------+----------++----------+----------++----------+----------+
         =A2    =D3 =A3         =A4    =D5 =A5         =A6    =D7 =A7


             A2       A3     A4       A5     A6       A7
Receive     +---------------+---------------+---------------+
Destination |vnet hdr:packet|vnet hdr:packet|vnet hdr:packet|
Area        +---------------+---------------+---------------+

                Used Index                               Used Index incremented
       last processed by the guest                             by the host
                    v                   ------->                    v
Used    +-----------+-----------+-----------+-----------+-----------+
Ring    |DescIdx:Len|DescIdx:Len|DescIdx:Len|DescIdx:Len|DescIdx:Len|
        +-----------+-----------+-----------+-----------+-----------+
                                      =D4

In VirtioNetInitRx, the guest allocates the fixed size Receive Destination
Area, which accommodates all packets delivered asynchronously by the host. A
slice of this area is dedicated to each packet; each slice is further
subdivided into virtio-net request header and network packet data. The
(guest-physical) addresses of these sub-slices are denoted with A2, A3, A4 and
so on. Importantly, an even-subscript "A" always belongs to a virtio-net
request header, while an odd-subscript "A" always belongs to a packet
sub-slice.

Furthermore, the guest lays out a static pattern in the Descriptor Table. For
each packet that can be in-flight or already arrived from the host,
VirtioNetInitRx sets up a separate, two-part descriptor chain. For packet N,
the Nth descriptor chain is set up as follows:

- the first (=head) descriptor, with even index, points to the fixed-size
  sub-slice receiving the virtio-net request header,

- the second descriptor (with odd index) points to the fixed (1514 byte) size
  sub-slice receiving the packet data,

- a link from the first (head) descriptor in the chain is established to the
  second (tail) descriptor in the chain.

Finally, the guest populates the Available Ring with the indices of the head
descriptors. All descriptor indices on both the Available Ring and the Used
Ring are even.
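
The static layout can be sketched as below. The descriptor fields and the
NEXT / WRITE flag values follow the Virtio specification; everything else is
an assumption made for illustration: the DESC type and InitRxLayout are
hypothetical names, the 10 byte header size presumes the legacy virtio-net
request header without mergeable Rx buffers, and the slices are assumed to be
packed back to back starting at AreaBase.

```c
#include <stdint.h>

#define DESC_F_NEXT    1u     /* descriptor continues via the Next field */
#define DESC_F_WRITE   2u     /* buffer is written by the host           */
#define VNET_HDR_SIZE  10u    /* assumption: legacy virtio-net header    */
#define PKT_SIZE       1514u  /* fixed packet sub-slice size             */
#define SLICE_SIZE     (VNET_HDR_SIZE + PKT_SIZE)

typedef struct {
  uint64_t Addr;
  uint32_t Len;
  uint16_t Flags;
  uint16_t Next;
} DESC;

/* Set up NumPkts two-part chains and publish their heads.
   Desc must have 2 * NumPkts entries, Avail must have NumPkts entries. */
void
InitRxLayout (DESC *Desc, uint16_t *Avail, uint16_t NumPkts, uint64_t AreaBase)
{
  uint16_t N;

  for (N = 0; N < NumPkts; N++) {
    uint16_t Head  = (uint16_t)(2 * N);
    uint16_t Tail  = (uint16_t)(2 * N + 1);
    uint64_t Slice = AreaBase + (uint64_t)N * SLICE_SIZE;

    /* head descriptor D(2*N): virtio-net request header at A(2*N) */
    Desc[Head].Addr  = Slice;
    Desc[Head].Len   = VNET_HDR_SIZE;
    Desc[Head].Flags = DESC_F_NEXT | DESC_F_WRITE;
    Desc[Head].Next  = Tail;

    /* tail descriptor D(2*N+1): packet data at A(2*N+1) */
    Desc[Tail].Addr  = Slice + VNET_HDR_SIZE;
    Desc[Tail].Len   = PKT_SIZE;
    Desc[Tail].Flags = DESC_F_WRITE;
    Desc[Tail].Next  = 0;

    /* only head (even) indices ever appear on the rings */
    Avail[N] = Head;
  }
}
```

With three chains and AreaBase 0x10000, D2 points at 0x10000 + 1524 and links
to D3, which points 10 bytes further into the same slice.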

Packet reception occurs as follows:

- The host consumes a descriptor index off the Available Ring. This index is
  even (=2*N), and identifies the head descriptor of the chain belonging to
  packet N.

- The host reads the descriptors D(2*N) and -- following the Next link there
  -- D(2*N+1), and stores the virtio-net request header at A(2*N), and the
  packet data at A(2*N+1).

- The host places the index of the head descriptor, 2*N, onto the Used Ring,
  and sets the Len field in the same Used Ring Element to the total number of
  bytes transferred for the entire descriptor chain. This enables the guest to
  identify the length of Rx packets.

- VirtioNetReceive polls the Used Ring. If a new Used Ring Element shows up, it
  copies the data out to the caller, and recycles the index of the head
  descriptor (i.e. 2*N) to the Available Ring.

- Because the host can theoretically process (answer) Rx requests in any order,
  the order of head descriptor indices on each of the Available Ring and the
  Used Ring is virtually random. (Except right after the initial population in
  VirtioNetInitRx, when the Available Ring is full and in increasing order, and
  the Used Ring is empty.)

- If the Available Ring is empty, the host is forced to drop packets. If the
  Used Ring is empty, VirtioNetReceive returns EFI_NOT_READY (no packet
  available).
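
The guest side of these steps can be sketched as a single polling function.
This is a model rather than the driver's VirtioNetReceive: RX_MODEL, RxPoll
and the ring size are hypothetical, the copy-out of packet data is omitted,
the return values -1 and -2 stand in for EFI_NOT_READY and an error status,
and the 10 byte header size is the same assumption about the legacy
virtio-net request header as before.

```c
#include <stdint.h>

#define RX_QSIZE       8u   /* hypothetical ring size               */
#define VNET_HDR_SIZE  10u  /* assumption: legacy virtio-net header */

typedef struct {
  uint32_t Id;              /* index of the chain's head descriptor  */
  uint32_t Len;             /* bytes written for the whole chain     */
} USED_ELEM;

typedef struct {
  USED_ELEM Used[RX_QSIZE];
  uint16_t  UsedIdx;        /* incremented by the host               */
  uint16_t  LastUsed;       /* last processed by the guest           */
  uint16_t  Avail[RX_QSIZE];
  uint16_t  AvailIdx;       /* incremented by the guest              */
} RX_MODEL;

/* Returns the packet length and stores the recycled head index in *Head.
   Returns -1 if the Used Ring is empty, -2 if the element is too short
   to hold even the request header. */
int
RxPoll (RX_MODEL *Rx, uint16_t *Head)
{
  const USED_ELEM *Elem;
  int             PktLen;

  if (Rx->LastUsed == Rx->UsedIdx) {
    return -1;
  }
  Elem = &Rx->Used[Rx->LastUsed % RX_QSIZE];
  Rx->LastUsed++;

  /* Len covers header plus packet; the header part has a fixed size */
  if (Elem->Len >= VNET_HDR_SIZE) {
    PktLen = (int)(Elem->Len - VNET_HDR_SIZE);
  } else {
    PktLen = -2;
  }

  /* recycle the head descriptor index to the Available Ring */
  *Head = (uint16_t)Elem->Id;
  Rx->Avail[Rx->AvailIdx % RX_QSIZE] = *Head;
  Rx->AvailIdx++;
  return PktLen;
}
```

The chain is recycled even in the error case, so that the host does not run
out of Rx requests because of one malformed Used Ring Element.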


Virtio internals -- Tx
----------------------

The transmission structure erected by VirtioNetInitTx is similar; it differs
in the following:

- There is no Receive Destination Area.

- Each head descriptor, D(2*N), points to a read-only virtio-net request header
  that is shared by all of the head descriptors. This virtio-net request header
  is never modified by the host.

- Each tail descriptor is re-pointed to the caller-supplied packet buffer
  whenever VirtioNetTransmit places the corresponding head descriptor on the
  Available Ring. The caller is responsible for holding on to the unmodified
  buffer until it is reported transmitted by VirtioNetGetStatus.

Steps of packet transmission:

- Client code calls VirtioNetTransmit. VirtioNetTransmit tracks free descriptor
  chains by keeping the indices of their head descriptors in a stack that is
  private to the driver instance. All elements of the stack are even.

- If the stack is empty (that is, each descriptor chain, in isolation, is
  either pending transmission, or has been processed by the host but not
  yet recycled by a VirtioNetGetStatus call), then VirtioNetTransmit returns
  EFI_NOT_READY.

- Otherwise the index of a free chain's head descriptor is popped from the
  stack. The linked tail descriptor is re-pointed as discussed above. The head
  descriptor's index is pushed on the Available Ring.

- The host moves the head descriptor index from the Available Ring to the Used
  Ring when it transmits the packet.

- Client code calls VirtioNetGetStatus. In case the Used Ring is empty, the
  function reports no Tx completion. Otherwise, a head descriptor's index is
  consumed from the Used Ring and recycled to the private stack. The client
  code's original packet buffer address is fetched from the tail descriptor
  (where it has been stored at VirtioNetTransmit time) and returned to the
  caller.

- The Len field of the Used Ring Element is not checked. The host is assumed to
  have transmitted the entire packet -- VirtioNetTransmit has limited it to at
  most 1514 bytes. The Virtio specification suggests this packet size is
  always accepted (and a lower MTU could be encountered on any later hop as
  well). Additionally, there's no good way to report a short transmit via
  VirtioNetGetStatus; judging by the specification, EFI_DEVICE_ERROR seems too
  serious, and higher level protocols could interpret it as a fatal condition.

- The host can theoretically reorder head descriptor indices when moving them
  from the Available Ring to the Used Ring (out of order transmission). Because
  of this (and the choice of a stack over a list for free descriptor chain
  tracking) the order of head descriptor indices on either Ring is
  unpredictable.
348 always accepted (and a lower MTU could be encountered on any later hop as\r
349 well). Additionally, there's no good way to report a short transmit via\r
350 VirtioNetGetStatus; EFI_DEVICE_ERROR seems too serious from the specification\r
351 and higher level protocols could interpret it as a fatal condition.\r
352\r
353- The host can theoretically reorder head descriptor indices when moving them\r
354 from the Available Ring to the Used Ring (out of order transmission). Because\r
355 of this (and the choice of a stack over a list for free descriptor chain\r
356 tracking) the order of head descriptor indices on either Ring is\r
357 unpredictable.\r