Commit | Line | Data |
---|---|---|
9f95a23c TL |
1 | .. SPDX-License-Identifier: BSD-3-Clause |
2 | Copyright(c) 2017 Intel Corporation. | |
3 | ||
4 | Generic Segmentation Offload Library | |
5 | ==================================== | |
6 | ||
7 | Overview | |
8 | -------- | |
9 | Generic Segmentation Offload (GSO) is a widely used software implementation of | |
10 | TCP Segmentation Offload (TSO), which reduces per-packet processing overhead. | |
11 | Much like TSO, GSO gains performance by enabling upper layer applications to | |
12 | process a smaller number of large packets (e.g. MTU size of 64KB), instead of | |
13 | processing higher numbers of small packets (e.g. MTU size of 1500B), thus | |
14 | reducing per-packet overhead. | |
15 | ||
16 | For example, GSO allows guest kernel stacks to transmit over-sized TCP segments | |
17 | that far exceed the kernel interface's MTU; this eliminates the need to segment | |
18 | packets within the guest, and improves the data-to-overhead ratio of both the | |
19 | guest-host link and the PCI bus. In this scenario, the guest network stack | |
20 | expects segmentation of egress frames to take place in the NIC hardware or, | |
21 | where that hardware capability is unavailable, in the host application or | |
22 | network stack. | |
23 | ||
24 | Bearing that in mind, the GSO library enables DPDK applications to segment | |
25 | packets in software. Note however, that GSO is implemented as a standalone | |
26 | library, and not via a 'fallback' mechanism (i.e. for when TSO is unsupported | |
27 | in the underlying hardware); that is, applications must explicitly invoke the | |
28 | GSO library to segment packets. The size of GSO segments ``(segsz)`` is | |
29 | configurable by the application. | |
30 | ||
31 | Limitations | |
32 | ----------- | |
33 | ||
34 | #. The GSO library doesn't check if input packets have correct checksums. | |
35 | ||
36 | #. In addition, the GSO library doesn't re-calculate checksums for segmented | |
37 | packets (that task is left to the application). | |
38 | ||
39 | #. IP fragments are unsupported by the GSO library. | |
40 | ||
41 | #. The egress interface's driver must support multi-segment packets. | |
42 | ||
43 | #. Currently, the GSO library supports the following IPv4 packet types: | |
44 | ||
45 | - TCP | |
46 | - UDP | |
47 | - VxLAN | |
48 | - GRE | |
49 | ||
50 | See `Supported GSO Packet Types`_ for further details. | |
51 | ||
52 | Packet Segmentation | |
53 | ------------------- | |
54 | ||
55 | The ``rte_gso_segment()`` function is the GSO library's primary | |
56 | segmentation API. | |
57 | ||
58 | Before performing segmentation, an application must create a GSO context object | |
59 | ``(struct rte_gso_ctx)``, which provides the library with some of the | |
60 | information required to understand how the packet should be segmented. Refer to | |
61 | `How to Segment a Packet`_ for additional details. Once the GSO context | |
62 | has been created, and populated, the application can then use the | |
63 | ``rte_gso_segment()`` function to segment packets. | |
64 | ||
65 | The GSO library typically stores each segment that it creates in two parts: the | |
66 | first part contains a copy of the original packet's headers, while the second | |
67 | part contains a pointer to an offset within the original packet. This mechanism | |
68 | is explained in more detail in `GSO Output Segment Format`_. | |
69 | ||
70 | The GSO library supports both single- and multi-segment input mbufs. | |
71 | ||
72 | GSO Output Segment Format | |
73 | ~~~~~~~~~~~~~~~~~~~~~~~~~ | |
74 | To reduce the number of expensive memcpy operations required when segmenting a | |
75 | packet, the GSO library typically stores each segment that it creates as a | |
76 | two-part mbuf (technically, this is termed a 'two-segment' mbuf; however, since | |
77 | the elements produced by the API are also called 'segments', for clarity the | |
78 | term 'part' is used here instead). | |
79 | ||
80 | The first part of each output segment is a direct mbuf and contains a copy of | |
81 | the original packet's headers, which must be prepended to each output segment. | |
82 | These headers are copied from the original packet into each output segment. | |
83 | ||
84 | The second part of each output segment represents a section of data from the | |
85 | original packet, i.e. a data segment. Rather than copy the data directly from | |
86 | the original packet into the output segment (which would impact performance | |
87 | considerably), the second part of each output segment is an indirect mbuf, | |
88 | which contains no actual data, but simply points to an offset within the | |
89 | original packet. | |
90 | ||
91 | The combination of the 'header' segment and the 'data' segment constitutes a | |
92 | single logical output GSO segment of the original packet. This is illustrated | |
93 | in :numref:`figure_gso-output-segment-format`. | |
94 | ||
95 | .. _figure_gso-output-segment-format: | |
96 | ||
97 | .. figure:: img/gso-output-segment-format.* | |
98 | :align: center | |
99 | ||
100 | Two-part GSO output segment | |
101 | ||
102 | In one situation, the output segment may contain additional 'data' segments. | |
103 | This only occurs when both of the following conditions hold: | |
104 | ||
105 | - the input packet on which GSO is to be performed is represented by a | |
106 | multi-segment mbuf. | |
107 | ||
108 | - the output segment is required to contain data that spans the boundaries | |
109 | between segments of the input multi-segment mbuf. | |
110 | ||
111 | The GSO library traverses each segment of the input packet, and produces | |
112 | numerous output segments; for optimal performance, the number of output | |
113 | segments is kept to a minimum. Consequently, the GSO library maximizes the | |
114 | amount of data contained within each output segment; i.e. each output segment | |
115 | contains ``segsz`` bytes of data. The only exception to this is the very | |
116 | final output segment; if ``pkt_len`` % ``segsz`` is non-zero, then the final | |
117 | segment is smaller than the rest. | |
118 | ||
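As a rough illustration of the sizing rule above, the following sketch computes how many output segments a payload yields and how large the final one is. It is not library code: ``gso_split_payload`` and ``struct gso_split`` are hypothetical names, and the sketch assumes that the payload budget of each segment is ``segsz`` minus the length of the replicated headers (i.e. that ``segsz`` counts headers as well as payload).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type, not part of the GSO library. */
struct gso_split {
	uint32_t nb_segs;      /* number of output segments */
	uint32_t last_payload; /* payload bytes carried by the final segment */
};

/*
 * Split payload_len bytes into output segments.
 * Assumption: each segment may carry (segsz - hdr_len) payload bytes,
 * because the headers are copied into every output segment.
 */
static struct gso_split
gso_split_payload(uint32_t payload_len, uint16_t segsz, uint16_t hdr_len)
{
	struct gso_split s = {0, 0};
	uint32_t per_seg;

	assert(segsz > hdr_len);
	per_seg = (uint32_t)segsz - hdr_len;

	if (payload_len == 0)
		return s;

	/* Round up: a partial final segment still needs its own slot. */
	s.nb_segs = (payload_len + per_seg - 1) / per_seg;
	/* All segments but the last are full; the last takes the remainder. */
	s.last_payload = payload_len - (s.nb_segs - 1) * per_seg;
	return s;
}
```

For instance, under these assumptions a 4000-byte payload with a ``segsz`` of 1514 and 54 bytes of headers yields three segments, the last of which carries 1080 bytes. A count of this kind may also help when sizing the array that receives the output segments.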
119 | In order for an output segment to meet its MSS, it may need to include data from | |
120 | multiple input segments. Due to the nature of indirect mbufs (each indirect mbuf | |
121 | can point to only one direct mbuf), the solution here is to add another indirect | |
122 | mbuf to the output segment; this additional segment then points to the next | |
123 | input segment. If necessary, this chaining process is repeated, until the sum of | |
124 | all of the data 'contained' in the output segment reaches ``segsz``. This | |
125 | ensures that the amount of data contained within each output segment is uniform, | |
126 | with the possible exception of the last segment, as previously described. | |
127 | ||
128 | :numref:`figure_gso-three-seg-mbuf` illustrates an example of a three-part | |
129 | output segment. In this example, the output segment needs to include data from | |
130 | the end of one input segment, and the beginning of another. To achieve this, | |
131 | an additional indirect mbuf is chained to the second part of the output segment, | |
132 | and is attached to the next input segment (i.e. it points to the data in the | |
133 | next input segment). | |
134 | ||
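The chaining described above can be sketched without any mbuf machinery. In the sketch below, each ``struct data_part`` stands in for one indirect mbuf, recording which input segment it points to and the byte range it covers. All names (``data_part``, ``build_data_parts``) are hypothetical, and the input packet is modelled simply as an array of segment lengths.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one indirect mbuf of an output segment. */
struct data_part {
	uint16_t in_seg; /* index of the input segment pointed to */
	uint32_t offset; /* byte offset within that input segment */
	uint32_t len;    /* number of bytes referenced */
};

/*
 * Gather the data parts of ONE output segment.
 * (*in_seg, *offset) is a cursor into the input packet; it is advanced
 * by the bytes consumed. Up to 'want' bytes are referenced. A new part
 * is started whenever the data crosses into the next input segment,
 * since one part can point into only one input segment.
 * Returns the number of parts written.
 */
static size_t
build_data_parts(const uint32_t *in_len, size_t nb_in,
		 size_t *in_seg, uint32_t *offset, uint32_t want,
		 struct data_part *parts, size_t max_parts)
{
	size_t n = 0;

	assert(*in_seg >= nb_in || *offset <= in_len[*in_seg]);

	while (want > 0 && *in_seg < nb_in && n < max_parts) {
		uint32_t avail = in_len[*in_seg] - *offset;
		uint32_t take = avail < want ? avail : want;

		if (take > 0) {
			parts[n].in_seg = (uint16_t)*in_seg;
			parts[n].offset = *offset;
			parts[n].len = take;
			n++;
			want -= take;
			*offset += take;
		}
		/* Input segment exhausted: move the cursor to the next one. */
		if (*offset == in_len[*in_seg]) {
			(*in_seg)++;
			*offset = 0;
		}
	}
	return n;
}
```

With input segments of 1000, 1000 and 500 bytes and 600 bytes of data per output segment, the first call produces a single part. The second call produces two parts (400 bytes from the end of the first input segment and 200 bytes from the start of the second), which, together with the header part, corresponds to the three-part layout described above.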
135 | .. _figure_gso-three-seg-mbuf: | |
136 | ||
137 | .. figure:: img/gso-three-seg-mbuf.* | |
138 | :align: center | |
139 | ||
140 | Three-part GSO output segment | |
141 | ||
142 | Supported GSO Packet Types | |
143 | -------------------------- | |
144 | ||
145 | TCP/IPv4 GSO | |
146 | ~~~~~~~~~~~~ | |
147 | TCP/IPv4 GSO supports segmentation of suitably large TCP/IPv4 packets, which | |
148 | may also contain an optional VLAN tag. | |
149 | ||
150 | UDP/IPv4 GSO | |
151 | ~~~~~~~~~~~~ | |
152 | UDP/IPv4 GSO supports segmentation of suitably large UDP/IPv4 packets, which | |
153 | may also contain an optional VLAN tag. UDP GSO is the same as IP fragmentation. | |
154 | Specifically, UDP GSO treats the UDP header as a part of the payload and | |
155 | does not modify it during segmentation. Therefore, after UDP GSO, only the | |
156 | first output packet has the original UDP header, and others just have L2 | |
157 | and L3 headers. | |
158 | ||
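To make the comparison with IP fragmentation concrete, the sketch below lays out the output packets for one UDP datagram using generic IPv4 fragmentation rules: offsets are expressed in 8-byte units and the "more fragments" flag is set on every packet except the last. This is an illustration only. The names ``frag_info`` and ``udp_frag_layout`` are hypothetical, and the sketch assumes the per-packet L3 payload size (``unit``) is a multiple of 8; how the library itself fills these header fields is not described here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define UDP_HDR_LEN 8u /* size of a UDP header in bytes */

/* Hypothetical description of one output packet. */
struct frag_info {
	uint32_t l3_payload; /* bytes following the IPv4 header */
	uint16_t frag_off_8; /* fragment offset, in 8-byte units */
	bool more_frags;     /* true for every packet except the last */
	bool has_udp_hdr;    /* true only for the first packet */
};

/*
 * The UDP header is treated as ordinary payload, so the data to split
 * is (UDP header + UDP data). Returns the number of packets described.
 */
static uint32_t
udp_frag_layout(uint32_t udp_data_len, uint32_t unit,
		struct frag_info *out, uint32_t max_out)
{
	uint32_t total = UDP_HDR_LEN + udp_data_len;
	uint32_t done = 0;
	uint32_t n = 0;

	assert(unit > 0 && unit % 8 == 0);

	while (done < total && n < max_out) {
		uint32_t left = total - done;
		uint32_t take = left < unit ? left : unit;

		out[n].l3_payload = take;
		out[n].frag_off_8 = (uint16_t)(done / 8);
		out[n].more_frags = (done + take < total);
		out[n].has_udp_hdr = (done == 0);
		done += take;
		n++;
	}
	return n;
}
```

Under these assumptions, a datagram with 3000 bytes of UDP data and a ``unit`` of 1480 is described by three packets with offsets 0, 185 and 370; only the first contains the UDP header.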
159 | VxLAN GSO | |
160 | ~~~~~~~~~ | |
161 | VxLAN GSO supports segmentation of suitably large VxLAN packets, | |
162 | which contain an outer IPv4 header, inner TCP/IPv4 headers, and optional | |
163 | inner and/or outer VLAN tag(s). | |
164 | ||
165 | GRE GSO | |
166 | ~~~~~~~ | |
167 | GRE GSO supports segmentation of suitably large GRE packets, which contain | |
168 | an outer IPv4 header, inner TCP/IPv4 headers, and an optional VLAN tag. | |
169 | ||
170 | How to Segment a Packet | |
171 | ----------------------- | |
172 | ||
173 | To segment an outgoing packet, an application must: | |
174 | ||
175 | #. First create a GSO context ``(struct rte_gso_ctx)``; this contains: | |
176 | ||
177 | - a pointer to the mbuf pool for allocating the direct buffers, which are | |
178 | used to store the GSO segments' packet headers. | |
179 | ||
180 | - a pointer to the mbuf pool for allocating indirect buffers, which are | |
181 | used to locate GSO segments' packet payloads. | |
182 | ||
183 | .. note:: | |
184 | ||
185 | An application may use the same pool for both direct and indirect | |
186 | buffers. However, since indirect mbufs simply store a pointer, the | |
187 | application may reduce its memory consumption by creating a separate memory | |
188 | pool, containing smaller elements, for the indirect pool. | |
189 | ||
190 | ||
191 | - the size of each output segment, including packet headers and payload, | |
192 | measured in bytes. | |
193 | ||
194 | - the bit mask of required GSO types. The GSO library uses the same macros as | |
195 | those that describe a physical device's TX offloading capabilities (i.e. | |
196 | ``DEV_TX_OFFLOAD_*_TSO``) for gso_types. For example, if an application | |
197 | wants to segment TCP/IPv4 packets, it should set gso_types to | |
198 | ``DEV_TX_OFFLOAD_TCP_TSO``. The only other values currently | |
199 | supported for gso_types are ``DEV_TX_OFFLOAD_VXLAN_TNL_TSO``, and | |
200 | ``DEV_TX_OFFLOAD_GRE_TNL_TSO``; a combination of these macros is also | |
201 | allowed. | |
202 | ||
203 | - a flag, that indicates whether the IPv4 headers of output segments should | |
204 | contain fixed or incremental ID values. | |
205 | ||
206 | #. Set the appropriate ol_flags in the mbuf. | |
207 | ||
208 | - The GSO library uses the value of an mbuf's ``ol_flags`` attribute | |
209 | to determine how a packet should be segmented. It is the application's | |
210 | responsibility to ensure that these flags are set. | |
211 | ||
212 | - For example, in order to segment TCP/IPv4 packets, the application should | |
213 | add the ``PKT_TX_IPV4`` and ``PKT_TX_TCP_SEG`` flags to the mbuf's | |
214 | ol_flags. | |
215 | ||
216 | - If checksum calculation in hardware is required, the application should | |
217 | also add the ``PKT_TX_TCP_CKSUM`` and ``PKT_TX_IP_CKSUM`` flags. | |
218 | ||
219 | #. Check if the packet should be processed. Packets with one of the | |
220 | following properties are not processed and are returned immediately: | |
221 | ||
222 | - Packet length is less than ``segsz`` (i.e. GSO is not required). | |
223 | ||
224 | - Packet type is not supported by GSO library (see | |
225 | `Supported GSO Packet Types`_). | |
226 | ||
227 | - Application has not enabled GSO support for the packet type. | |
228 | ||
229 | - Packet's ol_flags have been incorrectly set. | |
230 | ||
231 | #. Allocate space in which to store the output GSO segments. If the amount of | |
232 | space allocated by the application is insufficient, segmentation will fail. | |
233 | ||
234 | #. Invoke the GSO segmentation API, ``rte_gso_segment()``. | |
235 | ||
236 | #. If required, update the L3 and L4 checksums of the newly-created segments. | |
237 | For tunneled packets, the outer IPv4 headers' checksums should also be | |
238 | updated. Alternatively, the application may offload checksum calculation | |
239 | to HW. | |
240 |
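The "should this packet be processed" step above can be pictured as a short series of early-exit checks. The sketch below covers only the TCP/IPv4 case and does not use the DPDK headers: the ``SK_*`` constants are hypothetical stand-ins for the real ``ol_flags`` and ``gso_types`` bits, and ``sketch_needs_gso`` is an illustrative name rather than a library function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-ins for mbuf ol_flags and gso_types bits. The real
 * values are defined by the DPDK headers and are NOT reproduced here.
 */
#define SK_TX_IPV4    (1ULL << 0)
#define SK_TX_TCP_SEG (1ULL << 1)
#define SK_GSO_TCP4   (1U << 0)

/* Return true if a TCP/IPv4 packet would be handed on for segmentation. */
static bool
sketch_needs_gso(uint32_t pkt_len, uint16_t segsz,
		 uint64_t ol_flags, uint32_t gso_types)
{
	const uint64_t tcp4 = SK_TX_IPV4 | SK_TX_TCP_SEG;

	assert(segsz > 0);

	/* Packet is no larger than one segment: nothing to split. */
	if (pkt_len <= segsz)
		return false;
	/* ol_flags do not mark the packet as TCP/IPv4 needing segmentation. */
	if ((ol_flags & tcp4) != tcp4)
		return false;
	/* The application has not enabled GSO for this packet type. */
	if ((gso_types & SK_GSO_TCP4) == 0)
		return false;
	return true;
}
```

In this sketch, a packet that fails any check is simply left alone, mirroring the "not processed and returned immediately" behaviour listed above; tunneled packet types would need additional flag combinations.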