.. SPDX-License-Identifier: BSD-3-Clause
   Copyright(c) 2010-2016 Intel Corporation.

IXGBE Driver
============

Vector PMD for IXGBE
--------------------

Vector PMD uses Intel® SIMD instructions to optimize packet I/O.
It improves load/store bandwidth efficiency of the L1 data cache by using a wider SSE/AVX register (1).
The wider register gives space to hold multiple packet buffers, reducing the instruction count when processing packets in bulk.

There is no change to the PMD API. The RX/TX handlers are the only two entry points for vPMD packet I/O.
They are transparently registered at runtime for RX/TX execution if all condition checks pass.

1. To date, only an SSE version of the IXGBE vPMD is available.

Some constraints apply as pre-conditions for specific optimizations on bulk packet transfers.
The following sections explain RX and TX constraints in the vPMD.

RX Constraints
~~~~~~~~~~~~~~

Prerequisites and Pre-conditions
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The following prerequisites apply:

* To enable vPMD to work for RX, bulk allocation for RX must be allowed.

Ensure that the following pre-conditions are satisfied:

* rxq->rx_free_thresh >= RTE_PMD_IXGBE_RX_MAX_BURST

* rxq->rx_free_thresh < rxq->nb_rx_desc

* (rxq->nb_rx_desc % rxq->rx_free_thresh) == 0

* rxq->nb_rx_desc < (IXGBE_MAX_RING_DESC - RTE_PMD_IXGBE_RX_MAX_BURST)

These conditions are checked in the code; a queue configuration that satisfies them is sketched below.

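
A minimal sketch of an RX queue setup that satisfies these pre-conditions, using the standard ethdev API; the port ID, queue ID, descriptor count and mempool are illustrative assumptions, not values mandated by the driver:

.. code-block:: c

   #include <rte_ethdev.h>

   /* Assumed to exist: an initialized port and a packet mbuf pool. */
   extern uint16_t port_id;
   extern struct rte_mempool *mb_pool;

   void setup_vpmd_rx_queue(void)
   {
       struct rte_eth_dev_info dev_info;
       struct rte_eth_rxconf rxconf;

       rte_eth_dev_info_get(port_id, &dev_info);
       rxconf = dev_info.default_rxconf;

       /* 32 == RTE_PMD_IXGBE_RX_MAX_BURST. 512 descriptors satisfy all
        * four conditions: greater than the threshold, a multiple of it,
        * and well below IXGBE_MAX_RING_DESC (4096) minus the max burst. */
       rxconf.rx_free_thresh = 32;

       rte_eth_rx_queue_setup(port_id, 0, 512,
                              rte_eth_dev_socket_id(port_id),
                              &rxconf, mb_pool);
   }
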
Scattered packets are not supported in this mode.
If an incoming packet is larger than the maximum acceptable length of one "mbuf" data size (2 KB by default),
vPMD for RX is disabled.

By default, IXGBE_MAX_RING_DESC is set to 4096 and RTE_PMD_IXGBE_RX_MAX_BURST is set to 32.

Features not Supported by RX Vector PMD
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Some features are not supported when trying to increase the throughput in vPMD.
They are:

* IEEE1588

* FDIR

* Header split

* RX checksum offload

Other features are supported using optional MACRO configuration. They include:

* HW VLAN strip

* HW extended dual VLAN

To guarantee the constraint, capabilities in dev_conf.rxmode.offloads will be checked:

* DEV_RX_OFFLOAD_VLAN_STRIP

* DEV_RX_OFFLOAD_VLAN_EXTEND

* DEV_RX_OFFLOAD_CHECKSUM

* DEV_RX_OFFLOAD_HEADER_SPLIT

dev_conf.fdir_conf->mode will also be checked.

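
As a hedged illustration, a port configuration that keeps the RX vector path eligible leaves all of the checked offload bits clear and disables the flow director; the structure layout follows the standard ethdev API of this DPDK generation:

.. code-block:: c

   #include <rte_ethdev.h>

   /* Sketch: no VLAN strip/extend, checksum or header split offloads,
    * and flow director disabled, so the RX vPMD is not ruled out. */
   static const struct rte_eth_conf port_conf = {
       .rxmode = {
           .offloads = 0,
       },
       .fdir_conf = {
           .mode = RTE_FDIR_MODE_NONE,
       },
   };
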
VF Runtime Options
^^^^^^^^^^^^^^^^^^

The following ``devargs`` options can be enabled at runtime. They must
be passed as part of EAL arguments. For example,

.. code-block:: console

   testpmd -w af:10.0,pflink_fullchk=1 -- -i

- ``pflink_fullchk`` (default **0**)

  When calling ``rte_eth_link_get_nowait()`` to get VF link status,
  this option controls how the VF synchronizes its status with the
  PF's. If set, the VF checks not only the PF's physical link status,
  by reading the related register, but also the mailbox status. We
  call this behavior full checking; checking the mailbox triggers
  mailbox interrupt generation on the PF. If unset, the application
  gets the VF's link status quickly by reading just the PF's link
  status register, which avoids mailbox interrupt generation across
  the whole system.

  ``rte_eth_link_get()`` will still use the mailbox method regardless
  of the ``pflink_fullchk`` setting.

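
A brief sketch of the fast path that the default (unset) option leaves enabled; the VF port ID is an illustrative assumption:

.. code-block:: c

   #include <stdio.h>
   #include <rte_ethdev.h>

   /* With pflink_fullchk unset (the default), this call reads only the
    * PF's link status register and triggers no mailbox interrupt. */
   void print_vf_link(uint16_t vf_port_id)
   {
       struct rte_eth_link link;

       rte_eth_link_get_nowait(vf_port_id, &link);
       printf("VF link is %s\n", link.link_status ? "up" : "down");
   }
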
RX Burst Size
^^^^^^^^^^^^^

As vPMD is focused on high throughput, it assumes that the RX burst size is equal to or greater than 32 per burst.
It returns zero if nb_pkts < 32 is used as the expected packet number in the receive handler.

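
For illustration, a receive loop that honors this assumption might look as follows; the port and queue IDs and the mbuf handling are assumptions for the sketch:

.. code-block:: c

   #include <rte_ethdev.h>
   #include <rte_mbuf.h>

   #define BURST_SIZE 32  /* never request fewer than 32 packets */

   void rx_loop(uint16_t port_id)
   {
       struct rte_mbuf *pkts[BURST_SIZE];
       uint16_t i, nb_rx;

       for (;;) {
           nb_rx = rte_eth_rx_burst(port_id, 0, pkts, BURST_SIZE);
           for (i = 0; i < nb_rx; i++)
               rte_pktmbuf_free(pkts[i]);  /* placeholder processing */
       }
   }
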
TX Constraint
~~~~~~~~~~~~~

Prerequisite
^^^^^^^^^^^^

The only prerequisite is related to tx_rs_thresh.
The tx_rs_thresh value must be greater than or equal to RTE_PMD_IXGBE_TX_MAX_BURST,
but less than or equal to RTE_IXGBE_TX_MAX_FREE_BUF_SZ.
Consequently, by default the tx_rs_thresh value is in the range 32 to 64.

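
A hedged sketch of a TX queue setup within this window; the port ID and descriptor count are illustrative assumptions:

.. code-block:: c

   #include <rte_ethdev.h>

   void setup_vpmd_tx_queue(uint16_t port_id)
   {
       struct rte_eth_dev_info dev_info;
       struct rte_eth_txconf txconf;

       rte_eth_dev_info_get(port_id, &dev_info);
       txconf = dev_info.default_txconf;

       /* 32 == RTE_PMD_IXGBE_TX_MAX_BURST; must stay <= 64
        * (RTE_IXGBE_TX_MAX_FREE_BUF_SZ). */
       txconf.tx_rs_thresh = 32;
       txconf.offloads = 0;  /* TX vPMD supports no offloads, see below */

       rte_eth_tx_queue_setup(port_id, 0, 512,
                              rte_eth_dev_socket_id(port_id), &txconf);
   }
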
Features not Supported by TX Vector PMD
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

TX vPMD only works when ``offloads`` is set to 0.

This means that it does not support any TX offload.

Application Programming Interface
---------------------------------

In DPDK release v16.11 an API for ixgbe-specific functions was added to the ixgbe PMD.
The declarations for the API functions are in the header ``rte_pmd_ixgbe.h``.

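
As one hedged example of this API, the snippet below enables VLAN anti-spoofing on a VF; the port ID and VF index are illustrative assumptions:

.. code-block:: c

   #include <stdio.h>
   #include <rte_pmd_ixgbe.h>

   void enable_vf_antispoof(uint16_t port_id)
   {
       /* ixgbe-specific call from rte_pmd_ixgbe.h: VF 0, feature on */
       int ret = rte_pmd_ixgbe_set_vf_vlan_anti_spoof(port_id, 0, 1);

       if (ret != 0)
           printf("anti-spoof setup failed: %d\n", ret);
   }
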
Sample Application Notes
------------------------

l3fwd
~~~~~

When running l3fwd with vPMD, there is one thing to note.
In the configuration, ensure that DEV_RX_OFFLOAD_CHECKSUM in port_conf.rxmode.offloads is NOT set.
Otherwise, by default, RX vPMD is disabled.

load_balancer
~~~~~~~~~~~~~

As in the case of l3fwd, to enable vPMD, do NOT set DEV_RX_OFFLOAD_CHECKSUM in port_conf.rxmode.offloads.
In addition, for improved performance, use -bsz "(32,32),(64,64),(32,32)" in load_balancer to avoid using the default burst size of 144.


Limitations or Known issues
---------------------------

Malicious Driver Detection not Supported
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The Intel x550 series NICs support a feature called MDD (Malicious
Driver Detection) which checks the behavior of the VF driver.
If this feature is enabled, the VF must use the advanced context descriptor
correctly and set the CC (Check Context) bit.
The DPDK PF does not support MDD, but the kernel PF does, so problems can
arise when a kernel PF is combined with a DPDK VF. If the user enables MDD
in the kernel PF, the DPDK VF will not work, because the kernel PF treats
the VF as malicious when in fact it simply does not behave as MDD requires.
Supporting MDD would have a significant performance impact: DPDK would need
to determine whether the advanced context descriptor should be set and set
it, and it would have to obtain the header length from the upper layer,
because parsing the packet itself is not acceptable. This makes MDD too
expensive to support.

When using a kernel PF with a DPDK VF on x550, please make sure to use a
kernel PF driver that disables MDD or can disable MDD.

Some kernel drivers already disable MDD by default while some kernels can use
the command ``insmod ixgbe.ko MDD=0,0`` to disable MDD. Each "0" in the
command refers to a port. For example, if there are 6 ixgbe ports, the command
should be changed to ``insmod ixgbe.ko MDD=0,0,0,0,0,0``.


Statistics
~~~~~~~~~~

The statistics of ixgbe hardware must be polled regularly in order for them to
remain consistent. Running a DPDK application without polling the statistics will
cause registers on hardware to count to the maximum value, and "stick" at
that value.

In order to avoid the statistics registers ever reaching the maximum value,
read the statistics from the hardware using ``rte_eth_stats_get()`` or
``rte_eth_xstats_get()``.

The maximum time between statistics polls that ensures consistent results can
be calculated as follows:

.. code-block:: c

   max_read_interval = UINT_MAX / max_packets_per_second
   max_read_interval = 4294967295 / 14880952
   max_read_interval = 288.6218096127183 (seconds)
   max_read_interval = ~4 mins 48 sec.

In order to ensure valid results, it is recommended to poll every 4 minutes.

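
A hedged sketch of such a polling step, assuming an initialized port:

.. code-block:: c

   #include <inttypes.h>
   #include <stdio.h>
   #include <rte_ethdev.h>

   /* Call this at least every ~4 minutes so the hardware counters
    * cannot saturate between reads. */
   void poll_stats(uint16_t port_id)
   {
       struct rte_eth_stats stats;

       if (rte_eth_stats_get(port_id, &stats) == 0)
           printf("rx=%" PRIu64 " tx=%" PRIu64 "\n",
                  stats.ipackets, stats.opackets);
   }
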
MTU setting
~~~~~~~~~~~

Although the user can set the MTU separately on PF and VF ports, the ixgbe NIC
only supports one global MTU per physical port.
So when the user sets different MTUs on PF and VF ports on one physical port,
the real MTU for all these PF and VF ports is the largest value set.
This behavior is based on the kernel driver behavior.

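
For illustration, the call below sets the MTU on a PF port; if a VF sharing the physical port requests a smaller MTU, the effective hardware MTU stays at the largest configured value. The port ID and MTU are assumptions:

.. code-block:: c

   #include <rte_ethdev.h>

   /* The largest MTU set on any PF/VF of the physical port wins. */
   void set_port_mtu(uint16_t pf_port_id)
   {
       rte_eth_dev_set_mtu(pf_port_id, 9000);
   }
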
VF MAC address setting
~~~~~~~~~~~~~~~~~~~~~~

On ixgbe, the concept of "pool" can be used for different things depending on
the mode. In VMDq mode, "pool" means a VMDq pool. In IOV mode, "pool" means a
VF.

There is no RTE API to add a VF's MAC address from the PF. On ixgbe, the
``rte_eth_dev_mac_addr_add()`` function can be used to add a VF's MAC address
as a workaround.

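
A hedged sketch of this workaround, run on the PF; the port ID, VF index and address are illustrative assumptions:

.. code-block:: c

   #include <rte_ethdev.h>
   #include <rte_ether.h>

   /* In IOV mode the pool argument selects the VF, so pool 1 targets VF 1. */
   void add_vf_mac(uint16_t pf_port_id)
   {
       struct rte_ether_addr vf_mac = {
           .addr_bytes = { 0x02, 0x00, 0x00, 0x00, 0x00, 0x01 }
       };

       rte_eth_dev_mac_addr_add(pf_port_id, &vf_mac, 1 /* pool == VF 1 */);
   }
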
X550 does not support legacy interrupt mode
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Description
^^^^^^^^^^^
X550 cannot get interrupts if using the ``uio_pci_generic`` module or the legacy
interrupt mode of ``igb_uio`` or ``vfio``, because the X550 errata states that
the Interrupt Status bit is not implemented. The errata is item #22
from the `X550 spec update <https://www.intel.com/content/dam/www/public/us/en/
documents/specification-updates/ethernet-x550-spec-update.pdf>`_.

Implication
^^^^^^^^^^^
When using the ``uio_pci_generic`` module or the legacy interrupt mode of
``igb_uio`` or ``vfio``, the Interrupt Status bit is checked to see whether an
interrupt has arrived. Since the bit is not implemented in X550, the IRQ cannot
be handled correctly and the event fd cannot be reported to DPDK applications.
Applications therefore cannot get interrupts, and ``dmesg`` will show messages
like ``irq #No.: nobody cared``.

Workaround
^^^^^^^^^^
Do not bind the ``uio_pci_generic`` module to X550 NICs.
Do not bind ``igb_uio`` with legacy mode to X550 NICs.
Before binding ``vfio`` with legacy mode to X550 NICs, use
``modprobe vfio nointxmask=1`` to load the ``vfio`` module if the INTx is not
shared with other devices.

Inline crypto processing support
--------------------------------

Inline IPsec processing is supported for ``RTE_SECURITY_ACTION_TYPE_INLINE_CRYPTO``
mode for ESP packets only:

- ESP authentication only: AES-128-GMAC (128-bit key)
- ESP encryption and authentication: AES-128-GCM (128-bit key)

The IPsec Security Gateway Sample Application supports inline IPsec processing
for the ixgbe PMD.

For more details see the IPsec Security Gateway Sample Application and Security
library documentation.


Virtual Function Port Representors
----------------------------------
The IXGBE PF PMD supports the creation of VF port representors for the control
and monitoring of IXGBE virtual function devices. Each port representor
corresponds to a single virtual function of that device. Using the ``devargs``
option ``representor`` the user can specify which virtual functions to create
port representors for on initialization of the PF PMD by passing the VF IDs of
the VFs which are required::

   -w DBDF,representor=[0,1,4]

Currently hot-plugging of representor ports is not supported so all required
representors must be specified on the creation of the PF.

Supported Chipsets and NICs
---------------------------

- Intel 82599EB 10 Gigabit Ethernet Controller
- Intel 82598EB 10 Gigabit Ethernet Controller
- Intel 82599ES 10 Gigabit Ethernet Controller
- Intel 82599EN 10 Gigabit Ethernet Controller
- Intel Ethernet Controller X540-AT2
- Intel Ethernet Controller X550-BT2
- Intel Ethernet Controller X550-AT2
- Intel Ethernet Controller X550-AT
- Intel Ethernet Converged Network Adapter X520-SR1
- Intel Ethernet Converged Network Adapter X520-SR2
- Intel Ethernet Converged Network Adapter X520-LR1
- Intel Ethernet Converged Network Adapter X520-DA1
- Intel Ethernet Converged Network Adapter X520-DA2
- Intel Ethernet Converged Network Adapter X520-DA4
- Intel Ethernet Converged Network Adapter X520-QDA1
- Intel Ethernet Converged Network Adapter X520-T2
- Intel 10 Gigabit AF DA Dual Port Server Adapter
- Intel 10 Gigabit AT Server Adapter
- Intel 10 Gigabit AT2 Server Adapter
- Intel 10 Gigabit CX4 Dual Port Server Adapter
- Intel 10 Gigabit XF LR Server Adapter
- Intel 10 Gigabit XF SR Dual Port Server Adapter
- Intel 10 Gigabit XF SR Server Adapter
- Intel Ethernet Converged Network Adapter X540-T1
- Intel Ethernet Converged Network Adapter X540-T2
- Intel Ethernet Converged Network Adapter X550-T1
- Intel Ethernet Converged Network Adapter X550-T2