Commit | Line | Data |
---|---|---|
7c673cae FG |
1 | .. BSD LICENSE |
2 | Copyright 2015 6WIND S.A. | |
3 | ||
4 | Redistribution and use in source and binary forms, with or without | |
5 | modification, are permitted provided that the following conditions | |
6 | are met: | |
7 | ||
8 | * Redistributions of source code must retain the above copyright | |
9 | notice, this list of conditions and the following disclaimer. | |
10 | * Redistributions in binary form must reproduce the above copyright | |
11 | notice, this list of conditions and the following disclaimer in | |
12 | the documentation and/or other materials provided with the | |
13 | distribution. | |
14 | * Neither the name of 6WIND S.A. nor the names of its | |
15 | contributors may be used to endorse or promote products derived | |
16 | from this software without specific prior written permission. | |
17 | ||
18 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS | |
19 | "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT | |
20 | LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR | |
21 | A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT | |
22 | OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, | |
23 | SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT | |
24 | LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, | |
25 | DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY | |
26 | THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT | |
27 | (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE | |
28 | OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. | |
29 | ||
30 | MLX5 poll mode driver | |
31 | ===================== | |
32 | ||
33 | The MLX5 poll mode driver library (**librte_pmd_mlx5**) provides support for | |
34 | **Mellanox ConnectX-4** and **Mellanox ConnectX-4 Lx** families of | |
35 | 10/25/40/50/100 Gb/s adapters as well as their virtual functions (VF) in | |
36 | SR-IOV context. | |
37 | ||
38 | Information and documentation about these adapters can be found on the | |
39 | `Mellanox website <http://www.mellanox.com>`__. Help is also provided by the | |
40 | `Mellanox community <http://community.mellanox.com/welcome>`__. | |
41 | ||
42 | There is also a `section dedicated to this poll mode driver | |
43 | <http://www.mellanox.com/page/products_dyn?product_family=209&mtag=pmd_for_dpdk>`__. | |
44 | ||
45 | .. note:: | |
46 | ||
47 | Due to external dependencies, this driver is disabled by default. It must | |
48 | be enabled manually by setting ``CONFIG_RTE_LIBRTE_MLX5_PMD=y`` and | |
49 | recompiling DPDK. | |
50 | ||
51 | Implementation details | |
52 | ---------------------- | |
53 | ||
54 | Besides its dependency on libibverbs (that implies libmlx5 and associated | |
55 | kernel support), librte_pmd_mlx5 relies heavily on system calls for control | |
56 | operations such as querying/updating the MTU and flow control parameters. | |
57 | ||
58 | For security and robustness reasons, this driver only deals with virtual | |
59 | memory addresses. The way resource allocations are handled by the kernel, | |
60 | combined with hardware specifications that allow it to handle virtual memory | |
61 | addresses directly, ensures that DPDK applications cannot access random | |
62 | physical memory (or memory that does not belong to the current process). | |
63 | ||
64 | This capability allows the PMD to coexist with kernel network interfaces | |
65 | which remain functional, although they stop receiving unicast packets as | |
66 | long as they share the same MAC address. | |
67 | ||
68 | Enabling librte_pmd_mlx5 causes DPDK applications to be linked against | |
69 | libibverbs. | |
70 | ||
71 | Features | |
72 | -------- | |
73 | ||
74 | - Multiple TX and RX queues. | |
75 | - Support for scattered TX and RX frames. | |
76 | - IPv4, IPv6, TCPv4, TCPv6, UDPv4 and UDPv6 RSS on any number of queues. | |
77 | - Several RSS hash keys, one for each flow type. | |
78 | - Configurable RETA table. | |
79 | - Support for multiple MAC addresses. | |
80 | - VLAN filtering. | |
81 | - RX VLAN stripping. | |
82 | - TX VLAN insertion. | |
83 | - RX CRC stripping configuration. | |
84 | - Promiscuous mode. | |
85 | - Multicast promiscuous mode. | |
86 | - Hardware checksum offloads. | |
87 | - Flow director (RTE_FDIR_MODE_PERFECT, RTE_FDIR_MODE_PERFECT_MAC_VLAN and | |
88 | RTE_ETH_FDIR_REJECT). | |
89 | - Secondary process TX is supported. | |
90 | - KVM and VMware ESX SR-IOV modes are supported. | |
91 | - RSS hash result is supported. | |
92 | ||
93 | Limitations | |
94 | ----------- | |
95 | ||
96 | - Inner RSS for VXLAN frames is not supported yet. | |
97 | - Port statistics through software counters only. | |
98 | - Hardware checksum offloads for VXLAN inner header are not supported yet. | |
99 | - Secondary process RX is not supported. | |
100 | ||
101 | Configuration | |
102 | ------------- | |
103 | ||
104 | Compilation options | |
105 | ~~~~~~~~~~~~~~~~~~~ | |
106 | ||
107 | These options can be modified in the ``.config`` file. | |
108 | ||
109 | - ``CONFIG_RTE_LIBRTE_MLX5_PMD`` (default **n**) | |
110 | ||
111 | Toggle compilation of librte_pmd_mlx5 itself. | |
112 | ||
113 | - ``CONFIG_RTE_LIBRTE_MLX5_DEBUG`` (default **n**) | |
114 | ||
115 | Toggle debugging code and stricter compilation flags. Enabling this option | |
116 | adds additional run-time checks and debugging messages at the cost of | |
117 | lower performance. | |
118 | ||
119 | - ``CONFIG_RTE_LIBRTE_MLX5_TX_MP_CACHE`` (default **8**) | |
120 | ||
121 | Maximum number of cached memory pools (MPs) per TX queue. Each MP from | |
122 | which buffers are to be transmitted must be associated with memory regions | |
123 | (MRs). This is a slow operation whose result must be cached. | |
124 | ||
125 | This value is always 1 for RX queues since they use a single MP. | |
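
The options above are plain ``KEY=value`` lines, so switching one can be scripted. The following is a minimal sketch, assuming the configuration file uses exactly that layout; ``set_config_option`` is a hypothetical helper (not part of DPDK), and the demonstration runs on a scratch file rather than a real build tree.

```shell
# Sketch: switch a "KEY=value" build option in a DPDK-style config file.
# set_config_option is a hypothetical helper, not part of DPDK.
set_config_option() {
    file=$1 key=$2 value=$3
    # Rewrite the whole "KEY=..." line; other lines are left untouched.
    sed "s/^${key}=.*/${key}=${value}/" "$file" > "$file.tmp" &&
        mv "$file.tmp" "$file"
}

# Demonstration on a scratch file standing in for the real .config.
cfg=$(mktemp)
printf '%s\n' 'CONFIG_RTE_LIBRTE_MLX5_PMD=n' \
              'CONFIG_RTE_LIBRTE_MLX5_DEBUG=n' > "$cfg"
set_config_option "$cfg" CONFIG_RTE_LIBRTE_MLX5_PMD y
cat "$cfg"
```

After changing an option in the real ``.config`` file, DPDK must be recompiled for the change to take effect.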
126 | ||
127 | Environment variables | |
128 | ~~~~~~~~~~~~~~~~~~~~~ | |
129 | ||
130 | - ``MLX5_PMD_ENABLE_PADDING`` | |
131 | ||
132 | Enables HW packet padding in PCI bus transactions. | |
133 | ||
134 | When packet size is cache aligned and CRC stripping is enabled, 4 fewer | |
135 | bytes are written to the PCI bus. Enabling padding makes such packets | |
136 | aligned again. | |
137 | ||
138 | In cases where PCI bandwidth is the bottleneck, padding can improve | |
139 | performance by 10%. | |
140 | ||
141 | This is disabled by default since this can also decrease performance for | |
142 | unaligned packet sizes. | |
143 | ||
144 | Run-time configuration | |
145 | ~~~~~~~~~~~~~~~~~~~~~~ | |
146 | ||
147 | - librte_pmd_mlx5 brings kernel network interfaces up during initialization | |
148 | because it is affected by their state. Forcing them down prevents packet | |
149 | reception. | |
150 | ||
151 | - **ethtool** operations on related kernel interfaces also affect the PMD. | |
152 | ||
153 | - ``rxq_cqe_comp_en`` parameter [int] | |
154 | ||
155 | A nonzero value enables CQE compression on the RX side. This feature | |
156 | saves PCI bandwidth and improves performance at the cost of | |
157 | slightly higher CPU usage. Enabled by default. | |
158 | ||
159 | Supported on: | |
160 | ||
161 | - x86_64 with ConnectX-4 and ConnectX-4 Lx | |
162 | - Power8 with ConnectX-4 Lx | |
163 | ||
164 | - ``txq_inline`` parameter [int] | |
165 | ||
166 | Amount of data to be inlined during TX operations. Improves latency. | |
167 | Can improve PPS performance when PCI back pressure is detected and may be | |
168 | useful for scenarios involving heavy traffic on many queues. | |
169 | ||
170 | It is not enabled by default (set to 0) since the additional software | |
171 | logic necessary to handle this mode can lower performance when back | |
172 | pressure is not expected. | |
173 | ||
174 | - ``txqs_min_inline`` parameter [int] | |
175 | ||
176 | Enable inline send only when the number of TX queues is greater than or | |
177 | equal to this value. | |
178 | ||
179 | This option should be used in combination with ``txq_inline`` above. | |
180 | ||
181 | - ``txq_mpw_en`` parameter [int] | |
182 | ||
183 | A nonzero value enables multi-packet send. This feature allows the TX | |
184 | burst function to pack up to five packets in two descriptors in order to | |
185 | save PCI bandwidth and improve performance at the cost of a slightly | |
186 | higher CPU usage. | |
187 | ||
188 | It is currently only supported on the ConnectX-4 Lx family of adapters. | |
189 | Enabled by default. | |
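
The parameters above are per-device settings. The following is a minimal sketch of how they might be passed, assuming the usual DPDK convention of appending comma-separated ``key=value`` pairs to the PCI address of a ``-w`` option. ``mlx5_devargs`` is a hypothetical helper, and the values ``128`` and ``4`` are placeholders, not recommendations.

```shell
# Sketch: join a PCI address and key=value pairs into one "-w" argument.
# mlx5_devargs is a hypothetical helper; the values used are placeholders.
mlx5_devargs() {
    arg="-w $1"
    shift
    # Append each remaining argument as ",key=value".
    for kv in "$@"; do
        arg="${arg},${kv}"
    done
    printf '%s\n' "$arg"
}

mlx5_devargs 0000:05:00.0 txq_inline=128 txqs_min_inline=4
```

The resulting string would take the place of a plain ``-w 05:00.0`` on the application command line.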
190 | ||
191 | Prerequisites | |
192 | ------------- | |
193 | ||
194 | This driver relies on external libraries and kernel drivers for resource | |
195 | allocation and initialization. The following dependencies are not part of | |
196 | DPDK and must be installed separately: | |
197 | ||
198 | - **libibverbs** | |
199 | ||
200 | User space Verbs framework used by librte_pmd_mlx5. This library provides | |
201 | a generic interface between the kernel and low-level user space drivers | |
202 | such as libmlx5. | |
203 | ||
204 | It allows slow and privileged operations (context initialization, hardware | |
205 | resource allocation) to be managed by the kernel and fast operations to | |
206 | never leave user space. | |
207 | ||
208 | - **libmlx5** | |
209 | ||
210 | Low-level user space driver library for Mellanox ConnectX-4 devices; | |
211 | it is automatically loaded by libibverbs. | |
212 | ||
213 | This library basically implements send/receive calls to the hardware | |
214 | queues. | |
215 | ||
216 | - **Kernel modules** (mlnx-ofed-kernel) | |
217 | ||
218 | They provide the kernel-side Verbs API and low-level device drivers that | |
219 | manage actual hardware initialization and resource sharing with user | |
220 | space processes. | |
221 | ||
222 | Unlike most other PMDs, these modules must remain loaded and bound to | |
223 | their devices: | |
224 | ||
225 | - mlx5_core: hardware driver managing Mellanox ConnectX-4 devices and | |
226 | related Ethernet kernel network devices. | |
227 | - mlx5_ib: InfiniBand device driver. | |
228 | - ib_uverbs: user space driver for Verbs (entry point for libibverbs). | |
229 | ||
230 | - **Firmware update** | |
231 | ||
232 | Mellanox OFED releases include firmware updates for ConnectX-4 adapters. | |
233 | ||
234 | Because each release provides new features, these updates must be applied to | |
235 | match the kernel modules and libraries they come with. | |
236 | ||
237 | .. note:: | |
238 | ||
239 | Both libraries are BSD and GPL licensed. Linux kernel modules are GPL | |
240 | licensed. | |
241 | ||
242 | Currently supported by DPDK: | |
243 | ||
244 | - Mellanox OFED **3.4-1.0.0.0**. | |
245 | ||
246 | - firmware version: | |
247 | ||
248 | - ConnectX-4: **12.17.1010** | |
249 | - ConnectX-4 Lx: **14.17.1010** | |
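
One way to compare an installed adapter against the versions listed above is to read the ``firmware-version`` field reported by ``ethtool -i``. The following is a minimal sketch; ``fw_version`` is a hypothetical helper, and the sample text is a stand-in for real command output, not a capture from an actual system.

```shell
# Sketch: extract the firmware version from "ethtool -i" style output.
# fw_version is a hypothetical helper reading that output on stdin.
fw_version() {
    awk '$1 == "firmware-version:" { print $2 }'
}

# Sample text standing in for real "ethtool -i <interface>" output.
sample='driver: mlx5_core
firmware-version: 12.17.1010
bus-info: 0000:05:00.0'

found=$(printf '%s\n' "$sample" | fw_version)
expected=12.17.1010
if [ "$found" = "$expected" ]; then
    echo "firmware $found matches"
else
    echo "firmware $found differs from $expected"
fi
```

On a live system, the output of ``ethtool -i`` for the relevant interface would be piped into ``fw_version`` in place of the sample text.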
250 | ||
251 | Getting Mellanox OFED | |
252 | ~~~~~~~~~~~~~~~~~~~~~ | |
253 | ||
254 | While these libraries and kernel modules are available on OpenFabrics | |
255 | Alliance's `website <https://www.openfabrics.org/>`__ and provided by package | |
256 | managers on most distributions, this PMD requires Ethernet extensions that | |
257 | may not be supported at the moment (this is a work in progress). | |
258 | ||
259 | `Mellanox OFED | |
260 | <http://www.mellanox.com/page/products_dyn?product_family=26&mtag=linux>`__ | |
261 | includes the necessary support and should be used in the meantime. For DPDK, | |
262 | only libibverbs, libmlx5, mlnx-ofed-kernel packages and firmware updates are | |
263 | required from that distribution. | |
264 | ||
265 | .. note:: | |
266 | ||
267 | Several versions of Mellanox OFED are available. Installing the version | |
268 | this DPDK release was developed and tested against is strongly | |
269 | recommended. Please check the `prerequisites`_. | |
270 | ||
271 | Notes for testpmd | |
272 | ----------------- | |
273 | ||
274 | Unlike librte_pmd_mlx4, which implements a single RSS configuration per | |
275 | port, librte_pmd_mlx5 supports per-protocol RSS configuration. | |
276 | ||
277 | Since ``testpmd`` defaults to IP RSS mode and there is currently no | |
278 | command-line parameter to enable additional protocols (UDP and TCP as well | |
279 | as IP), the following commands must be entered from its CLI to get the same | |
280 | behavior as librte_pmd_mlx4: | |
281 | ||
282 | .. code-block:: console | |
283 | ||
284 | > port stop all | |
285 | > port config all rss all | |
286 | > port start all | |
287 | ||
288 | Usage example | |
289 | ------------- | |
290 | ||
291 | This section demonstrates how to launch **testpmd** with Mellanox ConnectX-4 | |
292 | devices managed by librte_pmd_mlx5. | |
293 | ||
294 | #. Load the kernel modules: | |
295 | ||
296 | .. code-block:: console | |
297 | ||
298 | modprobe -a ib_uverbs mlx5_core mlx5_ib | |
299 | ||
300 | Alternatively if MLNX_OFED is fully installed, the following script can | |
301 | be run: | |
302 | ||
303 | .. code-block:: console | |
304 | ||
305 | /etc/init.d/openibd restart | |
306 | ||
307 | .. note:: | |
308 | ||
309 | User space I/O kernel modules (uio and igb_uio) are not used and do | |
310 | not have to be loaded. | |
311 | ||
312 | #. Make sure Ethernet interfaces are in working order and linked to kernel | |
313 | verbs. Related sysfs entries should be present: | |
314 | ||
315 | .. code-block:: console | |
316 | ||
317 | ls -d /sys/class/net/*/device/infiniband_verbs/uverbs* | cut -d / -f 5 | |
318 | ||
319 | Example output: | |
320 | ||
321 | .. code-block:: console | |
322 | ||
323 | eth30 | |
324 | eth31 | |
325 | eth32 | |
326 | eth33 | |
327 | ||
328 | #. Optionally, retrieve their PCI bus addresses for whitelisting: | |
329 | ||
330 | .. code-block:: console | |
331 | ||
332 | { | |
333 | for intf in eth2 eth3 eth4 eth5; | |
334 | do | |
335 | (cd "/sys/class/net/${intf}/device/" && pwd -P); | |
336 | done; | |
337 | } | | |
338 | sed -n 's,.*/\(.*\),-w \1,p' | |
339 | ||
340 | Example output: | |
341 | ||
342 | .. code-block:: console | |
343 | ||
344 | -w 0000:05:00.1 | |
345 | -w 0000:06:00.0 | |
346 | -w 0000:06:00.1 | |
347 | -w 0000:05:00.0 | |
348 | ||
349 | #. Request huge pages: | |
350 | ||
351 | .. code-block:: console | |
352 | ||
353 | echo 1024 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages | |
354 | ||
355 | #. Start testpmd with basic parameters: | |
356 | ||
357 | .. code-block:: console | |
358 | ||
359 | testpmd -c 0xff00 -n 4 -w 05:00.0 -w 05:00.1 -w 06:00.0 -w 06:00.1 -- --rxq=2 --txq=2 -i | |
360 | ||
361 | Example output: | |
362 | ||
363 | .. code-block:: console | |
364 | ||
365 | [...] | |
366 | EAL: PCI device 0000:05:00.0 on NUMA socket 0 | |
367 | EAL: probe driver: 15b3:1013 librte_pmd_mlx5 | |
368 | PMD: librte_pmd_mlx5: PCI information matches, using device "mlx5_0" (VF: false) | |
369 | PMD: librte_pmd_mlx5: 1 port(s) detected | |
370 | PMD: librte_pmd_mlx5: port 1 MAC address is e4:1d:2d:e7:0c:fe | |
371 | EAL: PCI device 0000:05:00.1 on NUMA socket 0 | |
372 | EAL: probe driver: 15b3:1013 librte_pmd_mlx5 | |
373 | PMD: librte_pmd_mlx5: PCI information matches, using device "mlx5_1" (VF: false) | |
374 | PMD: librte_pmd_mlx5: 1 port(s) detected | |
375 | PMD: librte_pmd_mlx5: port 1 MAC address is e4:1d:2d:e7:0c:ff | |
376 | EAL: PCI device 0000:06:00.0 on NUMA socket 0 | |
377 | EAL: probe driver: 15b3:1013 librte_pmd_mlx5 | |
378 | PMD: librte_pmd_mlx5: PCI information matches, using device "mlx5_2" (VF: false) | |
379 | PMD: librte_pmd_mlx5: 1 port(s) detected | |
380 | PMD: librte_pmd_mlx5: port 1 MAC address is e4:1d:2d:e7:0c:fa | |
381 | EAL: PCI device 0000:06:00.1 on NUMA socket 0 | |
382 | EAL: probe driver: 15b3:1013 librte_pmd_mlx5 | |
383 | PMD: librte_pmd_mlx5: PCI information matches, using device "mlx5_3" (VF: false) | |
384 | PMD: librte_pmd_mlx5: 1 port(s) detected | |
385 | PMD: librte_pmd_mlx5: port 1 MAC address is e4:1d:2d:e7:0c:fb | |
386 | Interactive-mode selected | |
387 | Configuring Port 0 (socket 0) | |
388 | PMD: librte_pmd_mlx5: 0x8cba80: TX queues number update: 0 -> 2 | |
389 | PMD: librte_pmd_mlx5: 0x8cba80: RX queues number update: 0 -> 2 | |
390 | Port 0: E4:1D:2D:E7:0C:FE | |
391 | Configuring Port 1 (socket 0) | |
392 | PMD: librte_pmd_mlx5: 0x8ccac8: TX queues number update: 0 -> 2 | |
393 | PMD: librte_pmd_mlx5: 0x8ccac8: RX queues number update: 0 -> 2 | |
394 | Port 1: E4:1D:2D:E7:0C:FF | |
395 | Configuring Port 2 (socket 0) | |
396 | PMD: librte_pmd_mlx5: 0x8cdb10: TX queues number update: 0 -> 2 | |
397 | PMD: librte_pmd_mlx5: 0x8cdb10: RX queues number update: 0 -> 2 | |
398 | Port 2: E4:1D:2D:E7:0C:FA | |
399 | Configuring Port 3 (socket 0) | |
400 | PMD: librte_pmd_mlx5: 0x8ceb58: TX queues number update: 0 -> 2 | |
401 | PMD: librte_pmd_mlx5: 0x8ceb58: RX queues number update: 0 -> 2 | |
402 | Port 3: E4:1D:2D:E7:0C:FB | |
403 | Checking link statuses... | |
404 | Port 0 Link Up - speed 40000 Mbps - full-duplex | |
405 | Port 1 Link Up - speed 40000 Mbps - full-duplex | |
406 | Port 2 Link Up - speed 10000 Mbps - full-duplex | |
407 | Port 3 Link Up - speed 10000 Mbps - full-duplex | |
408 | Done | |
409 | testpmd> |