..  BSD LICENSE
    Copyright 2012-2015 6WIND S.A.

    Redistribution and use in source and binary forms, with or without
    modification, are permitted provided that the following conditions
    are met:

    * Redistributions of source code must retain the above copyright
    notice, this list of conditions and the following disclaimer.
    * Redistributions in binary form must reproduce the above copyright
    notice, this list of conditions and the following disclaimer in
    the documentation and/or other materials provided with the
    distribution.
    * Neither the name of 6WIND S.A. nor the names of its
    contributors may be used to endorse or promote products derived
    from this software without specific prior written permission.

    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
    "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
    LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
    A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
    OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
    SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
    LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
    DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
    THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
    (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
    OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

MLX4 poll mode driver library
=============================

The MLX4 poll mode driver library (**librte_pmd_mlx4**) implements support
for **Mellanox ConnectX-3** and **Mellanox ConnectX-3 Pro** 10/40 Gbps adapters
as well as their virtual functions (VF) in SR-IOV context.

Information and documentation about this family of adapters can be found on
the `Mellanox website <http://www.mellanox.com>`_. Help is also provided by
the `Mellanox community <http://community.mellanox.com/welcome>`_.

There is also a `section dedicated to this poll mode driver
<http://www.mellanox.com/page/products_dyn?product_family=209&mtag=pmd_for_dpdk>`_.

.. note::

   Due to external dependencies, this driver is disabled by default. It must
   be enabled manually by setting ``CONFIG_RTE_LIBRTE_MLX4_PMD=y`` and
   recompiling DPDK.

Implementation details
----------------------

Most Mellanox ConnectX-3 devices provide two ports but expose a single PCI
bus address. Thus, unlike most drivers, librte_pmd_mlx4 registers itself as a
PCI driver that allocates one Ethernet device per detected port.

For this reason, one cannot white/blacklist a single port without also
white/blacklisting the others on the same device.

Besides its dependency on libibverbs (which implies libmlx4 and associated
kernel support), librte_pmd_mlx4 relies heavily on system calls for control
operations such as querying/updating the MTU and flow control parameters.

For security reasons and robustness, this driver only deals with virtual
memory addresses. The way resource allocations are handled by the kernel,
combined with hardware specifications that allow the device to handle virtual
memory addresses directly, ensures that DPDK applications cannot access random
physical memory (or memory that does not belong to the current process).

This capability allows the PMD to coexist with kernel network interfaces
which remain functional, although they stop receiving unicast packets as
long as they share the same MAC address.

Compiling librte_pmd_mlx4 causes DPDK to be linked against libibverbs.

Features
--------

- RSS, also known as RCA, is supported. In this mode the number of
  configured RX queues must be a power of two.
- VLAN filtering is supported.
- Link state information is provided.
- Promiscuous mode is supported.
- All multicast mode is supported.
- Multiple MAC addresses (unicast, multicast) can be configured.
- Scattered packets are supported for TX and RX.
- Inner L3/L4 (IP, TCP and UDP) TX/RX checksum offloading and validation.
- Outer L3 (IP) TX/RX checksum offloading and validation for VXLAN frames.
- Secondary process TX is supported.

Limitations
-----------

- RSS hash key cannot be modified.
- RSS RETA cannot be configured.
- RSS always includes L3 (IPv4/IPv6) and L4 (UDP/TCP). They cannot be
  dissociated.
- Hardware counters are not implemented (they are software counters).
- Secondary process RX is not supported.

Configuration
-------------

Compilation options
~~~~~~~~~~~~~~~~~~~

These options can be modified in the ``.config`` file.

- ``CONFIG_RTE_LIBRTE_MLX4_PMD`` (default **n**)

  Toggle compilation of librte_pmd_mlx4 itself.

- ``CONFIG_RTE_LIBRTE_MLX4_DEBUG`` (default **n**)

  Toggle debugging code and stricter compilation flags. Enabling this option
  adds additional run-time checks and debugging messages at the cost of
  lower performance.

- ``CONFIG_RTE_LIBRTE_MLX4_SGE_WR_N`` (default **4**)

  Number of scatter/gather elements (SGEs) per work request (WR). Lowering
  this number improves performance but also limits the ability to receive
  scattered packets (packets that do not fit a single mbuf). The default
  value is a safe tradeoff.

- ``CONFIG_RTE_LIBRTE_MLX4_MAX_INLINE`` (default **0**)

  Amount of data to be inlined during TX operations. Improves latency but
  lowers throughput.

- ``CONFIG_RTE_LIBRTE_MLX4_TX_MP_CACHE`` (default **8**)

  Maximum number of cached memory pools (MPs) per TX queue. Each MP from
  which buffers are to be transmitted must be associated with memory regions
  (MRs). This is a slow operation, so its result must be cached.

  This value is always 1 for RX queues since they use a single MP.

- ``CONFIG_RTE_LIBRTE_MLX4_SOFT_COUNTERS`` (default **1**)

  Toggle software counters. No counters are available if this option is
  disabled since hardware counters are not supported.

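As an illustration, an option such as ``CONFIG_RTE_LIBRTE_MLX4_PMD`` can be
changed non-interactively by rewriting its line in the configuration file.
This is only a sketch: the helper name ``set_dpdk_option`` and the
``build/.config`` path are hypothetical, and the file is assumed to already
contain a line for the option.

.. code-block:: shell

   # Hypothetical helper: set NAME=VALUE in a DPDK configuration file.
   # Usage: set_dpdk_option FILE NAME VALUE
   set_dpdk_option() {
       file=$1
       name=$2
       value=$3
       # Fail instead of silently doing nothing when the option is absent.
       grep -q "^${name}=" "$file" || return 1
       # Rewrite to a temporary file, then replace the original.
       sed "s|^${name}=.*|${name}=${value}|" "$file" > "${file}.tmp" &&
           mv "${file}.tmp" "$file"
   }

   # Example (the path is an assumption and depends on the build setup):
   # set_dpdk_option build/.config CONFIG_RTE_LIBRTE_MLX4_PMD y

DPDK must still be recompiled after the file is modified.
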
Environment variables
~~~~~~~~~~~~~~~~~~~~~

- ``MLX4_INLINE_RECV_SIZE``

  A nonzero value enables inline receive for packets up to that size. May
  significantly improve performance in some cases but lower it in
  others. Requires careful testing.

Run-time configuration
~~~~~~~~~~~~~~~~~~~~~~

- The only constraint when RSS mode is requested is to make sure the number
  of RX queues is a power of two. This is a hardware requirement.

- librte_pmd_mlx4 brings kernel network interfaces up during initialization
  because it is affected by their state. Forcing them down prevents packet
  reception.

- **ethtool** operations on related kernel interfaces also affect the PMD.

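The power-of-two constraint on RX queues can be checked before launching an
application. The following sketch is not part of the driver; the function name
``is_power_of_two`` and the ``rxq`` variable are chosen for illustration only.

.. code-block:: shell

   # Return success when the argument is a positive power of two.
   is_power_of_two() {
       n=$1
       # A power of two has a single bit set, so n & (n - 1) is zero.
       [ "$n" -gt 0 ] 2>/dev/null && [ $(( n & (n - 1) )) -eq 0 ]
   }

   # Hypothetical pre-flight check for the number of RX queues.
   rxq=4
   if is_power_of_two "$rxq"; then
       echo "rxq=$rxq is valid for RSS"
   else
       echo "rxq=$rxq is not a power of two" >&2
   fi
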
Kernel module parameters
~~~~~~~~~~~~~~~~~~~~~~~~

The **mlx4_core** kernel module has several parameters that affect the
behavior and/or the performance of librte_pmd_mlx4. Some of them are described
below.

- **num_vfs** (integer or triplet, optionally prefixed by device address
  strings)

  Create the given number of VFs on the specified devices.

- **log_num_mgm_entry_size** (integer)

  Device-managed flow steering (DMFS) is required by DPDK applications. It is
  enabled by using a negative value, the last four bits of which have a
  special meaning.

  - **-1**: force device-managed flow steering (DMFS).
  - **-7**: configure optimized steering mode to improve performance with the
    following limitation: VLAN filtering is not supported with this mode.
    This is the recommended mode when VLAN filtering is not needed.

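Module parameters are commonly made persistent through an ``options`` line in
a modprobe configuration file. The sketch below only prints such a line; the
helper name ``mlx4_core_options`` and the target file name are assumptions,
and the modules must be reloaded for new values to take effect.

.. code-block:: shell

   # Print a modprobe.d "options" line for mlx4_core.
   # Usage: mlx4_core_options LOG_NUM_MGM_ENTRY_SIZE [NUM_VFS]
   mlx4_core_options() {
       line="options mlx4_core log_num_mgm_entry_size=$1"
       # num_vfs is only emitted when a second argument is given.
       if [ -n "${2:-}" ]; then
           line="$line num_vfs=$2"
       fi
       echo "$line"
   }

   # Example (run as root; the file name is an assumption):
   # mlx4_core_options -7 > /etc/modprobe.d/mlx4_core.conf
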
Prerequisites
-------------

This driver relies on external libraries and kernel drivers for resource
allocation and initialization. The following dependencies are not part of
DPDK and must be installed separately:

- **libibverbs**

  User space verbs framework used by librte_pmd_mlx4. This library provides
  a generic interface between the kernel and low-level user space drivers
  such as libmlx4.

  It allows slow and privileged operations (context initialization, hardware
  resource allocation) to be managed by the kernel and fast operations to
  never leave user space.

- **libmlx4**

  Low-level user space driver library for Mellanox ConnectX-3 devices.
  It is automatically loaded by libibverbs.

  This library basically implements send/receive calls to the hardware
  queues.

- **Kernel modules** (mlnx-ofed-kernel)

  They provide the kernel-side verbs API and low-level device drivers that
  manage actual hardware initialization and resource sharing with user
  space processes.

  Unlike most other PMDs, these modules must remain loaded and bound to
  their devices:

  - mlx4_core: hardware driver managing Mellanox ConnectX-3 devices.
  - mlx4_en: Ethernet device driver that provides kernel network interfaces.
  - mlx4_ib: InfiniBand device driver.
  - ib_uverbs: user space driver for verbs (entry point for libibverbs).

- **Firmware update**

  Mellanox OFED releases include firmware updates for ConnectX-3 adapters.

  Because each release provides new features, these updates must be applied to
  match the kernel modules and libraries they come with.

.. note::

   Both libraries are BSD and GPL licensed. Linux kernel modules are GPL
   licensed.

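Since the kernel modules listed among the prerequisites must remain loaded,
a quick check can report any that are absent. This is a minimal sketch: the
function name ``missing_mlx4_modules`` is hypothetical, and modules built
directly into the kernel do not appear in ``/proc/modules``, so they would be
reported as missing.

.. code-block:: shell

   # Print the required modules that are absent from a /proc/modules-style
   # listing. The optional argument exists only to ease testing.
   missing_mlx4_modules() {
       list="${1:-/proc/modules}"
       for mod in mlx4_core mlx4_en mlx4_ib ib_uverbs; do
           # Each line of /proc/modules starts with the module name.
           grep -q "^${mod} " "$list" || echo "$mod"
       done
   }

   # Example:
   # missing_mlx4_modules

An empty output means all four modules were found in the listing.
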
Currently supported by DPDK:

- Mellanox OFED **3.1**.
- Firmware version **2.35.5100** and higher.
- Supported architectures: **x86_64** and **POWER8**.

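The minimum firmware version can be compared against the one reported for an
interface. The sketch below relies on ``sort -V``, a GNU coreutils extension;
the function name ``version_at_least`` and the interface name ``eth2`` in the
commented example are assumptions.

.. code-block:: shell

   # Succeed when version $1 is greater than or equal to version $2.
   version_at_least() {
       # The smaller of the two versions sorts first; it must be $2.
       [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
   }

   # Example (interface name is an assumption):
   # fw=$(ethtool -i eth2 | sed -n 's/^firmware-version: //p')
   # version_at_least "$fw" 2.35.5100 || echo "firmware too old" >&2
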
Getting Mellanox OFED
~~~~~~~~~~~~~~~~~~~~~

While these libraries and kernel modules are available on OpenFabrics
Alliance's `website <https://www.openfabrics.org/>`_ and provided by package
managers on most distributions, this PMD requires Ethernet extensions that
may not be supported at the moment (this is a work in progress).

`Mellanox OFED
<http://www.mellanox.com/page/products_dyn?product_family=26&mtag=linux_sw_drivers>`_
includes the necessary support and should be used in the meantime. For DPDK,
only libibverbs, libmlx4, mlnx-ofed-kernel packages and firmware updates are
required from that distribution.

.. note::

   Several versions of Mellanox OFED are available. Installing the version
   this DPDK release was developed and tested against is strongly
   recommended. Please check the `prerequisites`_.

Usage example
-------------

This section demonstrates how to launch **testpmd** with Mellanox ConnectX-3
devices managed by librte_pmd_mlx4.

#. Load the kernel modules:

   .. code-block:: console

      modprobe -a ib_uverbs mlx4_en mlx4_core mlx4_ib

   Alternatively if MLNX_OFED is fully installed, the following script can
   be run:

   .. code-block:: console

      /etc/init.d/openibd restart

   .. note::

      User space I/O kernel modules (uio and igb_uio) are not used and do
      not have to be loaded.

#. Make sure Ethernet interfaces are in working order and linked to kernel
   verbs. Related sysfs entries should be present:

   .. code-block:: console

      ls -d /sys/class/net/*/device/infiniband_verbs/uverbs* | cut -d / -f 5

   Example output:

   .. code-block:: console

      eth2
      eth3
      eth4
      eth5

#. Optionally, retrieve their PCI bus addresses for whitelisting:

   .. code-block:: console

      {
          for intf in eth2 eth3 eth4 eth5;
          do
              (cd "/sys/class/net/${intf}/device/" && pwd -P);
          done;
      } |
      sed -n 's,.*/\(.*\),-w \1,p'

   Example output:

   .. code-block:: console

      -w 0000:83:00.0
      -w 0000:83:00.0
      -w 0000:84:00.0
      -w 0000:84:00.0

   .. note::

      There are only two distinct PCI bus addresses because the Mellanox
      ConnectX-3 adapters installed on this system are dual port.

#. Request huge pages:

   .. code-block:: console

      echo 1024 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages

#. Start testpmd with basic parameters:

   .. code-block:: console

      testpmd -c 0xff00 -n 4 -w 0000:83:00.0 -w 0000:84:00.0 -- --rxq=2 --txq=2 -i

   Example output:

   .. code-block:: console

      [...]
      EAL: PCI device 0000:83:00.0 on NUMA socket 1
      EAL:   probe driver: 15b3:1007 librte_pmd_mlx4
      PMD: librte_pmd_mlx4: PCI information matches, using device "mlx4_0" (VF: false)
      PMD: librte_pmd_mlx4: 2 port(s) detected
      PMD: librte_pmd_mlx4: port 1 MAC address is 00:02:c9:b5:b7:50
      PMD: librte_pmd_mlx4: port 2 MAC address is 00:02:c9:b5:b7:51
      EAL: PCI device 0000:84:00.0 on NUMA socket 1
      EAL:   probe driver: 15b3:1007 librte_pmd_mlx4
      PMD: librte_pmd_mlx4: PCI information matches, using device "mlx4_1" (VF: false)
      PMD: librte_pmd_mlx4: 2 port(s) detected
      PMD: librte_pmd_mlx4: port 1 MAC address is 00:02:c9:b5:ba:b0
      PMD: librte_pmd_mlx4: port 2 MAC address is 00:02:c9:b5:ba:b1
      Interactive-mode selected
      Configuring Port 0 (socket 0)
      PMD: librte_pmd_mlx4: 0x867d60: TX queues number update: 0 -> 2
      PMD: librte_pmd_mlx4: 0x867d60: RX queues number update: 0 -> 2
      Port 0: 00:02:C9:B5:B7:50
      Configuring Port 1 (socket 0)
      PMD: librte_pmd_mlx4: 0x867da0: TX queues number update: 0 -> 2
      PMD: librte_pmd_mlx4: 0x867da0: RX queues number update: 0 -> 2
      Port 1: 00:02:C9:B5:B7:51
      Configuring Port 2 (socket 0)
      PMD: librte_pmd_mlx4: 0x867de0: TX queues number update: 0 -> 2
      PMD: librte_pmd_mlx4: 0x867de0: RX queues number update: 0 -> 2
      Port 2: 00:02:C9:B5:BA:B0
      Configuring Port 3 (socket 0)
      PMD: librte_pmd_mlx4: 0x867e20: TX queues number update: 0 -> 2
      PMD: librte_pmd_mlx4: 0x867e20: RX queues number update: 0 -> 2
      Port 3: 00:02:C9:B5:BA:B1
      Checking link statuses...
      Port 0 Link Up - speed 10000 Mbps - full-duplex
      Port 1 Link Up - speed 40000 Mbps - full-duplex
      Port 2 Link Up - speed 10000 Mbps - full-duplex
      Port 3 Link Up - speed 40000 Mbps - full-duplex
      Done
      testpmd>

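
Because dual-port adapters report the same PCI bus address once per port, the
whitelist arguments gathered in the steps above can be deduplicated before
being passed to testpmd. The following is a minimal sketch that combines the
sysfs lookups from this section; the function name ``pci_whitelist_args`` is
hypothetical and its optional argument exists only to ease testing.

.. code-block:: shell

   # Print one "-w <PCI address>" line per distinct device that exposes
   # kernel verbs entries under the given sysfs network class directory.
   pci_whitelist_args() {
       net_root="${1:-/sys/class/net}"
       for dev in "$net_root"/*/device; do
           # Skip interfaces that are not linked to kernel verbs.
           [ -d "$dev/infiniband_verbs" ] || continue
           # Resolve the symbolic link to the physical PCI device path.
           (cd "$dev" && pwd -P)
       done | sed -n 's,.*/\(.*\),-w \1,p' | sort -u
   }

   # Example:
   # pci_whitelist_args

On a system such as the one shown above, each dual-port adapter would then be
listed once instead of twice.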