How to get best performance with NICs on Intel platforms
========================================================

This document is a step-by-step guide for getting high performance from DPDK applications on Intel platforms.


Hardware and Memory Requirements
--------------------------------

For best performance use an Intel Xeon class server system such as Ivy Bridge, Haswell or newer.

Ensure that each memory channel has at least one memory DIMM inserted, and that the memory size for each DIMM is at least 4GB.
**Note**: this has one of the most direct effects on performance.

You can check the memory configuration using ``dmidecode`` as follows::

   dmidecode -t memory | grep Locator

   Locator: DIMM_A1
   Bank Locator: NODE 1
   Locator: DIMM_A2
   Bank Locator: NODE 1
   Locator: DIMM_B1
   Bank Locator: NODE 1
   Locator: DIMM_B2
   Bank Locator: NODE 1
   ...
   Locator: DIMM_G1
   Bank Locator: NODE 2
   Locator: DIMM_G2
   Bank Locator: NODE 2
   Locator: DIMM_H1
   Bank Locator: NODE 2
   Locator: DIMM_H2
   Bank Locator: NODE 2

The sample output above shows a total of 8 channels, from ``A`` to ``H``, where each channel has 2 DIMMs.

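To quickly confirm that every slot you expect to be populated actually contains a module, you can also count the installed DIMMs; a minimal sketch, assuming ``dmidecode`` reports ``No Module Installed`` for empty slots::

   # Count the memory slots that contain an installed module.
   dmidecode -t memory | grep Size | grep -vc "No Module Installed"
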
You can also use ``dmidecode`` to determine the memory frequency::

   dmidecode -t memory | grep Speed

   Speed: 2133 MHz
   Configured Clock Speed: 2134 MHz
   Speed: Unknown
   Configured Clock Speed: Unknown
   Speed: 2133 MHz
   Configured Clock Speed: 2134 MHz
   Speed: Unknown
   ...
   Speed: 2133 MHz
   Configured Clock Speed: 2134 MHz
   Speed: Unknown
   Configured Clock Speed: Unknown
   Speed: 2133 MHz
   Configured Clock Speed: 2134 MHz
   Speed: Unknown
   Configured Clock Speed: Unknown

The output shows a speed of 2133 MHz (DDR4) for the populated slots and ``Unknown`` for the empty ones.
This aligns with the previous output, which showed one populated memory module per channel.

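To summarize only the frequencies that are actually in use, you can filter out the empty slots; a small sketch::

   # Count populated DIMMs per configured memory clock speed.
   dmidecode -t memory | grep "Configured Clock Speed" | grep -v Unknown | sort | uniq -c
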
Network Interface Card Requirements
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Use a `DPDK supported <http://core.dpdk.org/supported/>`_ high end NIC such as the Intel XL710 40GbE.

Make sure each NIC has been flashed with the latest version of NVM/firmware.

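You can verify the firmware currently flashed on a port from Linux using ``ethtool``; a minimal sketch, where ``eth0`` is a placeholder interface name::

   # Show the driver, driver version and firmware/NVM version of the interface.
   ethtool -i eth0
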
Use PCIe Gen3 slots, such as Gen3 ``x8`` or Gen3 ``x16``, because PCIe Gen2 slots don't provide enough bandwidth
for 2 x 10GbE and above.
You can use ``lspci`` to check the speed of a PCI slot using something like the following::

   lspci -s 03:00.1 -vv | grep LnkSta

   LnkSta: Speed 8GT/s, Width x8, TrErr- Train- SlotClk+ DLActive- ...
   LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+ ...

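To check the negotiated link of every Ethernet device at once, a small shell loop can help; a sketch, noting that reading ``LnkSta`` with ``lspci -vv`` typically requires root privileges::

   # Print the PCIe link speed and width for each Ethernet device.
   for dev in $(lspci | awk '/Ethernet/ {print $1}'); do
       echo "$dev: $(lspci -s $dev -vv | grep 'LnkSta:')"
   done
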
When inserting NICs into PCI slots always check the caption, such as CPU0 or CPU1, to indicate which socket it is connected to.

Care should be taken with NUMA.
If you are using 2 or more ports from different NICs, it is best to ensure that these NICs are on the same CPU socket.
An example of how to determine this is shown further below.

BIOS Settings
~~~~~~~~~~~~~

The following are some recommendations on BIOS settings. Different platforms will have different BIOS naming,
so the following is mainly for reference:

#. Establish the steady state for the system; consider reviewing the BIOS settings for the desired performance characteristic, e.g. optimized for performance or for energy efficiency.

#. Match the BIOS settings to the needs of the application you are testing.

#. Typically, **Performance** as the CPU Power and Performance policy is a reasonable starting point.

#. Consider using Turbo Boost to increase the frequency on cores.

#. Disable all virtualization options when you test the physical function of the NIC, and turn on VT-d if you want to use VFIO. A quick way to verify this from Linux is shown after this list.

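After enabling VT-d in the BIOS and rebooting, you can check from Linux whether the IOMMU was detected; a small sketch, noting that the exact messages vary by kernel version::

   # Look for DMAR/IOMMU initialization messages in the kernel log.
   dmesg | grep -i -e DMAR -e IOMMU
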
Linux boot command line
~~~~~~~~~~~~~~~~~~~~~~~

The following are some recommendations on GRUB boot settings:

#. Use the default grub file as a starting point.

#. Reserve 1G huge pages via grub configurations. For example to reserve 8 huge pages of 1G size::

      default_hugepagesz=1G hugepagesz=1G hugepages=8

#. Isolate CPU cores which will be used for DPDK. For example::

      isolcpus=2,3,4,5,6,7,8

#. If you want to use VFIO, use the following additional grub parameters (a combined example follows this list)::

      iommu=pt intel_iommu=on

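Putting these settings together, the kernel command line in ``/etc/default/grub`` might look like the following; an illustrative sketch, since the file location and the regeneration command vary by distribution::

   GRUB_CMDLINE_LINUX="default_hugepagesz=1G hugepagesz=1G hugepages=8 isolcpus=2,3,4,5,6,7,8 iommu=pt intel_iommu=on"

   # Regenerate the GRUB configuration, then reboot for the changes to take effect.
   grub2-mkconfig -o /boot/grub2/grub.cfg
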
Configurations before running DPDK
----------------------------------

1. Build the DPDK target and reserve huge pages.
   See the earlier section on :ref:`linux_gsg_hugepages` for more details.

   The following shell commands may help with building and configuration:

   .. code-block:: console

      # Build the DPDK target.
      cd dpdk_folder
      make install T=x86_64-native-linux-gcc -j

      # Get the hugepage size.
      awk '/Hugepagesize/ {print $2}' /proc/meminfo

      # Get the total number of huge pages.
      awk '/HugePages_Total/ {print $2}' /proc/meminfo

      # Unmount the hugepages.
      umount `awk '/hugetlbfs/ {print $2}' /proc/mounts`

      # Create the hugepage mount folder.
      mkdir -p /mnt/huge

      # Mount to the specific folder.
      mount -t hugetlbfs nodev /mnt/huge

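   If 1G pages were not reserved on the kernel command line, 2MB huge pages can also be reserved at runtime through ``sysfs``; a minimal sketch, noting that 1G pages generally must be reserved at boot::

      # Reserve 1024 huge pages of 2MB size on the running system.
      echo 1024 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
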
2. Check the CPU layout using the DPDK ``cpu_layout`` utility:

   .. code-block:: console

      cd dpdk_folder

      usertools/cpu_layout.py

   Or run ``lscpu`` to check the cores on each socket.

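   For a quick summary of the socket and NUMA topology, you can filter the ``lscpu`` output; a small sketch::

      # Show socket count, cores per socket and the CPUs in each NUMA node.
      lscpu | grep -iE "socket|numa"
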
3. Check your NIC id and related socket id:

   .. code-block:: console

      # List all the NICs with PCI address and device IDs.
      lspci -nn | grep Eth

   For example suppose your output was as follows::

      82:00.0 Ethernet [0200]: Intel XL710 for 40GbE QSFP+ [8086:1583]
      82:00.1 Ethernet [0200]: Intel XL710 for 40GbE QSFP+ [8086:1583]
      85:00.0 Ethernet [0200]: Intel XL710 for 40GbE QSFP+ [8086:1583]
      85:00.1 Ethernet [0200]: Intel XL710 for 40GbE QSFP+ [8086:1583]

   Check the NUMA node id related to a PCI device:

   .. code-block:: console

      cat /sys/bus/pci/devices/0000\:xx\:00.x/numa_node

   Usually ``0x:00.x`` is on socket 0 and ``8x:00.x`` is on socket 1.
   **Note**: To get the best performance, ensure that the cores and NICs are on the same socket.
   In the example above ``85:00.0`` is on socket 1 and should be used by cores on socket 1 for the best performance.

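   To check all four ports in one step, a loop such as the following can be used; a sketch reusing the PCI addresses from the example output above::

      # Print the NUMA node of each example NIC port.
      for dev in 0000:82:00.0 0000:82:00.1 0000:85:00.0 0000:85:00.1; do
          echo "$dev: node $(cat /sys/bus/pci/devices/$dev/numa_node)"
      done
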
4. Check which kernel drivers need to be loaded and whether there is a need to unbind the network ports from their kernel drivers.
   For more details about DPDK setup and Linux kernel requirements, see :ref:`linux_gsg_compiling_dpdk` and :ref:`linux_gsg_linux_drivers`.
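
   For example, you can inspect and change driver bindings with the ``dpdk-devbind.py`` utility shipped in DPDK's ``usertools`` directory; a sketch reusing an example PCI address from above:

   .. code-block:: console

      # Show the current driver bindings of all network devices.
      usertools/dpdk-devbind.py --status

      # Bind the port to vfio-pci for use by DPDK.
      usertools/dpdk-devbind.py --bind=vfio-pci 82:00.0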