=========================
Ceph messenger DPDKStack
=========================
DPDKStack is not compiled by default, so you must recompile Ceph with the
DPDKStack component enabled. On distros that provide precompiled DPDK
packages, optionally install ``dpdk-devel`` or ``dpdk-dev`` first, then
compile with::

  do_cmake.sh -DWITH_DPDK=ON
Setting the DPDK Network Adapter
================================
Most mainstream NICs support SR-IOV, which virtualizes a physical NIC into
multiple VF NICs. Each OSD uses one or more dedicated VF NICs through DPDK,
while the mon, mgr, and clients use the PF NIC through the POSIX protocol
stack.
Load the driver on which DPDK depends.
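The exact commands are platform-dependent; below is a sketch assuming the
``vfio-pci`` driver. The commands are printed for review rather than
executed, so run the printed lines as root to apply them:

```shell
# Sketch, assuming the vfio-pci driver (igb_uio is a common alternative).
# Commands are printed for review; run them as root to apply. The second
# line is only needed on hosts without a functional IOMMU.
cat <<'EOF'
modprobe vfio-pci
echo 1 > /sys/module/vfio/parameters/enable_unsafe_noiommu_mode
EOF
```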
Configure hugepages by editing ``/etc/sysctl.conf``.
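The relevant entry is ``vm.nr_hugepages``; a minimal example (the count
below is arbitrary; size the hugepage pool to your DPDK memory needs):

```
# /etc/sysctl.conf -- example only: reserve 1024 hugepages
vm.nr_hugepages = 1024
```

Apply the setting with ``sysctl -p`` or a reboot.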
Configure the number of VFs based on the number of OSDs::

  echo $numvfs > /sys/class/net/$port/device/sriov_numvfs
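For example, with four OSDs served by a hypothetical PF named ``eth1`` and
one VF per OSD (the sysfs write is printed for review rather than performed,
since it needs root and SR-IOV support):

```shell
# Hypothetical values: 4 OSDs served by PF eth1, one VF per OSD.
port=eth1
numvfs=4
# Print the sysfs write rather than performing it (needs root and SR-IOV).
echo "echo $numvfs > /sys/class/net/$port/device/sriov_numvfs"
```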
Bind the NICs to DPDK applications::

  dpdk-devbind.py -b vfio-pci 0000:xx:yy.z
Configuring OSD DPDKStack
=========================
By default, the DPDK RTE initialization process requires root privileges to
access various system resources. Grant the necessary privileges to the user
that runs the OSDs.
The OSD selects its NICs using ``ms_dpdk_devs_allowlist``:

#. Configure a single NIC::

     ms_dpdk_devs_allowlist=-a 0000:7d:01.0

   or::

     ms_dpdk_devs_allowlist=--allow=0000:7d:01.0

#. Configure a bond network adapter::

     ms_dpdk_devs_allowlist=--allow=0000:7d:01.0 --allow=0000:7d:02.6 --vdev=net_bonding0,mode=2,slave=0000:7d:01.0,slave=0000:7d:02.6
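The bonding value can be assembled from the slave VF PCI addresses; a sketch
using the example addresses above (``mode=2`` selects the DPDK balance
bonding mode):

```shell
# Build the allowlist value for a two-slave bond (example PCI addresses).
slave1=0000:7d:01.0
slave2=0000:7d:02.6
bond="--allow=$slave1 --allow=$slave2"
bond="$bond --vdev=net_bonding0,mode=2,slave=$slave1,slave=$slave2"
echo "ms_dpdk_devs_allowlist=$bond"
```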
DPDK-related configuration items are as follows::

  ms_dpdk_gateway_ipv4_addr=172.19.36.1
  ms_dpdk_netmask_ipv4_addr=255.255.255.0
  ms_dpdk_hugepages=/dev/hugepages
  ms_dpdk_hw_flow_control=false
  ms_dpdk_enable_tso=false
  ms_dpdk_hw_queue_weight=1
  ms_dpdk_memory_channel=2
  ms_dpdk_debug_allow_loopback=true
The address and device settings are configured per OSD, for example::

  ms_dpdk_host_ipv4_addr=172.19.36.51
  public_addr=172.19.36.51
  cluster_addr=172.19.36.51
  ms_dpdk_devs_allowlist=--allow=0000:7d:01.1
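Note that the messenger type itself must also select DPDKStack; in Ceph this
is done with ``ms_type`` (a minimal sketch):

```
[global]
# Use the DPDK-backed async messenger.
ms_type = async+dpdk
```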
Debug and Optimization
======================
Locate faults based on the logs, adjusting the log level as needed. If the
log contains a large number of retransmit messages, reduce the value of
``ms_dpdk_tcp_wmem``.
Run the ``perf dump`` command to view DPDKStack statistics::

  ceph daemon osd.$i perf dump | grep dpdk
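Individual counters can be pulled out of a saved dump; for example (the JSON
below is a made-up sample, so in practice pipe ``ceph daemon osd.$i perf
dump`` instead of the here-doc):

```shell
# Extract the nombuf error counter from a (sample) perf dump.
grep -o '"dpdk_device_receive_nombuf_errors":[0-9]*' <<'EOF'
{"dpdk-0":{"dpdk_device_receive_packets":100,"dpdk_device_receive_nombuf_errors":7}}
EOF
```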
If ``dpdk_device_receive_nombuf_errors`` keeps increasing, check whether the
throttle limits are being exceeded::

  ceph daemon osd.$i perf dump | grep throttle-osd_client -A 7 | grep "get_or_fail_fail"
  ceph daemon osd.$i perf dump | grep throttle-msgr_dispatch_throttler -A 7 | grep "get_or_fail_fail"
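A sketch of the check itself: sample ``get_or_fail_fail`` twice and compare
(the numbers below are stand-ins for two successive readings captured a few
seconds apart):

```shell
# Stand-in values for two successive get_or_fail_fail readings.
prev=12
cur=19
if [ "$cur" -gt "$prev" ]; then
  echo "throttle is rejecting requests: raise the limit or disable it"
fi
```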
If a throttle limit is being exceeded, increase the throttle threshold or
disable the throttle.
Check whether the network adapter is faulty or abnormal. Run the following
commands to obtain the network adapter status and statistics::

  ceph daemon osd.$i show_pmd_stats
  ceph daemon osd.$i show_pmd_xstats
With some DPDK versions (e.g. ``dpdk-20.11-3.el8.aarch64``) or NICs, TSO is
abnormal; in that case, disable TSO::

  ms_dpdk_enable_tso=false
If the VF NICs support multiple queues, more NIC queues can be allocated to a
single core to improve performance::

  ms_dpdk_hw_queues_per_qp=4
Status and Future Work
======================
Compared with the POSIX stack, in the multi-concurrency test DPDKStack has
the same 4K random write performance, 8K random write performance is improved
by 28%, and 1 MB packets are unstable. In the single-latency test, the 4K and
8K random write latency is reduced by 15% (the lower the latency, the
better).
At a high level, our future work plan is:

OSD multiple network support (public network and cluster network)
  The public and cluster network adapters can be configured. When connecting
  or listening, the public or cluster network adapter is selected based on
  the IP address. During msgr-worker initialization, both the public and
  cluster network adapters are initialized and two DPDKQueuePairs are
  created.