=========================
Ceph messenger DPDKStack
=========================

Compiling DPDKStack
===================

Ceph DPDKStack is not compiled by default, so you need to rebuild Ceph with the
DPDKStack component enabled. On distros that ship precompiled DPDK packages,
optionally install ``dpdk-devel`` or ``dpdk-dev``, then configure the build
with:

.. prompt:: bash $

   do_cmake.sh -DWITH_DPDK=ON

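The configure step creates a ``build`` directory; after it completes, build
Ceph as usual. For example (assuming the default Ninja generator selected by
``do_cmake.sh``):

.. prompt:: bash $

   cd build
   ninja
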
Setting the DPDK Network Adapter
================================

Most mainstream NICs support SR-IOV and can be virtualized into multiple VF
NICs. Each OSD uses its own dedicated VF NICs through DPDK, while the mon, mgr,
and clients use the PF NICs through the POSIX protocol stack.

Load the drivers on which DPDK depends:

.. prompt:: bash #

   modprobe vfio
   modprobe vfio_pci

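To confirm that the VFIO modules are loaded, you can, for example, run:

.. prompt:: bash #

   lsmod | grep vfio
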
Configure hugepages by editing ``/etc/sysctl.conf``::

   vm.nr_hugepages = xxx

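Apply the setting and verify that the huge pages were actually reserved, for
example:

.. prompt:: bash #

   sysctl -p
   grep Huge /proc/meminfo
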
Configure the number of VFs based on the number of OSDs:

.. prompt:: bash #

   echo $numvfs > /sys/class/net/$port/device/sriov_numvfs

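For example, to create four VFs on a hypothetical interface named ``ens1f0``
(substitute your own port name and VF count) and confirm that they appeared:

.. prompt:: bash #

   echo 4 > /sys/class/net/ens1f0/device/sriov_numvfs
   ip link show ens1f0
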
Bind the VF NICs to the ``vfio-pci`` driver so that DPDK applications can use
them:

.. prompt:: bash #

   dpdk-devbind.py -b vfio-pci 0000:xx:yy.z

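To confirm the binding, list the status of all network devices known to DPDK,
for example:

.. prompt:: bash #

   dpdk-devbind.py --status
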
Configuring OSD DPDKStack
=========================

By default, DPDK RTE initialization requires root privileges to access various
system resources. To grant root access to the ``ceph`` user:

.. prompt:: bash #

   usermod -G root ceph

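You can confirm the group membership afterwards, for example:

.. prompt:: bash #

   id ceph
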
The OSD selects its NICs using ``ms_dpdk_devs_allowlist``:

#. Configure a single NIC:

   .. code-block:: ini

      ms_dpdk_devs_allowlist=-a 0000:7d:01.0

   or

   .. code-block:: ini

      ms_dpdk_devs_allowlist=--allow=0000:7d:01.0

#. Configure a bond network adapter:

   .. code-block:: ini

      ms_dpdk_devs_allowlist=--allow=0000:7d:01.0 --allow=0000:7d:02.6 --vdev=net_bonding0,mode=2,slave=0000:7d:01.0,slave=0000:7d:02.6

DPDK-related configuration items are as follows:

.. code-block:: ini

   [osd]
   ms_type=async+dpdk
   ms_async_op_threads=1

   ms_dpdk_port_id=0
   ms_dpdk_gateway_ipv4_addr=172.19.36.1
   ms_dpdk_netmask_ipv4_addr=255.255.255.0
   ms_dpdk_hugepages=/dev/hugepages
   ms_dpdk_hw_flow_control=false
   ms_dpdk_lro=false
   ms_dpdk_enable_tso=false
   ms_dpdk_hw_queue_weight=1
   ms_dpdk_memory_channel=2
   ms_dpdk_debug_allow_loopback = true

   [osd.x]
   ms_dpdk_coremask=0xf0
   ms_dpdk_host_ipv4_addr=172.19.36.51
   public_addr=172.19.36.51
   cluster_addr=172.19.36.51
   ms_dpdk_devs_allowlist=--allow=0000:7d:01.1

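After restarting an OSD with these settings, you can check which values it
actually picked up through its admin socket, for example:

.. prompt:: bash $

   ceph daemon osd.$i config show | grep ms_dpdk
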
Debug and Optimization
======================

Locate faults based on the logs, adjusting the log levels as needed:

.. code-block:: ini

   debug_dpdk=xx
   debug_ms=xx

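The levels can also be changed at runtime through the admin socket, for
example:

.. prompt:: bash $

   ceph daemon osd.$i config set debug_dpdk 20
   ceph daemon osd.$i config set debug_ms 1
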
If the log contains a large number of retransmit messages, reduce the value of
``ms_dpdk_tcp_wmem``.

Run the ``perf dump`` command to view DPDKStack statistics:

.. prompt:: bash $

   ceph daemon osd.$i perf dump | grep dpdk

If ``dpdk_device_receive_nombuf_errors`` keeps increasing, check whether the
throttling exceeds the limit:

.. prompt:: bash $

   ceph daemon osd.$i perf dump | grep throttle-osd_client -A 7 | grep "get_or_fail_fail"
   ceph daemon osd.$i perf dump | grep throttle-msgr_dispatch_throttler -A 7 | grep "get_or_fail_fail"

If the throttling exceeds the threshold, increase the throttling threshold or
disable the throttling.

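A minimal sketch of raising those limits in ``ceph.conf``, assuming the
counters above are governed by the messenger dispatch and OSD client message
throttles (the values below are only illustrative):

.. code-block:: ini

   [osd]
   # messenger dispatch throttle, in bytes
   ms_dispatch_throttle_bytes = 209715200
   # OSD client message throttles: in-flight message count and total bytes
   osd_client_message_cap = 512
   osd_client_message_size_cap = 1073741824
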
Check whether the network adapter is faulty or abnormal. Run the following
commands to obtain the network adapter status and statistics:

.. prompt:: bash $

   ceph daemon osd.$i show_pmd_stats
   ceph daemon osd.$i show_pmd_xstats

With some DPDK versions (e.g. dpdk-20.11-3.el8.aarch64) or NICs, TSO behaves
abnormally; try disabling TSO:

.. code-block:: ini

   ms_dpdk_enable_tso=false

If the VF NICs support multiple queues, more NIC queues can be allocated to a
single core to improve performance:

.. code-block:: ini

   ms_dpdk_hw_queues_per_qp=4

Status and Future Work
======================

Compared with the POSIX stack in the multi-concurrency test, DPDKStack delivers
the same 4K random write performance, improves 8K random write performance by
28%, and is unstable with 1 MB packets. In the single-latency test, the 4K and
8K random write latency is reduced by 15% (lower latency is better).

At a high level, our future work plan is:

OSD multiple network support (public network and cluster network)
  The public and cluster network adapters can be configured. When connecting or
  listening, the public or cluster network adapter is selected based on the IP
  address. During msgr-worker initialization, both the public and cluster
  network adapters are initialized and two DPDKQueuePairs are created.