Using Open vSwitch with DPDK
============================

Open vSwitch can use the Intel(R) DPDK library to operate entirely in
userspace. This file explains how to install and use Open vSwitch in
such a mode.

The DPDK support of Open vSwitch is considered experimental.
It has not been thoroughly tested.

This version of Open vSwitch should be built manually with "configure"
and "make".

Building and Installing:
------------------------

DPDK 1.7 is required.

DPDK:
Set the DPDK directory, e.g.:
  export DPDK_DIR=/usr/src/dpdk-1.7.0
  cd $DPDK_DIR
Update config/common_linuxapp so that DPDK generates a single library file
(this modification is also required for the IVSHMEM build):
  CONFIG_RTE_BUILD_COMBINE_LIBS=y

For a default install without IVSHMEM:
  make install T=x86_64-native-linuxapp-gcc
To include IVSHMEM (shared memory):
  make install T=x86_64-ivshmem-linuxapp-gcc
For details refer to http://dpdk.org/

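The config edit described above can also be scripted. The following is a minimal sketch, assuming the option already appears in the config file as a line starting with CONFIG_RTE_BUILD_COMBINE_LIBS=. The function name enable_combined_libs is hypothetical and not part of DPDK.

```shell
#!/bin/sh
# Hypothetical helper (not part of DPDK): force
# CONFIG_RTE_BUILD_COMBINE_LIBS=y in a DPDK config file.
# Assumes the option line already exists in the file.
enable_combined_libs() {
    config_file=$1
    tmp_file="$config_file.tmp.$$"
    # Rewrite the existing option line; all other lines are copied unchanged.
    sed 's/^CONFIG_RTE_BUILD_COMBINE_LIBS=.*/CONFIG_RTE_BUILD_COMBINE_LIBS=y/' \
        "$config_file" > "$tmp_file" && mv "$tmp_file" "$config_file"
}

# Usage (path taken from the steps above):
#   enable_combined_libs "$DPDK_DIR/config/common_linuxapp"
```

Writing to a temporary file and renaming it avoids relying on the non-portable "sed -i" option.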
Linux kernel:
Refer to intel-dpdk-getting-started-guide.pdf to understand the DPDK
kernel requirements.

OVS:
Non IVSHMEM:
  export DPDK_BUILD=$DPDK_DIR/x86_64-native-linuxapp-gcc/
IVSHMEM:
  export DPDK_BUILD=$DPDK_DIR/x86_64-ivshmem-linuxapp-gcc/

  cd $OVS_DIR
  ./boot.sh
  ./configure --with-dpdk=$DPDK_BUILD
  make

Refer to INSTALL.userspace for general requirements of building
userspace OVS.

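The two DPDK_BUILD settings above differ only in the target name, so the choice can be wrapped in a small function. This is a sketch only: dpdk_build_dir and its "flavor" argument are hypothetical names introduced here for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the DPDK build directory for a given flavor.
# "native" and "ivshmem" correspond to the two make targets used above.
dpdk_build_dir() {
    dpdk_dir=$1
    flavor=${2:-native}
    case $flavor in
        native|ivshmem)
            printf '%s/x86_64-%s-linuxapp-gcc/\n' "$dpdk_dir" "$flavor" ;;
        *)
            echo "unknown flavor: $flavor" >&2
            return 1 ;;
    esac
}

# Usage:
#   export DPDK_BUILD=$(dpdk_build_dir "$DPDK_DIR" ivshmem)
```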
Using the DPDK with ovs-vswitchd:
---------------------------------

Set up system boot:
  Add to the kernel boot line: default_hugepagesz=1G hugepagesz=1G hugepages=1

First set up the DPDK devices:
  - Insert uio.ko
    e.g. modprobe uio
  - Insert igb_uio.ko
    e.g. insmod $DPDK_BUILD/kmod/igb_uio.ko
  - Bind the network device to igb_uio.
    e.g. $DPDK_DIR/tools/dpdk_nic_bind.py --bind=igb_uio eth1
  Alternate binding method:
    Find the target Ethernet devices
      lspci -nn|grep Ethernet
    Bring them down (e.g. eth2, eth3)
      ifconfig eth2 down
      ifconfig eth3 down
    Look at the current devices (e.g. ixgbe devices)
      ls /sys/bus/pci/drivers/ixgbe/
      0000:02:00.0  0000:02:00.1  bind  module  new_id  remove_id  uevent  unbind
    Unbind the target PCI devices from the current driver (e.g. 02:00.0 ...)
      echo 0000:02:00.0 > /sys/bus/pci/drivers/ixgbe/unbind
      echo 0000:02:00.1 > /sys/bus/pci/drivers/ixgbe/unbind
    Bind them to the target driver (e.g. igb_uio)
      echo 0000:02:00.0 > /sys/bus/pci/drivers/igb_uio/bind
      echo 0000:02:00.1 > /sys/bus/pci/drivers/igb_uio/bind
    Check the binding for the listed devices
      ls /sys/bus/pci/drivers/igb_uio
      0000:02:00.0  0000:02:00.1  bind  module  new_id  remove_id  uevent  unbind

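The unbind and bind steps of the alternate method repeat the same two writes per device, so they can be wrapped in a function. This is a minimal sketch: rebind_pci is a hypothetical name, and the SYSFS_ROOT variable is an assumption added here so the function can be tried against a scratch directory instead of the real /sys.

```shell
#!/bin/sh
# Hypothetical helper: move one PCI device from one driver to another
# by writing its address to the sysfs unbind and bind files.
# SYSFS_ROOT is an assumption for dry runs; it defaults to /sys.
rebind_pci() {
    pci_addr=$1
    from_driver=$2
    to_driver=$3
    drivers_dir=${SYSFS_ROOT:-/sys}/bus/pci/drivers
    # Bind only if the unbind write succeeded.
    echo "$pci_addr" > "$drivers_dir/$from_driver/unbind" &&
        echo "$pci_addr" > "$drivers_dir/$to_driver/bind"
}

# Usage (as root, after bringing the interfaces down):
#   rebind_pci 0000:02:00.0 ixgbe igb_uio
#   rebind_pci 0000:02:00.1 ixgbe igb_uio
```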
Prepare system:
  - Mount hugetlbfs
    e.g. mount -t hugetlbfs -o pagesize=1G none /dev/hugepages

Refer to http://www.dpdk.org/doc/quick-start for verifying the DPDK setup.

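Before starting OVS it can help to confirm that the hugepages requested on the kernel boot line were actually reserved. The sketch below reads the standard hugepage counters from /proc/meminfo. The function name hugepage_summary and its optional file argument are hypothetical, added so the parsing can be tried on a saved copy of the file.

```shell
#!/bin/sh
# Hypothetical helper: summarize the hugepage counters in /proc/meminfo.
# An alternative file can be passed as the first argument.
hugepage_summary() {
    meminfo=${1:-/proc/meminfo}
    awk '/^HugePages_Total:/ { total = $2 }
         /^HugePages_Free:/  { free = $2 }
         /^Hugepagesize:/    { size = $2 }
         END { printf "total=%d free=%d size_kb=%d\n", total, free, size }' \
        "$meminfo"
}

# Usage:
#   hugepage_summary
```

With a single 1 GB page reserved as the default size, size_kb is 1048576 and total is 1.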
Start ovsdb-server as discussed in the INSTALL doc:
  Summary, e.g.:
    First time only, db creation (or clearing):
      mkdir -p /usr/local/etc/openvswitch
      mkdir -p /usr/local/var/run/openvswitch
      rm /usr/local/etc/openvswitch/conf.db
      cd $OVS_DIR
      ./ovsdb/ovsdb-tool create /usr/local/etc/openvswitch/conf.db \
        ./vswitchd/vswitch.ovsschema
    Start ovsdb-server:
      cd $OVS_DIR
      ./ovsdb/ovsdb-server --remote=punix:/usr/local/var/run/openvswitch/db.sock \
          --remote=db:Open_vSwitch,Open_vSwitch,manager_options \
          --private-key=db:Open_vSwitch,SSL,private_key \
          --certificate=db:Open_vSwitch,SSL,certificate \
          --bootstrap-ca-cert=db:Open_vSwitch,SSL,ca_cert --pidfile --detach
    First time after db creation, initialize:
      cd $OVS_DIR
      ./utilities/ovs-vsctl --no-wait init

Start vswitchd:
DPDK configuration arguments can be passed to vswitchd via the `--dpdk`
argument, which must be the first argument passed to the vswitchd process.
The DPDK argument -c is ignored by ovs-dpdk, but it is a required parameter
for DPDK initialization.

   e.g.
   export DB_SOCK=/usr/local/var/run/openvswitch/db.sock
   ./vswitchd/ovs-vswitchd --dpdk -c 0x1 -n 4 -- unix:$DB_SOCK --pidfile --detach

If more than one 1 GB hugepage is allocated (as for IVSHMEM), set the amount
of memory to use with --socket-mem. The example below takes 1024 MB from NUMA
node 0 and none from node 1:

   ./vswitchd/ovs-vswitchd --dpdk -c 0x1 -n 4 --socket-mem 1024,0 \
      -- unix:$DB_SOCK --pidfile --detach

To use ovs-vswitchd with DPDK, create a bridge with datapath_type
"netdev" in the configuration database. For example:

    ovs-vsctl add-br br0
    ovs-vsctl set bridge br0 datapath_type=netdev

Now you can add DPDK devices. OVS expects each DPDK device name to start
with dpdk and end with the port id. vswitchd should print the number of DPDK
devices found.

    ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk
    ovs-vsctl add-port br0 dpdk1 -- set Interface dpdk1 type=dpdk

Once the first DPDK port is added, vswitchd creates a polling thread that
polls the DPDK device in a continuous loop. CPU utilization for that thread
is therefore always 100%.

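Because the device names follow the fixed pattern dpdk<portid>, the add-port commands for several ports can be generated rather than typed. The sketch below only prints the commands so they can be reviewed first. The function name dpdk_port_cmds is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print one ovs-vsctl add-port command per DPDK port,
# using the dpdk<portid> naming convention described above.
dpdk_port_cmds() {
    bridge=$1
    port_count=$2
    port_id=0
    while [ "$port_id" -lt "$port_count" ]; do
        echo "ovs-vsctl add-port $bridge dpdk$port_id -- set Interface dpdk$port_id type=dpdk"
        port_id=$((port_id + 1))
    done
}

# Usage: review the output, then pipe it to a shell to execute it.
#   dpdk_port_cmds br0 2
#   dpdk_port_cmds br0 2 | sh
```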
Test flow script across NICs (assuming OVS in /usr/src/ovs):
  Assume 1.1.1.1 on NIC port 1 (dpdk0)
  Assume 1.1.1.2 on NIC port 2 (dpdk1)
  Execute script:

############################# Script:

#! /bin/sh
# Move to command directory
cd /usr/src/ovs/utilities/

# Clear current flows
./ovs-ofctl del-flows br0

# Add flows between port 1 (dpdk0) and port 2 (dpdk1).
# The continuation lines start at column 0 so that no whitespace is
# inserted into the flow specification.
./ovs-ofctl add-flow br0 in_port=1,dl_type=0x800,nw_src=1.1.1.1,\
nw_dst=1.1.1.2,idle_timeout=0,action=output:2
./ovs-ofctl add-flow br0 in_port=2,dl_type=0x800,nw_src=1.1.1.2,\
nw_dst=1.1.1.1,idle_timeout=0,action=output:1

######################################

With pmd multi-threading support, OVS creates one pmd thread for each
NUMA node by default. The pmd thread handles the I/O of all DPDK
interfaces on the same NUMA node. The following two commands can be used
to configure the multi-threading behavior.

    ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=<hex string>

The command above takes a CPU mask that sets the affinity of pmd threads.
A set bit in the mask means a pmd thread is created and pinned to the
corresponding CPU core. For more information, please refer to
`man ovs-vswitchd.conf.db`

    ovs-vsctl set Open_vSwitch . other_config:n-dpdk-rxqs=<integer>

The command above sets the number of rx queues of each DPDK interface. The
rx queues are assigned to pmd threads on the same NUMA node in round-robin
fashion. For more information, please refer to `man ovs-vswitchd.conf.db`

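To picture what round-robin assignment means, the sketch below maps rx queue indices onto pmd thread indices with a simple modulo. It illustrates the general technique only. The exact order OVS uses internally is not specified here, and the function name assign_rxqs is hypothetical.

```shell
#!/bin/sh
# Illustration only: distribute rx queues over pmd threads round-robin.
# This is not the OVS implementation, just the general idea.
assign_rxqs() {
    rxq_count=$1
    pmd_count=$2
    rxq=0
    while [ "$rxq" -lt "$rxq_count" ]; do
        echo "rxq $rxq -> pmd $((rxq % pmd_count))"
        rxq=$((rxq + 1))
    done
}

# Usage: 4 rx queues shared by 2 pmd threads on the same NUMA node.
#   assign_rxqs 4 2
```

With 4 queues and 2 threads, each thread ends up polling two queues.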
Ideally, for maximum throughput, the pmd thread should never be scheduled
out, since that temporarily halts its execution. The following
affinitization methods can help.

Let's pick cores 4,6,8,10 for pmd threads to run on. Also assume a dual
8-core Sandy Bridge system with hyperthreading enabled, where CPU1 has cores
0,...,7 and 16,...,23 and CPU2 has cores 8,...,15 and 24,...,31. (A different
CPU configuration could have different core mask requirements.)

To the kernel boot line, add a core isolation list for the cores and their
associated hyperthread cores (e.g. isolcpus=4,20,6,22,8,24,10,26). Reboot
the system for the isolation to take effect, then restart everything.

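The isolation list above pairs each chosen core with its hyperthread sibling. The sketch below rebuilds that list under the assumption, taken from the example topology only, that the sibling of core N is core N+16. On a real system the siblings are listed in /sys/devices/system/cpu/cpuN/topology/thread_siblings_list. The function name isolcpus_list is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: build an isolcpus= argument from a sibling offset
# and a list of cores. The fixed offset is an assumption that only holds
# for topologies like the example above.
isolcpus_list() {
    sibling_offset=$1
    shift
    list=""
    for core in "$@"; do
        # Append "core,sibling", with a comma separator after the first pair.
        list="${list:+$list,}$core,$((core + sibling_offset))"
    done
    printf 'isolcpus=%s\n' "$list"
}

# Usage: prints isolcpus=4,20,6,22,8,24,10,26
#   isolcpus_list 16 4 6 8 10
```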
Configure pmd threads on cores 4,6,8,10 using 'pmd-cpu-mask':

    ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=00000550

You should be able to check that pmd threads are pinned to the correct cores
via:

    top -p `pidof ovs-vswitchd` -H -d1

Note, the pmd threads on a NUMA node are only created if at least one DPDK
interface from that NUMA node has been added to OVS.

Note, core 0 is always reserved for non-pmd threads and should never be set
in the CPU mask.

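The mask value used above is simply the bitwise OR of one bit per chosen core. The sketch below computes it and refuses core 0, following the note above. The function name cores_to_mask is hypothetical, and the 8-digit zero padding is chosen only to match the example.

```shell
#!/bin/sh
# Hypothetical helper: turn a list of core numbers into a hex CPU mask.
# Core 0 is rejected because it is reserved for non-pmd threads.
cores_to_mask() {
    mask=0
    for core in "$@"; do
        if [ "$core" -eq 0 ]; then
            echo "core 0 is reserved for non-pmd threads" >&2
            return 1
        fi
        # Set the bit that corresponds to this core.
        mask=$(( mask | (1 << core) ))
    done
    printf '%08x\n' "$mask"
}

# Usage: prints 00000550
#   cores_to_mask 4 6 8 10
```

Bits 4, 6, 8 and 10 are 0x10, 0x40, 0x100 and 0x400, which sum to 0x550.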
DPDK Rings:
-----------

Following the steps above to create a bridge, you can now add DPDK rings
as ports to the vswitch. OVS expects the DPDK ring device name to
start with dpdkr and end with a port id.

    ovs-vsctl add-port br0 dpdkr0 -- set Interface dpdkr0 type=dpdkr

DPDK rings client test application

Included in the test directory is a sample DPDK application for testing
the rings. It is taken from the base DPDK directory and modified to work
with the ring naming used within OVS.

location tests/ovs_client

To run the client:
    cd /usr/src/ovs/tests/
    ovsclient -c 1 -n 4 --proc-type=secondary -- -n "port id you gave dpdkr"

In the case of the dpdkr example above, the "port id you gave dpdkr" is 0.

It is essential to have --proc-type=secondary.

The application simply receives an mbuf on the receive queue of the
ethernet ring and then places that same mbuf on the transmit ring of
the ethernet ring. It is a trivial loopback application.

DPDK rings in VM (IVSHMEM shared memory communications)
-------------------------------------------------------

In addition to executing the client in the host, you can execute it within
a guest VM. To do so you will need a patched qemu. You can download the
patch and getting started guide at:

https://01.org/packet-processing/downloads

A general rule of thumb for better performance is that the client
application should not be assigned the same DPDK core mask "-c" as
the vswitchd.

Restrictions:
-------------

  - This support is for physical NICs. I have tested with Intel NICs only.
  - Works only with an MTU of 1500; a few changes in the DPDK library are
    needed to fix this issue.
  - Currently a DPDK port does not make use of any offload functionality.
  ivshmem
  - The shared memory is currently restricted to the use of 1GB
    huge pages.
  - All huge pages are shared amongst the host, clients, virtual
    machines etc.

Bug Reporting:
--------------

Please report problems to bugs@openvswitch.org.