Commit | Line | Data |
---|---|---|
7c673cae FG |
1 | .. BSD LICENSE |
2 | Copyright(c) 2010-2014 Intel Corporation. All rights reserved. | |
3 | All rights reserved. | |
4 | ||
5 | Redistribution and use in source and binary forms, with or without | |
6 | modification, are permitted provided that the following conditions | |
7 | are met: | |
8 | ||
9 | * Redistributions of source code must retain the above copyright | |
10 | notice, this list of conditions and the following disclaimer. | |
11 | * Redistributions in binary form must reproduce the above copyright | |
12 | notice, this list of conditions and the following disclaimer in | |
13 | the documentation and/or other materials provided with the | |
14 | distribution. | |
15 | * Neither the name of Intel Corporation nor the names of its | |
16 | contributors may be used to endorse or promote products derived | |
17 | from this software without specific prior written permission. | |
18 | ||
19 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS | |
20 | "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT | |
21 | LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR | |
22 | A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT | |
23 | OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, | |
24 | SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT | |
25 | LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, | |
26 | DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY | |
27 | THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT | |
28 | (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE | |
29 | OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. | |
30 | ||
31 | Quota and Watermark Sample Application | |
32 | ====================================== | |
33 | ||
34 | The Quota and Watermark sample application is a simple example of packet processing using the Data Plane Development Kit (DPDK) that | |
35 | showcases the use of a quota as the maximum number of packets enqueued or dequeued at a time, and of low and high watermarks | |
36 | to signal low and high ring usage respectively. | |
37 | ||
38 | Additionally, it shows how ring watermarks can be used to feed congestion notifications back to data producers by | |
39 | temporarily stopping the processing of overloaded rings and sending Ethernet flow control frames. | |
40 | ||
41 | This sample application is split into two parts: | |
42 | ||
43 | * qw - The core quota and watermark sample application | |
44 | ||
45 | * qwctl - A command line tool to alter quota and watermarks while qw is running | |
46 | ||
47 | Overview | |
48 | -------- | |
49 | ||
50 | The Quota and Watermark sample application performs forwarding for each packet that is received on a given port. | |
51 | The destination port is the adjacent port from the enabled port mask, that is, | |
52 | if the first four ports are enabled (port mask 0xf), ports 0 and 1 forward into each other, | |
53 | and ports 2 and 3 forward into each other. | |
54 | The MAC addresses of the forwarded Ethernet frames are not affected. | |
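
The pairing rule described above can be illustrated with a small standalone sketch. It is not the application's own pair_ports() code: the names `pair_enabled_ports`, `MAX_PORTS` and `NO_PAIR` are hypothetical and chosen only for this example. It walks the port mask and pairs each enabled port with the next enabled one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PORTS 32   /* hypothetical limit for this sketch */
#define NO_PAIR   0xFF /* marks a port that has no partner */

/*
 * Fill pairs[] so that pairs[rx_port] holds the matching tx port.
 * Returns the number of ports enabled in the mask.
 */
static unsigned int
pair_enabled_ports(uint32_t portmask, uint8_t pairs[MAX_PORTS])
{
    unsigned int enabled = 0;
    int pending = -1; /* first port of a pair that is not complete yet */

    assert(pairs != NULL);

    for (unsigned int p = 0; p < MAX_PORTS; p++)
        pairs[p] = NO_PAIR;

    for (unsigned int p = 0; p < MAX_PORTS; p++) {
        if (!(portmask & (UINT32_C(1) << p)))
            continue;
        enabled++;
        if (pending < 0) {
            pending = (int)p;
        } else {
            /* Adjacent enabled ports forward into each other. */
            pairs[pending] = (uint8_t)p;
            pairs[p] = (uint8_t)pending;
            pending = -1;
        }
    }
    return enabled;
}
```

With a mask of 0xf this pairs ports 0 and 1 and ports 2 and 3; with a mask of 0x5 it pairs ports 0 and 2. An odd number of enabled ports leaves the last one unpaired.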
55 | ||
56 | Internally, packets are pulled from the ports by the master logical core and put on a variable-length processing pipeline | |
57 | whose stages are connected by rings, as shown in :numref:`figure_pipeline_overview`. | |
58 | ||
59 | .. _figure_pipeline_overview: | |
60 | ||
61 | .. figure:: img/pipeline_overview.* | |
62 | ||
63 | Pipeline Overview | |
64 | ||
65 | ||
66 | An adjustable quota value controls how many packets are moved through the pipeline per enqueue and dequeue. | |
67 | Adjustable watermark values associated with the rings control a back-off mechanism that | |
68 | tries to prevent the pipeline from being overloaded by: | |
69 | ||
70 | * Stopping enqueuing on rings for which the usage has crossed the high watermark threshold | |
71 | ||
72 | * Sending Ethernet pause frames | |
73 | ||
74 | * Only resuming enqueuing on a ring once its usage goes below a global low watermark threshold | |
75 | ||
76 | This mechanism allows congestion notifications to go up the ring pipeline and | |
77 | eventually lead to an Ethernet flow control frame being sent to the source. | |
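
The back-off behaviour amounts to a two-state hysteresis per ring. The following sketch models it with a plain counter instead of a real rte_ring. The state names RING_READY and RING_OVERLOADED follow the application, while `struct ring_model`, `model_enqueue` and `model_dequeue` are hypothetical names used only for illustration, and ring capacity is ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum ring_state { RING_READY, RING_OVERLOADED };

struct ring_model {
    unsigned int count;          /* packets currently queued */
    unsigned int high_watermark; /* per-ring threshold */
    enum ring_state state;
};

/* Try to enqueue n packets; returns true if they were put on the ring. */
static bool
model_enqueue(struct ring_model *r, unsigned int n, unsigned int low_watermark)
{
    assert(r != NULL);

    if (r->state == RING_OVERLOADED) {
        if (r->count > low_watermark)
            return false; /* still backing off */
        r->state = RING_READY;
    }

    /* The packets are accepted even when they push usage over the mark. */
    r->count += n;
    if (r->count > r->high_watermark)
        r->state = RING_OVERLOADED; /* the first stage would also send a pause frame */
    return true;
}

/* Remove up to n packets, as the next pipeline stage would. */
static void
model_dequeue(struct ring_model *r, unsigned int n)
{
    assert(r != NULL);
    r->count = (n > r->count) ? 0 : r->count - n;
}
```

Because the resume threshold (low watermark) is lower than the stop threshold (high watermark), a ring does not flap between the two states when its usage hovers around a single value.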
78 | ||
79 | On top of serving as an example of quota and watermark usage, | |
80 | this application can be used to benchmark the performance of ring-based processing pipelines using a traffic generator, | |
81 | as shown in :numref:`figure_ring_pipeline_perf_setup`. | |
82 | ||
83 | .. _figure_ring_pipeline_perf_setup: | |
84 | ||
85 | .. figure:: img/ring_pipeline_perf_setup.* | |
86 | ||
87 | Ring-based Processing Pipeline Performance Setup | |
88 | ||
89 | ||
90 | Compiling the Application | |
91 | ------------------------- | |
92 | ||
93 | #. Go to the example directory: | |
94 | ||
95 | .. code-block:: console | |
96 | ||
97 | export RTE_SDK=/path/to/rte_sdk | |
98 | cd ${RTE_SDK}/examples/quota_watermark | |
99 | ||
100 | #. Set the target (a default target is used if not specified). For example: | |
101 | ||
102 | .. code-block:: console | |
103 | ||
104 | export RTE_TARGET=x86_64-native-linuxapp-gcc | |
105 | ||
106 | See the *DPDK Getting Started Guide* for possible RTE_TARGET values. | |
107 | ||
108 | #. Build the application: | |
109 | ||
110 | .. code-block:: console | |
111 | ||
112 | make | |
113 | ||
114 | Running the Application | |
115 | ----------------------- | |
116 | ||
117 | The core application, qw, has to be started first. | |
118 | ||
119 | Once it is up and running, one can alter quota and watermarks while it runs using the control application, qwctl. | |
120 | ||
121 | Running the Core Application | |
122 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | |
123 | ||
124 | The application requires a single command line option: | |
125 | ||
126 | .. code-block:: console | |
127 | ||
128 | ./qw/build/qw [EAL options] -- -p PORTMASK | |
129 | ||
130 | where, | |
131 | ||
132 | -p PORTMASK: A hexadecimal bitmask of the ports to configure | |
133 | ||
134 | To run the application in a linuxapp environment with four logical cores and ports 0 and 2, | |
135 | issue the following command: | |
136 | ||
137 | .. code-block:: console | |
138 | ||
139 | ./qw/build/qw -c f -n 4 -- -p 5 | |
140 | ||
141 | Refer to the *DPDK Getting Started Guide* for general information on running applications and | |
142 | the Environment Abstraction Layer (EAL) options. | |
143 | ||
144 | Running the Control Application | |
145 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | |
146 | ||
147 | The control application requires a number of command line options: | |
148 | ||
149 | .. code-block:: console | |
150 | ||
151 | ./qwctl/build/qwctl [EAL options] --proc-type=secondary | |
152 | ||
153 | The --proc-type=secondary option is necessary for the EAL to properly initialize the control application to | |
154 | use the same huge pages as the core application and thus be able to access its rings. | |
155 | ||
156 | To run the application in a linuxapp environment on logical core 0, issue the following command: | |
157 | ||
158 | .. code-block:: console | |
159 | ||
160 | ./qwctl/build/qwctl -c 1 -n 4 --proc-type=secondary | |
161 | ||
162 | Refer to the *DPDK Getting Started Guide* for general information on running applications and | |
163 | the Environment Abstraction Layer (EAL) options. | |
164 | ||
165 | qwctl is an interactive command line that lets the user change variables in a running instance of qw. | |
166 | The help command gives a list of available commands: | |
167 | ||
168 | .. code-block:: console | |
169 | ||
170 | $ qwctl > help | |
171 | ||
172 | Code Overview | |
173 | ------------- | |
174 | ||
175 | The following sections provide a quick guide to the application's source code. | |
176 | ||
177 | Core Application - qw | |
178 | ~~~~~~~~~~~~~~~~~~~~~ | |
179 | ||
180 | EAL and Drivers Setup | |
181 | ^^^^^^^^^^^^^^^^^^^^^ | |
182 | ||
183 | The EAL arguments are parsed at the beginning of the main() function: | |
184 | ||
185 | .. code-block:: c | |
186 | ||
187 | ret = rte_eal_init(argc, argv); | |
188 | if (ret < 0) | |
189 | rte_exit(EXIT_FAILURE, "Cannot initialize EAL\n"); | |
190 | ||
191 | argc -= ret; | |
192 | argv += ret; | |
193 | ||
194 | Then, a call to init_dpdk(), defined in init.c, is made to initialize the poll mode drivers: | |
195 | ||
196 | .. code-block:: c | |
197 | ||
198 | void | |
199 | init_dpdk(void) | |
200 | { | |
201 | int ret; | |
202 | ||
203 | /* Bind the drivers to usable devices */ | |
204 | ||
205 | ret = rte_eal_pci_probe(); | |
206 | if (ret < 0) | |
207 | rte_exit(EXIT_FAILURE, "rte_eal_pci_probe(): error %d\n", ret); | |
208 | ||
209 | if (rte_eth_dev_count() < 2) | |
210 | rte_exit(EXIT_FAILURE, "Not enough Ethernet port available\n"); | |
211 | } | |
212 | ||
213 | To fully understand this code, it is recommended to study the chapters that relate to the *Poll Mode Driver* | |
214 | in the *DPDK Getting Started Guide* and the *DPDK API Reference*. | |
215 | ||
216 | Shared Variables Setup | |
217 | ^^^^^^^^^^^^^^^^^^^^^^ | |
218 | ||
219 | The quota and low_watermark shared variables are put into an rte_memzone using a call to setup_shared_variables(): | |
220 | ||
221 | .. code-block:: c | |
222 | ||
223 | void | |
224 | setup_shared_variables(void) | |
225 | { | |
226 | const struct rte_memzone *qw_memzone; | |
227 | ||
228 | qw_memzone = rte_memzone_reserve(QUOTA_WATERMARK_MEMZONE_NAME, 2 * sizeof(int), rte_socket_id(), RTE_MEMZONE_2MB); | |
229 | ||
230 | if (qw_memzone == NULL) | |
231 | rte_exit(EXIT_FAILURE, "%s\n", rte_strerror(rte_errno)); | |
232 | ||
233 | quota = qw_memzone->addr; | |
234 | low_watermark = (unsigned int *) qw_memzone->addr + 1; /* second int in the zone */ | |
235 | } | |
236 | ||
237 | These two variables are initialized to a default value in main() and | |
238 | can be changed while qw is running using the qwctl control program. | |
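
When several variables share one reserved block, a struct overlay is one way to avoid manual offset arithmetic. Note that adding a value to an `unsigned int *` advances by that many elements, not bytes. The sketch below is an illustration only: `struct shared_vars` and the two accessor functions are hypothetical and are not part of the application.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout of the shared block; not a type from the application. */
struct shared_vars {
    int quota;
    unsigned int low_watermark;
};

/* Return a pointer to the quota stored at the start of the block. */
static int *
shared_quota(void *base)
{
    assert(base != NULL);
    return &((struct shared_vars *)base)->quota;
}

/* Return a pointer to the low watermark stored right after the quota. */
static unsigned int *
shared_low_watermark(void *base)
{
    assert(base != NULL);
    return &((struct shared_vars *)base)->low_watermark;
}
```

Both the process that reserves the block and the process that looks it up can then derive the same addresses from the block's base address, for example the `addr` field of an rte_memzone.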
239 | ||
240 | Application Arguments | |
241 | ^^^^^^^^^^^^^^^^^^^^^ | |
242 | ||
243 | The qw application only takes one argument: a port mask that specifies which ports should be used by the application. | |
244 | At least two ports are needed to run the application and there should be an even number of ports given in the port mask. | |
245 | ||
246 | The port mask parsing is done in parse_qw_args(), defined in args.c. | |
247 | ||
248 | Mbuf Pool Initialization | |
249 | ^^^^^^^^^^^^^^^^^^^^^^^^ | |
250 | ||
251 | Once the application's arguments are parsed, an mbuf pool is created. | |
252 | It contains a set of mbuf objects that are used by the driver and the application to store network packets: | |
253 | ||
254 | .. code-block:: c | |
255 | ||
256 | /* Create a pool of mbuf to store packets */ | |
257 | ||
258 | mbuf_pool = rte_mempool_create("mbuf_pool", MBUF_PER_POOL, MBUF_SIZE, 32, sizeof(struct rte_pktmbuf_pool_private), | |
259 | rte_pktmbuf_pool_init, NULL, rte_pktmbuf_init, NULL, rte_socket_id(), 0); | |
260 | ||
261 | if (mbuf_pool == NULL) | |
262 | rte_panic("%s\n", rte_strerror(rte_errno)); | |
263 | ||
264 | The rte_mempool is a generic structure used to handle pools of objects. | |
265 | In this case, it is necessary to create a pool that will be used by the driver, | |
266 | which expects to have some reserved space in the mempool structure, sizeof(struct rte_pktmbuf_pool_private) bytes. | |
267 | ||
268 | The number of allocated pkt mbufs is MBUF_PER_POOL, with a size of MBUF_SIZE each. | |
269 | A per-lcore cache of 32 mbufs is kept. | |
270 | The memory is allocated on the master lcore's socket, but it is possible to extend this code to allocate one mbuf pool per socket. | |
271 | ||
272 | Two callback pointers are also given to the rte_mempool_create() function: | |
273 | ||
274 | * The first callback pointer is to rte_pktmbuf_pool_init() and is used to initialize the private data of the mempool, | |
275 | which is needed by the driver. | |
276 | This function is provided by the mbuf API, but can be copied and extended by the developer. | |
277 | ||
278 | * The second callback pointer given to rte_mempool_create() is the mbuf initializer. | |
279 | ||
280 | The default is used, that is, rte_pktmbuf_init(), which is provided in the rte_mbuf library. | |
281 | If a more complex application wants to extend the rte_pktmbuf structure for its own needs, | |
282 | a new function derived from rte_pktmbuf_init() can be created. | |
283 | ||
284 | Ports Configuration and Pairing | |
285 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
286 | ||
287 | Each port in the port mask is configured and a corresponding ring is created in the master lcore's array of rings. | |
288 | This ring is the first in the pipeline and will hold the packets directly coming from the port. | |
289 | ||
290 | .. code-block:: c | |
291 | ||
292 | for (port_id = 0; port_id < RTE_MAX_ETHPORTS; port_id++) | |
293 | if (is_bit_set(port_id, portmask)) { | |
294 | configure_eth_port(port_id); | |
295 | init_ring(master_lcore_id, port_id); | |
296 | } | |
297 | ||
298 | pair_ports(); | |
299 | ||
300 | The configure_eth_port() and init_ring() functions are used to configure a port and a ring respectively and are defined in init.c. | |
301 | They make use of the DPDK APIs defined in rte_ethdev.h and rte_ring.h. | |
302 | ||
303 | pair_ports() builds the port_pairs[] array so that its key-value pairs are a mapping between reception and transmission ports. | |
304 | It is defined in init.c. | |
305 | ||
306 | Logical Cores Assignment | |
307 | ^^^^^^^^^^^^^^^^^^^^^^^^ | |
308 | ||
309 | The application uses the master logical core to poll all the ports for new packets and enqueue them on a ring associated with the port. | |
310 | ||
311 | Each logical core except the last runs pipeline_stage() after a ring for each used port is initialized on that core. | |
312 | pipeline_stage() on core X dequeues packets from core X-1's rings and enqueues them on its own rings. See :numref:`figure_threads_pipelines`. | |
313 | ||
314 | .. code-block:: c | |
315 | ||
316 | /* Start pipeline_stage() on all the available slave lcore but the last */ | |
317 | ||
318 | for (lcore_id = 0 ; lcore_id < last_lcore_id; lcore_id++) { | |
319 | if (rte_lcore_is_enabled(lcore_id) && lcore_id != master_lcore_id) { | |
320 | for (port_id = 0; port_id < RTE_MAX_ETHPORTS; port_id++) | |
321 | if (is_bit_set(port_id, portmask)) | |
322 | init_ring(lcore_id, port_id); | |
323 | ||
324 | rte_eal_remote_launch(pipeline_stage, NULL, lcore_id); | |
325 | } | |
326 | } | |
327 | ||
328 | The last available logical core runs send_stage(), | |
329 | the last stage of the pipeline, which dequeues packets from the last ring in the pipeline and | |
330 | sends them out on the destination port set up by pair_ports(). | |
331 | ||
332 | .. code-block:: c | |
333 | ||
334 | /* Start send_stage() on the last slave core */ | |
335 | ||
336 | rte_eal_remote_launch(send_stage, NULL, last_lcore_id); | |
337 | ||
338 | Receive, Process and Transmit Packets | |
339 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
340 | ||
341 | .. _figure_threads_pipelines: | |
342 | ||
343 | .. figure:: img/threads_pipelines.* | |
344 | ||
345 | Threads and Pipelines | |
346 | ||
347 | ||
348 | In the receive_stage() function running on the master logical core, | |
349 | the main task is to read ingress packets from the RX ports and enqueue them | |
350 | on the port's corresponding first ring in the pipeline. | |
351 | This is done using the following code: | |
352 | ||
353 | .. code-block:: c | |
354 | ||
355 | lcore_id = rte_lcore_id(); | |
356 | ||
357 | /* Process each port round robin style */ | |
358 | ||
359 | for (port_id = 0; port_id < RTE_MAX_ETHPORTS; port_id++) { | |
360 | if (!is_bit_set(port_id, portmask)) | |
361 | continue; | |
362 | ||
363 | ring = rings[lcore_id][port_id]; | |
364 | ||
365 | if (ring_state[port_id] != RING_READY) { | |
366 | if (rte_ring_count(ring) > *low_watermark) | |
367 | continue; | |
368 | else | |
369 | ring_state[port_id] = RING_READY; | |
370 | } | |
371 | ||
372 | /* Enqueue received packets on the RX ring */ | |
373 | ||
374 | nb_rx_pkts = rte_eth_rx_burst(port_id, 0, pkts, *quota); | |
375 | ||
376 | ret = rte_ring_enqueue_bulk(ring, (void *) pkts, nb_rx_pkts); | |
377 | if (ret == -EDQUOT) { | |
378 | ring_state[port_id] = RING_OVERLOADED; | |
379 | send_pause_frame(port_id, 1337); | |
380 | } | |
381 | } | |
382 | ||
383 | For each port in the port mask, the corresponding ring's pointer is fetched into ring and that ring's state is checked: | |
384 | ||
385 | * If it is in the RING_READY state, up to \*quota packets are grabbed from the port and put on the ring. | |
386 | Should this operation make the ring's usage cross its high watermark, | |
387 | the ring is marked as overloaded and an Ethernet flow control frame is sent to the source. | |
388 | ||
389 | * If it is not in the RING_READY state, this port is ignored until the ring's usage drops to or below the \*low_watermark value. | |
390 | ||
391 | The pipeline_stage() function's task is to process and move packets from the preceding pipeline stage. | |
392 | This thread runs on most of the logical cores to create an arbitrarily long pipeline. | |
393 | ||
394 | .. code-block:: c | |
395 | ||
396 | lcore_id = rte_lcore_id(); | |
397 | ||
398 | previous_lcore_id = get_previous_lcore_id(lcore_id); | |
399 | ||
400 | for (port_id = 0; port_id < RTE_MAX_ETHPORTS; port_id++) { | |
401 | if (!is_bit_set(port_id, portmask)) | |
402 | continue; | |
403 | ||
404 | tx = rings[lcore_id][port_id]; | |
405 | rx = rings[previous_lcore_id][port_id]; | |
406 | if (ring_state[port_id] != RING_READY) { | |
407 | if (rte_ring_count(tx) > *low_watermark) | |
408 | continue; | |
409 | else | |
410 | ring_state[port_id] = RING_READY; | |
411 | } | |
412 | ||
413 | /* Dequeue up to quota mbuf from rx */ | |
414 | ||
415 | nb_dq_pkts = rte_ring_dequeue_burst(rx, pkts, *quota); | |
416 | ||
417 | if (unlikely(nb_dq_pkts < 0)) | |
418 | continue; | |
419 | ||
420 | /* Enqueue them on tx */ | |
421 | ||
422 | ret = rte_ring_enqueue_bulk(tx, pkts, nb_dq_pkts); | |
423 | if (ret == -EDQUOT) | |
424 | ring_state[port_id] = RING_OVERLOADED; | |
425 | } | |
426 | ||
427 | The thread's logic works mostly like receive_stage(), | |
428 | except that packets are moved from ring to ring instead of port to ring. | |
429 | ||
430 | In this example, no actual processing is done on the packets, | |
431 | but pipeline_stage() is an ideal place to perform any processing required by the application. | |
432 | ||
433 | Finally, the send_stage() function's task is to read packets from the last ring in a pipeline and | |
434 | send them on the destination port defined in the port_pairs[] array. | |
435 | It is running on the last available logical core only. | |
436 | ||
437 | .. code-block:: c | |
438 | ||
439 | lcore_id = rte_lcore_id(); | |
440 | ||
441 | previous_lcore_id = get_previous_lcore_id(lcore_id); | |
442 | ||
443 | for (port_id = 0; port_id < RTE_MAX_ETHPORTS; port_id++) { | |
444 | if (!is_bit_set(port_id, portmask)) continue; | |
445 | ||
446 | dest_port_id = port_pairs[port_id]; | |
447 | tx = rings[previous_lcore_id][port_id]; | |
448 | ||
449 | if (rte_ring_empty(tx)) continue; | |
450 | ||
451 | /* Dequeue packets from tx and send them */ | |
452 | ||
453 | nb_dq_pkts = rte_ring_dequeue_burst(tx, (void *) tx_pkts, *quota); | |
454 | nb_tx_pkts = rte_eth_tx_burst(dest_port_id, 0, tx_pkts, nb_dq_pkts); | |
455 | } | |
456 | ||
457 | For each port in the port mask, up to \*quota packets are pulled from the last ring in its pipeline and | |
458 | sent on the destination port paired with the current port. | |
459 | ||
460 | Control Application - qwctl | |
461 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~ | |
462 | ||
463 | The qwctl application uses the rte_cmdline library to provide the user with an interactive command line that | |
464 | can be used to modify and inspect parameters in a running qw application. | |
465 | Those parameters are the global quota and low_watermark values as well as each ring's built-in high watermark. | |
466 | ||
467 | Command Definitions | |
468 | ^^^^^^^^^^^^^^^^^^^ | |
469 | ||
470 | The available commands are defined in commands.c. | |
471 | ||
472 | It is advised to use the cmdline sample application user guide as a reference for everything related to the rte_cmdline library. | |
473 | ||
474 | Accessing Shared Variables | |
475 | ^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
476 | ||
477 | The setup_shared_variables() function retrieves the shared variables quota and | |
478 | low_watermark from the rte_memzone previously created by qw. | |
479 | ||
480 | .. code-block:: c | |
481 | ||
482 | static void | |
483 | setup_shared_variables(void) | |
484 | { | |
485 | const struct rte_memzone *qw_memzone; | |
486 | ||
487 | qw_memzone = rte_memzone_lookup(QUOTA_WATERMARK_MEMZONE_NAME); | |
488 | if (qw_memzone == NULL) | |
489 | rte_exit(EXIT_FAILURE, "Couldn't find memzone\n"); | |
490 | ||
491 | quota = qw_memzone->addr; | |
492 | ||
493 | low_watermark = (unsigned int *) qw_memzone->addr + 1; /* second int in the zone */ |
494 | } |