..  BSD LICENSE
    Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
    All rights reserved.

    Redistribution and use in source and binary forms, with or without
    modification, are permitted provided that the following conditions
    are met:

    * Redistributions of source code must retain the above copyright
    notice, this list of conditions and the following disclaimer.
    * Redistributions in binary form must reproduce the above copyright
    notice, this list of conditions and the following disclaimer in
    the documentation and/or other materials provided with the
    distribution.
    * Neither the name of Intel Corporation nor the names of its
    contributors may be used to endorse or promote products derived
    from this software without specific prior written permission.

    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
    "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
    LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
    A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
    OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
    SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
    LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
    DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
    THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
    (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
    OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Quota and Watermark Sample Application
======================================

The Quota and Watermark sample application is a simple example of packet processing using the Data Plane Development Kit (DPDK).
It showcases the use of a quota as the maximum number of packets enqueued or dequeued at a time,
and of low and high watermarks to signal low and high ring usage respectively.

Additionally, it shows how ring watermarks can be used to feed congestion notifications back to data producers by
temporarily stopping the processing of overloaded rings and sending Ethernet flow control frames.

This sample application is split into two parts:

* qw - The core quota and watermark sample application

* qwctl - A command line tool to alter quota and watermarks while qw is running

Overview
--------

The Quota and Watermark sample application performs forwarding for each packet that is received on a given port.
The destination port is the adjacent port in the enabled port mask. For example,
if the first four ports are enabled (port mask 0xf), ports 0 and 1 forward into each other,
and ports 2 and 3 forward into each other.
The MAC addresses of the forwarded Ethernet frames are not affected.
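
The pairing rule can be illustrated with a small standalone sketch.
It is not the application's pair_ports() function: the names pair_adjacent_ports(), MAX_PORTS and NO_PAIR are hypothetical,
and the sketch simply assumes that enabled ports are paired in ascending order, two at a time.

.. code-block:: c

    #include <assert.h>
    #include <stdint.h>

    #define MAX_PORTS 32   /* hypothetical limit, stands in for RTE_MAX_ETHPORTS */
    #define NO_PAIR   0xFF /* marks a port that has no partner */

    /*
     * Pair adjacent enabled ports: the first enabled port with the second,
     * the third with the fourth, and so on.
     * Returns the number of ports that received a partner.
     */
    static unsigned int
    pair_adjacent_ports(uint32_t portmask, uint8_t pairs[MAX_PORTS])
    {
        unsigned int port, paired = 0;
        int pending = -1; /* enabled port still waiting for a partner */

        for (port = 0; port < MAX_PORTS; port++)
            pairs[port] = NO_PAIR;

        for (port = 0; port < MAX_PORTS; port++) {
            if (!(portmask & (UINT32_C(1) << port)))
                continue;
            if (pending < 0) {
                pending = (int)port;
            } else {
                pairs[pending] = (uint8_t)port;
                pairs[port] = (uint8_t)pending;
                pending = -1;
                paired += 2;
            }
        }
        return paired;
    }

With this rule, a port mask of 0x5 pairs ports 0 and 2, since they are adjacent among the enabled ports.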

Internally, packets are pulled from the ports by the master logical core and put on a variable-length processing pipeline
whose stages are connected by rings, as shown in :numref:`figure_pipeline_overview`.

.. _figure_pipeline_overview:

.. figure:: img/pipeline_overview.*

   Pipeline Overview


An adjustable quota value controls how many packets are moved through the pipeline per enqueue and dequeue.
Adjustable watermark values associated with the rings control a back-off mechanism that
tries to prevent the pipeline from being overloaded by:

* Stopping enqueuing on rings for which the usage has crossed the high watermark threshold

* Sending Ethernet pause frames

* Only resuming enqueuing on a ring once its usage goes below a global low watermark threshold

This mechanism allows congestion notifications to go up the ring pipeline and
eventually lead to an Ethernet flow control frame being sent to the source.
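
The back-off mechanism amounts to a two-state hysteresis per ring.
The following standalone sketch models it without any DPDK dependency.
The struct ring_gate type and both helper functions are hypothetical names introduced for illustration,
and the comparisons (strictly above the high watermark, at or below the low watermark) are assumptions.

.. code-block:: c

    #include <assert.h>
    #include <stdbool.h>

    enum ring_state { RING_READY, RING_OVERLOADED };

    /* Hypothetical bookkeeping for one ring; not the application's own type. */
    struct ring_gate {
        enum ring_state state;
        unsigned int high_watermark; /* per-ring threshold */
        unsigned int low_watermark;  /* global threshold in the application */
    };

    /* Return true when the producer may enqueue, given the ring's usage. */
    static bool
    ring_gate_may_enqueue(struct ring_gate *g, unsigned int usage)
    {
        assert(g->low_watermark <= g->high_watermark);

        if (g->state == RING_OVERLOADED) {
            if (usage > g->low_watermark)
                return false;      /* still draining: keep backing off */
            g->state = RING_READY; /* drained enough: resume enqueuing */
        }
        return true;
    }

    /* Record the usage after an enqueue; return true if a pause frame is due. */
    static bool
    ring_gate_after_enqueue(struct ring_gate *g, unsigned int usage)
    {
        if (usage > g->high_watermark) {
            g->state = RING_OVERLOADED;
            return true;
        }
        return false;
    }

Keeping the two thresholds apart prevents a ring that hovers around a single limit from toggling between the two states on every iteration.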

On top of serving as an example of quota and watermark usage,
this application can be used to benchmark the performance of ring-based processing pipelines using a traffic generator,
as shown in :numref:`figure_ring_pipeline_perf_setup`.

.. _figure_ring_pipeline_perf_setup:

.. figure:: img/ring_pipeline_perf_setup.*

   Ring-based Processing Pipeline Performance Setup


Compiling the Application
-------------------------

#. Go to the example directory:

   .. code-block:: console

       export RTE_SDK=/path/to/rte_sdk
       cd ${RTE_SDK}/examples/quota_watermark

#. Set the target (a default target is used if not specified). For example:

   .. code-block:: console

       export RTE_TARGET=x86_64-native-linuxapp-gcc

   See the *DPDK Getting Started Guide* for possible RTE_TARGET values.

#. Build the application:

   .. code-block:: console

       make

Running the Application
-----------------------

The core application, qw, has to be started first.

Once it is up and running, one can alter quota and watermarks while it runs using the control application, qwctl.

Running the Core Application
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The application requires a single command line option:

.. code-block:: console

    ./qw/build/qw [EAL options] -- -p PORTMASK

where,

-p PORTMASK: A hexadecimal bitmask of the ports to configure

To run the application in a linuxapp environment with four logical cores and ports 0 and 2,
issue the following command:

.. code-block:: console

    ./qw/build/qw -c f -n 4 -- -p 5

Refer to the *DPDK Getting Started Guide* for general information on running applications and
the Environment Abstraction Layer (EAL) options.

Running the Control Application
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The control application requires a number of command line options:

.. code-block:: console

    ./qwctl/build/qwctl [EAL options] --proc-type=secondary

The --proc-type=secondary option is necessary for the EAL to properly initialize the control application to
use the same huge pages as the core application and thus be able to access its rings.

To run the application in a linuxapp environment on logical core 0, issue the following command:

.. code-block:: console

    ./qwctl/build/qwctl -c 1 -n 4 --proc-type=secondary

Refer to the *DPDK Getting Started Guide* for general information on running applications and
the Environment Abstraction Layer (EAL) options.

qwctl is an interactive command line that lets the user change variables in a running instance of qw.
The help command gives a list of available commands:

.. code-block:: console

    $ qwctl > help

Code Overview
-------------

The following sections provide a quick guide to the application's source code.

Core Application - qw
~~~~~~~~~~~~~~~~~~~~~

EAL and Drivers Setup
^^^^^^^^^^^^^^^^^^^^^

The EAL arguments are parsed at the beginning of the main() function:

.. code-block:: c

    ret = rte_eal_init(argc, argv);
    if (ret < 0)
        rte_exit(EXIT_FAILURE, "Cannot initialize EAL\n");

    argc -= ret;
    argv += ret;

Then, a call to init_dpdk(), defined in init.c, is made to initialize the poll mode drivers:

.. code-block:: c

    void
    init_dpdk(void)
    {
        int ret;

        /* Bind the drivers to usable devices */

        ret = rte_eal_pci_probe();
        if (ret < 0)
            rte_exit(EXIT_FAILURE, "rte_eal_pci_probe(): error %d\n", ret);

        if (rte_eth_dev_count() < 2)
            rte_exit(EXIT_FAILURE, "Not enough Ethernet port available\n");
    }

To fully understand this code, it is recommended to study the chapters that relate to the *Poll Mode Driver*
in the *DPDK Getting Started Guide* and the *DPDK API Reference*.

Shared Variables Setup
^^^^^^^^^^^^^^^^^^^^^^

The quota and low_watermark shared variables are put into an rte_memzone using a call to setup_shared_variables():

.. code-block:: c

    void
    setup_shared_variables(void)
    {
        const struct rte_memzone *qw_memzone;

        /* Reserve room for two integers: quota and low_watermark */
        qw_memzone = rte_memzone_reserve(QUOTA_WATERMARK_MEMZONE_NAME,
                                         2 * sizeof(int), rte_socket_id(),
                                         RTE_MEMZONE_2MB);

        if (qw_memzone == NULL)
            rte_exit(EXIT_FAILURE, "%s\n", rte_strerror(rte_errno));

        quota = qw_memzone->addr;

        /*
         * Pointer arithmetic on an unsigned int * counts in elements,
         * so "+ 1" selects the second integer of the memzone.
         * Adding sizeof(int) here would point past the reserved area.
         */
        low_watermark = (unsigned int *) qw_memzone->addr + 1;
    }

These two variables are initialized to a default value in main() and
can be changed while qw is running using the qwctl control program.
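
The layout of the two shared integers depends on C pointer arithmetic, which is scaled by the size of the pointed-to type.
The following standalone sketch shows the intended layout on an ordinary buffer.
The function map_shared_ints() is a hypothetical helper, and its base argument stands in for qw_memzone->addr.

.. code-block:: c

    #include <assert.h>
    #include <stddef.h>

    /*
     * Split a shared block holding at least two integers into the two
     * variables. The first slot is the quota, the second the low watermark.
     */
    static void
    map_shared_ints(void *base, int **quota_out, unsigned int **low_wm_out)
    {
        assert(base != NULL);

        *quota_out = (int *)base;
        /* "+ 1" advances by one element, that is sizeof(unsigned int) bytes. */
        *low_wm_out = (unsigned int *)base + 1;
    }

Both processes have to apply the same offset, otherwise the values written through qwctl would not be the ones read by qw.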

Application Arguments
^^^^^^^^^^^^^^^^^^^^^

The qw application only takes one argument: a port mask that specifies which ports should be used by the application.
At least two ports are needed to run the application and there should be an even number of ports given in the port mask.

The port mask parsing is done in parse_qw_args(), defined in args.c.
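
As an illustration of the constraints above, a port mask could be parsed and validated along the following lines.
This is a sketch, not the content of args.c: the function name parse_portmask() and its return convention are assumptions,
and whether parse_qw_args() itself rejects odd port counts is not stated here.

.. code-block:: c

    #include <assert.h>
    #include <errno.h>
    #include <stdint.h>
    #include <stdlib.h>

    /*
     * Parse a hexadecimal port mask such as "f" or "0x5".
     * Returns 0 and stores the mask on success. Returns -1 if the string is
     * not valid hexadecimal, or if it enables fewer than two ports or an odd
     * number of ports.
     */
    static int
    parse_portmask(const char *arg, uint32_t *mask_out)
    {
        char *end = NULL;
        unsigned long value;
        unsigned int enabled = 0;

        assert(arg != NULL && mask_out != NULL);

        errno = 0;
        value = strtoul(arg, &end, 16);
        if (errno != 0 || end == arg || *end != '\0' || value > UINT32_MAX)
            return -1;

        /* Count the enabled ports, one bit at a time */
        for (unsigned long v = value; v != 0; v >>= 1)
            enabled += (unsigned int)(v & 1);

        if (enabled < 2 || enabled % 2 != 0)
            return -1;

        *mask_out = (uint32_t)value;
        return 0;
    }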

Mbuf Pool Initialization
^^^^^^^^^^^^^^^^^^^^^^^^

Once the application's arguments are parsed, an mbuf pool is created.
It contains a set of mbuf objects that are used by the driver and the application to store network packets:

.. code-block:: c

    /* Create a pool of mbuf to store packets */

    mbuf_pool = rte_mempool_create("mbuf_pool", MBUF_PER_POOL, MBUF_SIZE, 32, sizeof(struct rte_pktmbuf_pool_private),
        rte_pktmbuf_pool_init, NULL, rte_pktmbuf_init, NULL, rte_socket_id(), 0);

    if (mbuf_pool == NULL)
        rte_panic("%s\n", rte_strerror(rte_errno));

The rte_mempool is a generic structure used to handle pools of objects.
In this case, it is necessary to create a pool that will be used by the driver,
which expects to have some reserved space in the mempool structure, sizeof(struct rte_pktmbuf_pool_private) bytes.

The number of allocated pkt mbufs is MBUF_PER_POOL, with a size of MBUF_SIZE each.
A per-lcore cache of 32 mbufs is kept.
The memory is allocated on the master lcore's socket, but it is possible to extend this code to allocate one mbuf pool per socket.

Two callback pointers are also given to the rte_mempool_create() function:

* The first callback pointer is to rte_pktmbuf_pool_init() and is used to initialize the private data of the mempool,
  which is needed by the driver.
  This function is provided by the mbuf API, but can be copied and extended by the developer.

* The second callback pointer given to rte_mempool_create() is the mbuf initializer.

The default is used, that is, rte_pktmbuf_init(), which is provided in the rte_mbuf library.
If a more complex application wants to extend the rte_pktmbuf structure for its own needs,
a new function derived from rte_pktmbuf_init() can be created.

Ports Configuration and Pairing
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Each port in the port mask is configured and a corresponding ring is created in the master lcore's array of rings.
This ring is the first in the pipeline and will hold the packets directly coming from the port.

.. code-block:: c

    for (port_id = 0; port_id < RTE_MAX_ETHPORTS; port_id++)
        if (is_bit_set(port_id, portmask)) {
            configure_eth_port(port_id);
            init_ring(master_lcore_id, port_id);
        }

    pair_ports();

The configure_eth_port() and init_ring() functions are used to configure a port and a ring respectively and are defined in init.c.
They make use of the DPDK APIs defined in rte_ethdev.h and rte_ring.h.

pair_ports() builds the port_pairs[] array so that its key-value pairs are a mapping between reception and transmission ports.
It is defined in init.c.

Logical Cores Assignment
^^^^^^^^^^^^^^^^^^^^^^^^

The application uses the master logical core to poll all the ports for new packets and enqueue them on a ring associated with the port.

Each logical core except the last runs pipeline_stage() after a ring for each used port is initialized on that core.
pipeline_stage() on core X dequeues packets from core X-1's rings and enqueues them on its own rings. See :numref:`figure_threads_pipelines`.

.. code-block:: c

    /* Start pipeline_stage() on all the available slave lcore but the last */

    for (lcore_id = 0 ; lcore_id < last_lcore_id; lcore_id++) {
        if (rte_lcore_is_enabled(lcore_id) && lcore_id != master_lcore_id) {
            for (port_id = 0; port_id < RTE_MAX_ETHPORTS; port_id++)
                if (is_bit_set(port_id, portmask))
                    init_ring(lcore_id, port_id);

            rte_eal_remote_launch(pipeline_stage, NULL, lcore_id);
        }
    }

The last available logical core runs send_stage(),
which is the last stage of the pipeline, dequeuing packets from the last ring in the pipeline and
sending them out on the destination port set up by pair_ports().

.. code-block:: c

    /* Start send_stage() on the last slave core */

    rte_eal_remote_launch(send_stage, NULL, last_lcore_id);
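
The chaining of stages relies on each core knowing which enabled core precedes it.
The sketch below shows one way to derive that from a core mask such as the one passed with the -c EAL option.
It is not the application's code: previous_enabled_lcore(), last_enabled_lcore() and MAX_LCORES are hypothetical names,
and the sketch assumes that stages are ordered by ascending lcore ID.

.. code-block:: c

    #include <assert.h>
    #include <stdint.h>

    #define MAX_LCORES 64 /* hypothetical bound for this sketch */

    /*
     * Return the closest enabled lcore below lcore_id in coremask,
     * or -1 if there is none (lcore_id is then the first stage).
     */
    static int
    previous_enabled_lcore(uint64_t coremask, unsigned int lcore_id)
    {
        assert(lcore_id < MAX_LCORES);

        for (int id = (int)lcore_id - 1; id >= 0; id--)
            if (coremask & (UINT64_C(1) << id))
                return id;
        return -1;
    }

    /* Return the highest enabled lcore, or -1 if the mask is empty. */
    static int
    last_enabled_lcore(uint64_t coremask)
    {
        for (int id = MAX_LCORES - 1; id >= 0; id--)
            if (coremask & (UINT64_C(1) << id))
                return id;
        return -1;
    }

With a core mask of 0xf, lcore 3 would be the last stage and would read from the rings owned by lcore 2.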

Receive, Process and Transmit Packets
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. _figure_threads_pipelines:

.. figure:: img/threads_pipelines.*

   Threads and Pipelines


In the receive_stage() function running on the master logical core,
the main task is to read ingress packets from the RX ports and enqueue them
on the port's corresponding first ring in the pipeline.
This is done using the following code:

.. code-block:: c

    lcore_id = rte_lcore_id();

    /* Process each port round robin style */

    for (port_id = 0; port_id < RTE_MAX_ETHPORTS; port_id++) {
        if (!is_bit_set(port_id, portmask))
            continue;

        ring = rings[lcore_id][port_id];

        if (ring_state[port_id] != RING_READY) {
            if (rte_ring_count(ring) > *low_watermark)
                continue;
            else
                ring_state[port_id] = RING_READY;
        }

        /* Enqueue received packets on the RX ring */

        nb_rx_pkts = rte_eth_rx_burst(port_id, 0, pkts, *quota);

        ret = rte_ring_enqueue_bulk(ring, (void *) pkts, nb_rx_pkts);
        if (ret == -EDQUOT) {
            ring_state[port_id] = RING_OVERLOADED;
            send_pause_frame(port_id, 1337);
        }
    }

For each port in the port mask, the corresponding ring's pointer is fetched into ring and that ring's state is checked:

* If it is in the RING_READY state, up to \*quota packets are grabbed from the port and put on the ring.
  Should this operation make the ring's usage cross its high watermark,
  the ring is marked as overloaded and an Ethernet flow control frame is sent to the source.

* If it is not in the RING_READY state, this port is ignored until the ring's usage drops to or below the \*low_watermark value.

The pipeline_stage() function's task is to process and move packets from the preceding pipeline stage.
This thread runs on most of the logical cores to create an arbitrarily long pipeline.

.. code-block:: c

    lcore_id = rte_lcore_id();

    previous_lcore_id = get_previous_lcore_id(lcore_id);

    for (port_id = 0; port_id < RTE_MAX_ETHPORTS; port_id++) {
        if (!is_bit_set(port_id, portmask))
            continue;

        tx = rings[lcore_id][port_id];
        rx = rings[previous_lcore_id][port_id];
        if (ring_state[port_id] != RING_READY) {
            if (rte_ring_count(tx) > *low_watermark)
                continue;
            else
                ring_state[port_id] = RING_READY;
        }

        /* Dequeue up to quota mbuf from rx */

        nb_dq_pkts = rte_ring_dequeue_burst(rx, pkts, *quota);

        if (unlikely(nb_dq_pkts < 0))
            continue;

        /* Enqueue them on tx */

        ret = rte_ring_enqueue_bulk(tx, pkts, nb_dq_pkts);
        if (ret == -EDQUOT)
            ring_state[port_id] = RING_OVERLOADED;
    }

The thread's logic works mostly like receive_stage(),
except that packets are moved from ring to ring instead of port to ring.

In this example, no actual processing is done on the packets,
but pipeline_stage() is an ideal place to perform any processing required by the application.

Finally, the send_stage() function's task is to read packets from the last ring in a pipeline and
send them on the destination port defined in the port_pairs[] array.
It runs on the last available logical core only.

.. code-block:: c

    lcore_id = rte_lcore_id();

    previous_lcore_id = get_previous_lcore_id(lcore_id);

    for (port_id = 0; port_id < RTE_MAX_ETHPORTS; port_id++) {
        if (!is_bit_set(port_id, portmask))
            continue;

        dest_port_id = port_pairs[port_id];
        tx = rings[previous_lcore_id][port_id];

        if (rte_ring_empty(tx))
            continue;

        /* Dequeue packets from tx and send them */

        nb_dq_pkts = rte_ring_dequeue_burst(tx, (void *) tx_pkts, *quota);
        nb_tx_pkts = rte_eth_tx_burst(dest_port_id, 0, tx_pkts, nb_dq_pkts);
    }

For each port in the port mask, up to \*quota packets are pulled from the last ring in its pipeline and
sent on the destination port paired with the current port.
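
To make the meaning of "up to \*quota packets" concrete, the following standalone sketch implements a toy single-threaded ring with a burst dequeue.
It is not the rte_ring implementation: struct toy_ring and its functions are hypothetical and only mimic the burst semantics,
where fewer objects than requested may be returned.

.. code-block:: c

    #include <assert.h>
    #include <stddef.h>

    #define RING_SIZE 8 /* hypothetical capacity; must be a power of two */

    struct toy_ring {
        void *slots[RING_SIZE];
        unsigned int head; /* index of the next slot to dequeue */
        unsigned int tail; /* index of the next slot to enqueue */
    };

    /* Number of objects currently stored; relies on unsigned wrap-around. */
    static unsigned int
    toy_ring_count(const struct toy_ring *r)
    {
        return r->tail - r->head;
    }

    /* Enqueue one object; returns 0 on success, -1 if the ring is full. */
    static int
    toy_ring_enqueue(struct toy_ring *r, void *obj)
    {
        if (toy_ring_count(r) == RING_SIZE)
            return -1;
        r->slots[r->tail++ & (RING_SIZE - 1)] = obj;
        return 0;
    }

    /* Dequeue up to quota objects; returns how many were actually dequeued. */
    static unsigned int
    toy_ring_dequeue_burst(struct toy_ring *r, void **objs, unsigned int quota)
    {
        unsigned int n = toy_ring_count(r);

        assert(objs != NULL);
        if (n > quota)
            n = quota;
        for (unsigned int i = 0; i < n; i++)
            objs[i] = r->slots[r->head++ & (RING_SIZE - 1)];
        return n;
    }

A larger quota moves more packets per iteration, while a smaller one lets a stage cycle through its ports more often.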

Control Application - qwctl
~~~~~~~~~~~~~~~~~~~~~~~~~~~

The qwctl application uses the rte_cmdline library to provide the user with an interactive command line that
can be used to modify and inspect parameters in a running qw application.
Those parameters are the global quota and low_watermark values as well as each ring's built-in high watermark.

Command Definitions
^^^^^^^^^^^^^^^^^^^

The available commands are defined in commands.c.

It is advised to use the cmdline sample application user guide as a reference for everything related to the rte_cmdline library.

Accessing Shared Variables
^^^^^^^^^^^^^^^^^^^^^^^^^^

The setup_shared_variables() function retrieves the shared variables quota and
low_watermark from the rte_memzone previously created by qw.

.. code-block:: c

    static void
    setup_shared_variables(void)
    {
        const struct rte_memzone *qw_memzone;

        qw_memzone = rte_memzone_lookup(QUOTA_WATERMARK_MEMZONE_NAME);
        if (qw_memzone == NULL)
            rte_exit(EXIT_FAILURE, "Couldn't find memzone\n");

        quota = qw_memzone->addr;

        /*
         * Pointer arithmetic on an unsigned int * counts in elements,
         * so "+ 1" selects the second integer of the memzone.
         * The offset must match the one used by qw.
         */
        low_watermark = (unsigned int *) qw_memzone->addr + 1;
    }