..  BSD LICENSE
    Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
    All rights reserved.

    Redistribution and use in source and binary forms, with or without
    modification, are permitted provided that the following conditions
    are met:

    * Redistributions of source code must retain the above copyright
      notice, this list of conditions and the following disclaimer.
    * Redistributions in binary form must reproduce the above copyright
      notice, this list of conditions and the following disclaimer in
      the documentation and/or other materials provided with the
      distribution.
    * Neither the name of Intel Corporation nor the names of its
      contributors may be used to endorse or promote products derived
      from this software without specific prior written permission.

    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
    "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
    LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
    A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
    OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
    SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
    LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
    DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
    THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
    (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
    OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Load Balancer Sample Application
================================

The Load Balancer sample application demonstrates the concept of isolating the packet I/O task
from the application-specific workload.
Depending on the performance target,
a number of logical cores (lcores) are dedicated to handling the interaction with the NIC ports (I/O lcores),
while the rest of the lcores are dedicated to performing the application processing (worker lcores).
The worker lcores are totally oblivious to the intricacies of the packet I/O activity and
use the NIC-agnostic interface provided by software rings to exchange packets with the I/O lcores.

Overview
--------

The architecture of the Load Balancer application is presented in the following figure.

.. _figure_load_bal_app_arch:

.. figure:: img/load_bal_app_arch.*

   Load Balancer Application Architecture


For the sake of simplicity, the diagram illustrates a specific case of two I/O RX and two I/O TX lcores offloading the packet I/O
overhead incurred by four NIC ports from four worker lcores, with each I/O lcore handling RX/TX for two NIC ports.

I/O RX Logical Cores
~~~~~~~~~~~~~~~~~~~~

Each I/O RX lcore performs packet RX from its assigned NIC RX rings and then distributes the received packets to the worker lcores.
The application allows each I/O RX lcore to communicate with any of the worker lcores,
therefore each (I/O RX lcore, worker lcore) pair is connected through a dedicated single-producer, single-consumer software ring.

The worker lcore to handle the current packet is determined by reading a predefined 1-byte field from the input packet::

    worker_id = packet[load_balancing_field] % n_workers

Since all the packets that are part of the same traffic flow are expected to have the same value for the load balancing field,
this scheme also ensures that all the packets of a given traffic flow are directed to the same worker lcore (flow affinity)
in the same order they enter the system (packet ordering).

I/O TX Logical Cores
~~~~~~~~~~~~~~~~~~~~

Each I/O TX lcore owns the packet TX for a predefined set of NIC ports. To enable each worker lcore to send packets to any NIC TX port,
the application creates a software ring for each (worker lcore, NIC TX port) pair,
with each I/O TX lcore servicing the software rings associated with the NIC ports it owns.

Worker Logical Cores
~~~~~~~~~~~~~~~~~~~~

Each worker lcore reads packets from its set of input software rings and
routes them to the NIC ports for transmission by dispatching them to the output software rings.
The routing logic is LPM (Longest Prefix Match) based, with all the worker lcores sharing the same LPM rules.

Compiling the Application
-------------------------

The sequence of steps used to build the application is:

#. Export the required environment variables:

   .. code-block:: console

       export RTE_SDK=<Path to the DPDK installation folder>
       export RTE_TARGET=x86_64-native-linuxapp-gcc

#. Build the application executable file:

   .. code-block:: console

       cd ${RTE_SDK}/examples/load_balancer
       make

For more details on how to build the DPDK libraries and sample applications,
please refer to the *DPDK Getting Started Guide*.

Running the Application
-----------------------

To successfully run the application,
the command line used to start the application has to be in sync with the traffic flows configured on the traffic generator side.

For examples of application command lines and traffic generator flows, please refer to the DPDK Test Report.
For more details on how to set up and run the sample applications provided with the DPDK package,
please refer to the *DPDK Getting Started Guide*.

Explanation
-----------

Application Configuration
~~~~~~~~~~~~~~~~~~~~~~~~~

The application run-time configuration is done through the application command line parameters.
Any parameter that is not specified as mandatory is optional,
with the default value hard-coded in the main.h header file from the application folder.

The application command line parameters are listed below:

#. --rx "(PORT, QUEUE, LCORE), ...": The list of NIC RX ports and queues handled by the I/O RX lcores.
   This parameter also implicitly defines the list of I/O RX lcores. This is a mandatory parameter.

#. --tx "(PORT, LCORE), ...": The list of NIC TX ports handled by the I/O TX lcores.
   This parameter also implicitly defines the list of I/O TX lcores.
   This is a mandatory parameter.

#. --w "LCORE, ...": The list of the worker lcores. This is a mandatory parameter.

#. --lpm "IP / PREFIX => PORT; ...": The list of LPM rules used by the worker lcores for packet forwarding.
   This is a mandatory parameter.

#. --rsz "A, B, C, D": Ring sizes (see the illustrative command after this list):

   #. A = The size (in number of buffer descriptors) of each of the NIC RX rings read by the I/O RX lcores.

   #. B = The size (in number of elements) of each of the software rings used by the I/O RX lcores to send packets to worker lcores.

   #. C = The size (in number of elements) of each of the software rings used by the worker lcores to send packets to I/O TX lcores.

   #. D = The size (in number of buffer descriptors) of each of the NIC TX rings written by I/O TX lcores.

#. --bsz "(A, B), (C, D), (E, F)": Burst sizes:

   #. A = The I/O RX lcore read burst size from NIC RX.

   #. B = The I/O RX lcore write burst size to the output software rings.

   #. C = The worker lcore read burst size from the input software rings.

   #. D = The worker lcore write burst size to the output software rings.

   #. E = The I/O TX lcore read burst size from the input software rings.

   #. F = The I/O TX lcore write burst size to the NIC TX.

#. --pos-lb POS: The position of the 1-byte field within the input packet used by the I/O RX lcores
   to identify the worker lcore for the current packet.
   This field needs to be within the first 64 bytes of the input packet.

The infrastructure of software rings connecting the I/O lcores and the worker lcores is built by the application
based on the configuration provided by the user through the application command line parameters.

A specific lcore performing the I/O RX role for a specific set of NIC ports can also perform the I/O TX role
for the same or a different set of NIC ports.
However, a specific lcore cannot perform both the I/O role (either RX or TX) and the worker role during the same session.

Example:

.. code-block:: console

    ./load_balancer -c 0xf8 -n 4 -- --rx "(0,0,3),(1,0,3)" --tx "(0,3),(1,3)" --w "4,5,6,7" --lpm "1.0.0.0/24=>0; 1.0.1.0/24=>1;" --pos-lb 29

There is a single I/O lcore (lcore 3) that handles RX and TX for two NIC ports (ports 0 and 1).
It exchanges packets with four worker lcores (lcores 4, 5, 6 and 7),
which are assigned worker IDs 0 to 3 (lcore 4 has worker ID 0, lcore 5 has worker ID 1, and so on).

Assuming that all the input packets are IPv4 packets with no VLAN label and the source IP address of the current packet is A.B.C.D,
the worker lcore for the current packet is determined by byte D, which sits at offset 29 within the packet.
There are two LPM rules that are used by each worker lcore to route packets to the output NIC ports.

The following table illustrates the packet flow through the system for several possible traffic flows:

+------------+----------------+-----------------+------------------------------+--------------+
| **Flow #** | **Source**     | **Destination** | **Worker ID (Worker lcore)** | **Output**   |
|            | **IP Address** | **IP Address**  |                              | **NIC Port** |
+============+================+=================+==============================+==============+
| 1          | 0.0.0.0        | 1.0.0.1         | 0 (4)                        | 0            |
+------------+----------------+-----------------+------------------------------+--------------+
| 2          | 0.0.0.1        | 1.0.1.2         | 1 (5)                        | 1            |
+------------+----------------+-----------------+------------------------------+--------------+
| 3          | 0.0.0.14       | 1.0.0.3         | 2 (6)                        | 0            |
+------------+----------------+-----------------+------------------------------+--------------+
| 4          | 0.0.0.15       | 1.0.1.4         | 3 (7)                        | 1            |
+------------+----------------+-----------------+------------------------------+--------------+

NUMA Support
~~~~~~~~~~~~

The application has built-in performance enhancements for the NUMA case:

#. One buffer pool per CPU socket (see the sketch after this list).

#. One LPM table per CPU socket.

#. Memory for the NIC RX or TX rings is allocated on the same socket as the lcore handling the respective ring.

When multiple CPU sockets are used in the system,
it is recommended to enable, for each CPU socket, at least one lcore to fulfill the I/O role for the NIC ports that
are directly attached to that CPU socket through the PCI Express* bus.
It is always recommended to handle the packet I/O with lcores from the same CPU socket as the NICs.

Depending on whether the I/O RX lcore (same CPU socket as NIC RX),
the worker lcore and the I/O TX lcore (same CPU socket as NIC TX) handling a specific input packet
are on the same or different CPU sockets, the following run-time scenarios are possible:

#. AAA: The packet is received, processed and transmitted without going across CPU sockets.

#. AAB: The packet is received and processed on socket A,
   but as it has to be transmitted on a NIC port connected to socket B,
   the packet is sent to socket B through software rings.

#. ABB: The packet is received on socket A, but as it has to be processed by a worker lcore on socket B,
   the packet is sent to socket B through software rings.
   The packet is transmitted by a NIC port connected to the same CPU socket as the worker lcore that processed it.

#. ABC: The packet is received on socket A, processed by a worker lcore on socket B,
   and then transmitted by a NIC port connected to socket C.
   The performance price for crossing the CPU socket boundary is paid twice for this packet.