1 | .. |
2 | Licensed under the Apache License, Version 2.0 (the "License"); you may | |
3 | not use this file except in compliance with the License. You may obtain | |
4 | a copy of the License at | |
5 | ||
6 | http://www.apache.org/licenses/LICENSE-2.0 | |
7 | ||
8 | Unless required by applicable law or agreed to in writing, software | |
9 | distributed under the License is distributed on an "AS IS" BASIS, WITHOUT | |
10 | WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the | |
11 | License for the specific language governing permissions and limitations | |
12 | under the License. | |
13 | ||
14 | Convention for heading levels in Open vSwitch documentation: | |
15 | ||
16 | ======= Heading 0 (reserved for the title in a document) | |
17 | ------- Heading 1 | |
18 | ~~~~~~~ Heading 2 | |
19 | +++++++ Heading 3 | |
20 | ''''''' Heading 4 | |
21 | ||
22 | Avoid deeper levels because they do not render well. | |
23 | ||
24 | =================== | |
25 | OVS Faucet Tutorial | |
26 | =================== | |
27 | ||
This tutorial demonstrates how Open vSwitch works with a general-purpose
OpenFlow controller, using the Faucet controller as a simple way to get
started. It was tested with the "master" branch of Open vSwitch and version
1.6.15 of Faucet. It does not use advanced or recently added features in OVS
or Faucet, so other versions of both pieces of software are likely to work
equally well.
34 | |
35 | The goal of the tutorial is to demonstrate Open vSwitch and Faucet in an | |
36 | end-to-end way, that is, to show how it works from the Faucet controller | |
37 | configuration at the top, through the OpenFlow flow table, to the datapath | |
38 | processing. Along the way, in addition to helping to understand the | |
39 | architecture at each level, we discuss performance and troubleshooting issues. | |
40 | We hope that this demonstration makes it easier for users and potential users | |
41 | to understand how Open vSwitch works and how to debug and troubleshoot it. | |
42 | ||
We provide enough detail in the tutorial that you should be able to follow
along step by step.
45 | ||
46 | Setting Up OVS | |
47 | -------------- | |
48 | ||
49 | This section explains how to set up Open vSwitch for the purpose of using it | |
50 | with Faucet for the tutorial. | |
51 | ||
52 | You might already have Open vSwitch installed on one or more computers or VMs, | |
53 | perhaps set up to control a set of VMs or a physical network. This is | |
54 | admirable, but we will be using Open vSwitch in a different way to set up a | |
55 | simulation environment called the OVS "sandbox". The sandbox does not use | |
56 | virtual machines or containers, which makes it more limited, but on the other | |
57 | hand it is (in this writer's opinion) easier to set up. | |
58 | ||
59 | There are two ways to start a sandbox: one that uses the Open vSwitch that is | |
60 | already installed on a system, and another that uses a copy of Open vSwitch | |
61 | that has been built but not yet installed. The latter is more often used and | |
62 | thus better tested, but both should work. The instructions below explain both | |
63 | approaches: | |
64 | ||
65 | 1. Get a copy of the Open vSwitch source repository using Git, then ``cd`` into | |
66 | the new directory:: | |
67 | ||
68 | $ git clone https://github.com/openvswitch/ovs.git | |
69 | $ cd ovs | |
70 | ||
71 | The default checkout is the master branch. You can check out a tag | |
72 | (such as v2.8.0) or a branch (such as origin/branch-2.8), if you | |
73 | prefer. | |
74 | ||
75 | 2. If you do not already have an installed copy of Open vSwitch on your system, | |
76 | or if you do not want to use it for the sandbox (the sandbox will not | |
77 | disturb the functionality of any existing switches), then proceed to step 3. | |
78 | If you do have an installed copy and you want to use it for the sandbox, try | |
79 | to start the sandbox by running:: | |
80 | ||
81 | $ tutorial/ovs-sandbox | |
82 | ||
83 | If it is successful, you will find yourself in a subshell environment, which | |
84 | is the sandbox (you can exit with ``exit`` or Control+D). If so, you're | |
85 | finished and do not need to complete the rest of the steps. If it fails, | |
86 | you can proceed to step 3 to build Open vSwitch anyway. | |
87 | ||
88 | 3. Before you build, you might want to check that your system meets the build | |
89 | requirements. Read :doc:`/intro/install/general` to find out. For this | |
90 | tutorial, there is no need to compile the Linux kernel module, or to use any | |
91 | of the optional libraries such as OpenSSL, DPDK, or libcap-ng. | |
92 | ||
93 | 4. Configure and build Open vSwitch:: | |
94 | ||
95 | $ ./boot.sh | |
96 | $ ./configure | |
97 | $ make -j4 | |
98 | ||
99 | 5. Try out the sandbox by running:: | |
100 | ||
101 | $ make sandbox | |
102 | ||
103 | You can exit the sandbox with ``exit`` or Control+D. | |
104 | ||
105 | Setting up Faucet | |
106 | ----------------- | |
107 | ||
108 | This section explains how to get a copy of Faucet and set it up | |
109 | appropriately for the tutorial. There are many other ways to install | |
110 | Faucet, but this simple approach worked well for me. It has the | |
111 | advantage that it does not require modifying any system-level files or | |
112 | directories on your machine. It does, on the other hand, require | |
113 | Docker, so make sure you have it installed and working. | |
114 | ||
115 | It will be a little easier to go through the rest of the tutorial if | |
116 | you run these instructions in a separate terminal from the one that | |
117 | you're using for Open vSwitch, because it's often necessary to switch | |
118 | between one and the other. | |
119 | ||
120 | 1. Get a copy of the Faucet source repository using Git, then ``cd`` | |
121 | into the new directory:: | |
122 | ||
123 | $ git clone https://github.com/faucetsdn/faucet.git | |
124 | $ cd faucet | |
125 | ||
126 | At this point I checked out the latest tag:: | |
127 | ||
    $ latest_tag=$(git describe --tags $(git rev-list --tags --max-count=1))
    $ git checkout $latest_tag
130 | |
131 | 2. Build a docker container image:: | |
132 | ||
133 | $ docker build -t faucet/faucet . | |
134 | ||
135 | This will take a few minutes. | |
136 | ||
137 | 3. Create an installation directory under the ``faucet`` directory for | |
138 | the docker image to use:: | |
139 | ||
140 | $ mkdir inst | |
141 | ||
The Faucet configuration will go in ``inst/faucet.yaml`` and its
main log will appear in ``inst/faucet.log``. (The official Faucet
installation instructions say to put these in ``/etc/ryu/faucet``
and ``/var/log/ryu/faucet``, respectively, but we avoid modifying
these system directories.)
147 | ||
148 | 4. Create a container and start Faucet:: | |
149 | ||
    $ docker run -d --name faucet --restart=always \
          -v $(pwd)/inst/:/etc/faucet/ -v $(pwd)/inst/:/var/log/faucet/ \
          -p 6653:6653 -p 9302:9302 faucet/faucet
151 | |
152 | 5. Look in ``inst/faucet.log`` to verify that Faucet started. It will | |
153 | probably start with an exception and traceback because we have not | |
154 | yet created ``inst/faucet.yaml``. | |
155 | ||
156 | 6. Later on, to make a new or updated Faucet configuration take | |
157 | effect quickly, you can run:: | |
158 | ||
159 | $ docker exec faucet pkill -HUP -f faucet.faucet | |
160 | ||
161 | Another way is to stop and start the Faucet container:: | |
162 | ||
163 | $ docker restart faucet | |
164 | ||
165 | You can also stop and delete the container; after this, to start it | |
166 | again, you need to rerun the ``docker run`` command:: | |
167 | ||
168 | $ docker stop faucet | |
169 | $ docker rm faucet | |
170 | ||
171 | Overview | |
172 | -------- | |
173 | ||
174 | Now that Open vSwitch and Faucet are ready, here's an overview of what | |
175 | we're going to do for the remainder of the tutorial: | |
176 | ||
177 | 1. Switching: Set up an L2 network with Faucet. | |
178 | ||
179 | 2. Routing: Route between multiple L3 networks with Faucet. | |
180 | ||
181 | 3. ACLs: Add and modify access control rules. | |
182 | ||
183 | At each step, we will take a look at how the features in question work | |
184 | from Faucet at the top to the data plane layer at the bottom. From | |
185 | the highest to lowest level, these layers and the software components | |
186 | that connect them are: | |
187 | ||
Faucet.
  As the top level in the system, this is the authoritative source of the
  network configuration.
191 | |
192 | Faucet connects to a variety of monitoring and performance tools, | |
193 | but we won't use them in this tutorial. Our main insights into the | |
194 | system will be through ``faucet.yaml`` for configuration and | |
195 | ``faucet.log`` to observe state, such as MAC learning and ARP | |
196 | resolution, and to tell when we've screwed up configuration syntax | |
197 | or semantics. | |
198 | ||
The OpenFlow subsystem in Open vSwitch.
  OpenFlow is the protocol, standardized by the Open Networking Foundation,
  that controllers like Faucet use to control how Open vSwitch and other
  switches treat packets in the network.
203 | |
204 | We will use ``ovs-ofctl``, a utility that comes with Open vSwitch, | |
205 | to observe and occasionally modify Open vSwitch's OpenFlow behavior. | |
206 | We will also use ``ovs-appctl``, a utility for communicating with | |
207 | ``ovs-vswitchd`` and other Open vSwitch daemons, to ask "what-if?" | |
208 | type questions. | |
209 | ||
210 | In addition, the OVS sandbox by default raises the Open vSwitch | |
211 | logging level for OpenFlow high enough that we can learn a great | |
212 | deal about OpenFlow behavior simply by reading its log file. | |
213 | ||
Open vSwitch datapath.
  This is essentially a cache designed to accelerate packet processing. Open
  vSwitch includes a few different datapaths, such as one based on the Linux
  kernel and a userspace-only datapath (sometimes called the "DPDK" datapath).
  The OVS sandbox uses the latter, but the principles behind it apply equally
  well to other datapaths.
220 | |
221 | At each step, we discuss how the design of each layer influences | |
222 | performance. We demonstrate how Open vSwitch features can be used to | |
223 | debug, troubleshoot, and understand the system as a whole. | |
224 | ||
225 | Switching | |
226 | --------- | |
227 | ||
228 | Layer-2 (L2) switching is the basis of modern networking. It's also | |
229 | very simple and a good place to start, so let's set up a switch with | |
230 | some VLANs in Faucet and see how it works at each layer. Begin by | |
231 | putting the following into ``inst/faucet.yaml``:: | |
232 | ||
    dps:
        switch-1:
            dp_id: 0x1
            timeout: 3600
            arp_neighbor_timeout: 3600
            interfaces:
                1:
                    native_vlan: 100
                2:
                    native_vlan: 100
                3:
                    native_vlan: 100
                4:
                    native_vlan: 200
                5:
                    native_vlan: 200
    vlans:
        100:
        200:
252 | ||
This configuration file defines a single switch ("datapath" or "dp")
named ``switch-1``. The switch has five ports, numbered 1 through 5.
Ports 1, 2, and 3 are in VLAN 100, and ports 4 and 5 are in VLAN 200.
Faucet can identify the switch from its datapath ID, which is defined
to be 0x1.
258 | ||
259 | .. note:: | |
260 | ||
261 | This also sets high MAC learning and ARP timeouts. The defaults are | |
262 | 5 minutes and about 8 minutes, which are fine in production but | |
263 | sometimes too fast for manual experimentation. (Don't use a timeout | |
264 | bigger than about 65000 seconds because it will crash Faucet.) | |
265 | ||
266 | Now restart Faucet so that the configuration takes effect, e.g.:: | |
267 | ||
268 | $ docker restart faucet | |
269 | ||
270 | Assuming that the configuration update is successful, you should now | |
271 | see a new line at the end of ``inst/faucet.log``:: | |
272 | ||
    Jan 06 15:14:35 faucet INFO Add new datapath DPID 1 (0x1)
274 | |
Faucet is now waiting for a switch with datapath ID 0x1 to connect to
it over OpenFlow, so our next step is to create a switch with OVS and
make it connect to Faucet. To do that, switch to the terminal where
you checked out OVS and start a sandbox with ``make sandbox`` or
``tutorial/ovs-sandbox`` (as explained earlier under `Setting Up
OVS`_). You should see something like this toward the end of the
output::
282 | ||
283 | ---------------------------------------------------------------------- | |
284 | You are running in a dummy Open vSwitch environment. You can use | |
285 | ovs-vsctl, ovs-ofctl, ovs-appctl, and other tools to work with the | |
286 | dummy switch. | |
287 | ||
288 | Log files, pidfiles, and the configuration database are in the | |
289 | "sandbox" subdirectory. | |
290 | ||
291 | Exit the shell to kill the running daemons. | |
    blp@sigabrt:~/nicira/ovs/tutorial(0)$

Inside the sandbox, create a switch ("bridge") named ``br0``, set its
datapath ID to 0x1, add simulated ports to it named ``p1`` through
``p5``, and tell it to connect to the Faucet controller. To make it
easier to understand, we request that port ``p1`` be assigned
OpenFlow port 1, ``p2`` port 2, and so on. As a final touch,
configure the controller to be "out-of-band" (this is mainly to avoid
some annoying messages in the ``ovs-vswitchd`` logs; for more
information, run ``man ovs-vswitchd.conf.db`` and search for
``connection_mode``)::
303 | ||
    $ ovs-vsctl add-br br0 \
             -- set bridge br0 other-config:datapath-id=0000000000000001 \
             -- add-port br0 p1 -- set interface p1 ofport_request=1 \
             -- add-port br0 p2 -- set interface p2 ofport_request=2 \
             -- add-port br0 p3 -- set interface p3 ofport_request=3 \
             -- add-port br0 p4 -- set interface p4 ofport_request=4 \
             -- add-port br0 p5 -- set interface p5 ofport_request=5 \
             -- set-controller br0 tcp:127.0.0.1:6653 \
             -- set controller br0 connection-mode=out-of-band
313 | |
314 | .. note:: | |
315 | ||
316 | You don't have to run all of these as a single ``ovs-vsctl`` | |
317 | invocation. It is a little more efficient, though, and since it | |
318 | updates the OVS configuration in a single database transaction it | |
319 | means that, for example, there is never a time when the controller | |
320 | is set but it has not yet been configured as out-of-band. | |
321 | ||
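The datapath ID appears in two notations: ``0x1`` in ``faucet.yaml`` and the
16-hex-digit string ``0000000000000001`` in the ``ovs-vsctl`` command. The
sketch below shows one way to derive the second from the first. The helper
name ``dpid_to_ovs`` is hypothetical and is not part of Open vSwitch or
Faucet; it relies only on the standard ``printf`` utility.

```shell
# Hypothetical helper (not an OVS tool): render a datapath ID as the
# 16-hex-digit string used with other-config:datapath-id.
# $1 = datapath ID in decimal or 0x-prefixed hex.
dpid_to_ovs() {
    printf '%016x\n' "$1"
}

dpid_to_ovs 0x1    # prints 0000000000000001
```
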
322 | Now, if you look at ``inst/faucet.log`` again, you should see that | |
323 | Faucet recognized and configured the new switch and its ports:: | |
324 | ||
    Jan 06 15:17:10 faucet INFO DPID 1 (0x1) connected
    Jan 06 15:17:10 faucet.valve INFO DPID 1 (0x1) Cold start configuring DP
    Jan 06 15:17:10 faucet.valve INFO DPID 1 (0x1) Configuring VLAN 100 vid:100 ports:Port 1,Port 2,Port 3
    Jan 06 15:17:10 faucet.valve INFO DPID 1 (0x1) Configuring VLAN 200 vid:200 ports:Port 4,Port 5
    Jan 06 15:17:10 faucet.valve INFO DPID 1 (0x1) Port 1 up, configuring
    Jan 06 15:17:10 faucet.valve INFO DPID 1 (0x1) Port 2 up, configuring
    Jan 06 15:17:10 faucet.valve INFO DPID 1 (0x1) Port 3 up, configuring
    Jan 06 15:17:10 faucet.valve INFO DPID 1 (0x1) Port 4 up, configuring
    Jan 06 15:17:10 faucet.valve INFO DPID 1 (0x1) Port 5 up, configuring
334 | |
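If you would rather not scan the log by eye, a small filter can list the
ports that Faucet reported as up. This is only a sketch: the function name
``faucet_ports_up`` is hypothetical, and the pattern assumes the log lines
look exactly like the sample above, which may differ in other Faucet
versions. A typical call would be ``faucet_ports_up inst/faucet.log 1``.

```shell
# Hypothetical helper: print the port numbers that Faucet logged as
# "up, configuring" for one datapath.
# $1 = path to faucet.log, $2 = datapath ID in decimal (as the log prints it).
# Assumes the log format shown in the sample above.
faucet_ports_up() {
    sed -n "s/.* DPID $2 (0x[0-9a-f]*) Port \([0-9][0-9]*\) up, configuring\$/\1/p" "$1"
}
```
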
335 | Over on the Open vSwitch side, you can see a lot of related activity | |
336 | if you take a look in ``sandbox/ovs-vswitchd.log``. For example, here | |
337 | is the basic OpenFlow session setup and Faucet's probe of the switch's | |
338 | ports and capabilities:: | |
339 | ||
340 | rconn|INFO|br0<->tcp:127.0.0.1:6653: connecting... | |
341 | vconn|DBG|tcp:127.0.0.1:6653: sent (Success): OFPT_HELLO (OF1.4) (xid=0x1): | |
342 | version bitmap: 0x01, 0x02, 0x03, 0x04, 0x05 | |
343 | vconn|DBG|tcp:127.0.0.1:6653: received: OFPT_HELLO (OF1.3) (xid=0x2f24810a): | |
344 | version bitmap: 0x01, 0x02, 0x03, 0x04 | |
345 | vconn|DBG|tcp:127.0.0.1:6653: negotiated OpenFlow version 0x04 (we support version 0x05 and earlier, peer supports version 0x04 and earlier) | |
346 | rconn|INFO|br0<->tcp:127.0.0.1:6653: connected | |
347 | vconn|DBG|tcp:127.0.0.1:6653: received: OFPT_ECHO_REQUEST (OF1.3) (xid=0x2f24810b): 0 bytes of payload | |
348 | vconn|DBG|tcp:127.0.0.1:6653: sent (Success): OFPT_ECHO_REPLY (OF1.3) (xid=0x2f24810b): 0 bytes of payload | |
349 | vconn|DBG|tcp:127.0.0.1:6653: received: OFPT_FEATURES_REQUEST (OF1.3) (xid=0x2f24810c): | |
350 | vconn|DBG|tcp:127.0.0.1:6653: sent (Success): OFPT_FEATURES_REPLY (OF1.3) (xid=0x2f24810c): dpid:0000000000000001 | |
351 | n_tables:254, n_buffers:0 | |
352 | capabilities: FLOW_STATS TABLE_STATS PORT_STATS GROUP_STATS QUEUE_STATS | |
353 | vconn|DBG|tcp:127.0.0.1:6653: received: OFPST_PORT_DESC request (OF1.3) (xid=0x2f24810d): port=ANY | |
354 | vconn|DBG|tcp:127.0.0.1:6653: sent (Success): OFPST_PORT_DESC reply (OF1.3) (xid=0x2f24810d): | |
355 | 1(p1): addr:aa:55:aa:55:00:14 | |
356 | config: PORT_DOWN | |
357 | state: LINK_DOWN | |
358 | speed: 0 Mbps now, 0 Mbps max | |
359 | 2(p2): addr:aa:55:aa:55:00:15 | |
360 | config: PORT_DOWN | |
361 | state: LINK_DOWN | |
362 | speed: 0 Mbps now, 0 Mbps max | |
363 | 3(p3): addr:aa:55:aa:55:00:16 | |
364 | config: PORT_DOWN | |
365 | state: LINK_DOWN | |
366 | speed: 0 Mbps now, 0 Mbps max | |
367 | 4(p4): addr:aa:55:aa:55:00:17 | |
368 | config: PORT_DOWN | |
369 | state: LINK_DOWN | |
370 | speed: 0 Mbps now, 0 Mbps max | |
371 | 5(p5): addr:aa:55:aa:55:00:18 | |
372 | config: PORT_DOWN | |
373 | state: LINK_DOWN | |
374 | speed: 0 Mbps now, 0 Mbps max | |
375 | LOCAL(br0): addr:c6:64:ff:59:48:41 | |
376 | config: PORT_DOWN | |
377 | state: LINK_DOWN | |
378 | speed: 0 Mbps now, 0 Mbps max | |
379 | ||
380 | After that, you can see Faucet delete all existing flows and then | |
381 | start adding new ones:: | |
382 | ||
383 | vconn|DBG|tcp:127.0.0.1:6653: received: OFPT_FLOW_MOD (OF1.3) (xid=0x2f24810e): DEL table:255 priority=0 actions=drop | |
384 | vconn|DBG|tcp:127.0.0.1:6653: received: OFPT_BARRIER_REQUEST (OF1.3) (xid=0x2f24810f): | |
385 | vconn|DBG|tcp:127.0.0.1:6653: sent (Success): OFPT_BARRIER_REPLY (OF1.3) (xid=0x2f24810f): | |
386 | vconn|DBG|tcp:127.0.0.1:6653: received: OFPT_FLOW_MOD (OF1.3) (xid=0x2f248110): ADD priority=0 cookie:0x5adc15c0 out_port:0 actions=drop | |
387 | vconn|DBG|tcp:127.0.0.1:6653: received: OFPT_FLOW_MOD (OF1.3) (xid=0x2f248111): ADD table:1 priority=0 cookie:0x5adc15c0 out_port:0 actions=drop | |
388 | ... | |
389 | ||
390 | OpenFlow Layer | |
391 | ~~~~~~~~~~~~~~ | |
392 | ||
393 | Let's take a look at the OpenFlow tables that Faucet set up. Before | |
394 | we do that, it's helpful to take a look at ``docs/architecture.rst`` | |
395 | in the Faucet documentation to learn how Faucet structures its flow | |
396 | tables. In summary, this document says: | |
397 | ||
398 | Table 0 | |
399 | Port-based ACLs | |
400 | ||
401 | Table 1 | |
402 | Ingress VLAN processing | |
403 | ||
404 | Table 2 | |
405 | VLAN-based ACLs | |
406 | ||
407 | Table 3 | |
408 | Ingress L2 processing, MAC learning | |
409 | ||
410 | Table 4 | |
411 | L3 forwarding for IPv4 | |
412 | ||
413 | Table 5 | |
414 | L3 forwarding for IPv6 | |
415 | ||
416 | Table 6 | |
417 | Virtual IP processing, e.g. for router IP addresses implemented by Faucet | |
418 | ||
419 | Table 7 | |
420 | Egress L2 processing | |
421 | ||
Table 8
  Flooding

With that in mind, let's dump the flow tables. The simplest way is to
just run plain ``ovs-ofctl dump-flows``::
427 | ||
428 | $ ovs-ofctl dump-flows br0 | |
429 | ||
430 | If you run that bare command, it produces a lot of extra junk that | |
431 | makes the output harder to read, like statistics and "cookie" values | |
432 | that are all the same. In addition, for historical reasons | |
433 | ``ovs-ofctl`` always defaults to using OpenFlow 1.0 even though Faucet | |
434 | and most modern controllers use OpenFlow 1.3, so it's best to force it | |
435 | to use OpenFlow 1.3. We could throw in a lot of options to fix these, | |
436 | but we'll want to do this more than once, so let's start by defining a | |
437 | shell function for ourselves:: | |
438 | ||
    $ dump-flows () {
        ovs-ofctl -OOpenFlow13 --names --no-stat dump-flows "$@" \
          | sed 's/cookie=0x5adc15c0, //'
    }
443 | ||
444 | Let's also define ``save-flows`` and ``diff-flows`` functions for | |
445 | later use:: | |
446 | ||
    $ save-flows () {
        ovs-ofctl -OOpenFlow13 --no-names --sort dump-flows "$@"
    }
    $ diff-flows () {
        ovs-ofctl -OOpenFlow13 diff-flows "$@" | sed 's/cookie=0x5adc15c0 //'
    }
453 | ||
454 | Now let's take a look at the flows we've got and what they mean, like | |
455 | this:: | |
456 | ||
457 | $ dump-flows br0 | |
458 | ||
First, table 0 has a flow that just jumps to table 1 for each
configured port, and drops other unrecognized packets. Presumably it
would do more if we had configured port-based ACLs::
462 | ||
463 | priority=9099,in_port=p1 actions=goto_table:1 | |
464 | priority=9099,in_port=p2 actions=goto_table:1 | |
465 | priority=9099,in_port=p3 actions=goto_table:1 | |
466 | priority=9099,in_port=p4 actions=goto_table:1 | |
467 | priority=9099,in_port=p5 actions=goto_table:1 | |
468 | priority=0 actions=drop | |
469 | ||
Table 1, for ingress VLAN processing, has a bunch of flows that drop
inappropriate packets, such as LLDP and STP::

    table=1, priority=9099,dl_dst=01:80:c2:00:00:00 actions=drop
    table=1, priority=9099,dl_dst=01:00:0c:cc:cc:cd actions=drop
    table=1, priority=9099,dl_type=0x88cc actions=drop
476 | ||
477 | Table 1 also has some more interesting flows that recognize packets | |
478 | without a VLAN header on each of our ports | |
479 | (``vlan_tci=0x0000/0x1fff``), push on the VLAN configured for the | |
480 | port, and proceed to table 3. Presumably these skip table 2 because | |
481 | we did not configure any VLAN-based ACLs. There is also a fallback | |
482 | flow to drop other packets, which in practice means that if any | |
483 | received packet already has a VLAN header then it will be dropped:: | |
484 | ||
485 | table=1, priority=9000,in_port=p1,vlan_tci=0x0000/0x1fff actions=push_vlan:0x8100,set_field:4196->vlan_vid,goto_table:3 | |
486 | table=1, priority=9000,in_port=p2,vlan_tci=0x0000/0x1fff actions=push_vlan:0x8100,set_field:4196->vlan_vid,goto_table:3 | |
487 | table=1, priority=9000,in_port=p3,vlan_tci=0x0000/0x1fff actions=push_vlan:0x8100,set_field:4196->vlan_vid,goto_table:3 | |
488 | table=1, priority=9000,in_port=p4,vlan_tci=0x0000/0x1fff actions=push_vlan:0x8100,set_field:4296->vlan_vid,goto_table:3 | |
489 | table=1, priority=9000,in_port=p5,vlan_tci=0x0000/0x1fff actions=push_vlan:0x8100,set_field:4296->vlan_vid,goto_table:3 | |
490 | table=1, priority=0 actions=drop | |
491 | ||
492 | .. note:: | |
493 | ||
494 | The syntax ``set_field:4196->vlan_vid`` is curious and somewhat | |
495 | misleading. OpenFlow 1.3 defines the ``vlan_vid`` field as a 13-bit | |
496 | field where bit 12 is set to 1 if the VLAN header is present. Thus, | |
497 | since 4196 is 0x1064, this action sets VLAN value 0x64, which in | |
498 | decimal is 100. | |
499 | ||
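The arithmetic in the note can be checked with shell arithmetic. The sketch
below masks off bit 12 (``0x1000``, the "VLAN header present" bit) to recover
the 12-bit VLAN ID. The function name ``decode_vlan_vid`` is made up for
illustration.

```shell
# Hypothetical helper: split an OpenFlow 1.3 vlan_vid value into the
# "VLAN header present" bit (bit 12, 0x1000) and the 12-bit VLAN ID.
decode_vlan_vid() {
    present=$(( ($1 & 0x1000) != 0 ))
    vid=$(( $1 & 0x0fff ))
    echo "present=$present vid=$vid"
}

decode_vlan_vid 4196    # prints: present=1 vid=100
decode_vlan_vid 4296    # prints: present=1 vid=200
```
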
500 | Table 2 isn't used because there are no VLAN-based ACLs. It just has | |
501 | a drop flow:: | |
502 | ||
503 | table=2, priority=0 actions=drop | |
504 | ||
Table 3 is used for MAC learning, but the controller hasn't learned any
MACs yet. It also drops some inappropriate packets, such as those that claim
to be from a broadcast source address (why not from all multicast source
addresses, though?). We'll come back here later::

    table=3, priority=9099,dl_src=ff:ff:ff:ff:ff:ff actions=drop
    table=3, priority=9001,dl_src=0e:00:00:00:00:01 actions=drop
    table=3, priority=0 actions=drop
    table=3, priority=9000 actions=CONTROLLER:96,goto_table:7
514 | ||
515 | Tables 4, 5, and 6 aren't used because we haven't configured any | |
516 | routing:: | |
517 | ||
518 | table=4, priority=0 actions=drop | |
519 | table=5, priority=0 actions=drop | |
520 | table=6, priority=0 actions=drop | |
521 | ||
522 | Table 7 is used to direct packets to learned MACs but Faucet hasn't | |
523 | learned any MACs yet, so it just sends all the packets along to table | |
524 | 8:: | |
525 | ||
526 | table=7, priority=0 actions=drop | |
527 | table=7, priority=9000 actions=goto_table:8 | |
528 | ||
529 | Table 8 implements flooding, broadcast, and multicast. The flows for | |
530 | broadcast and flood are easy to understand: if the packet came in on a | |
531 | given port and needs to be flooded or broadcast, output it to all the | |
532 | other ports in the same VLAN:: | |
533 | ||
534 | table=8, priority=9008,in_port=p1,dl_vlan=100,dl_dst=ff:ff:ff:ff:ff:ff actions=pop_vlan,output:p2,output:p3 | |
535 | table=8, priority=9008,in_port=p2,dl_vlan=100,dl_dst=ff:ff:ff:ff:ff:ff actions=pop_vlan,output:p1,output:p3 | |
536 | table=8, priority=9008,in_port=p3,dl_vlan=100,dl_dst=ff:ff:ff:ff:ff:ff actions=pop_vlan,output:p1,output:p2 | |
537 | table=8, priority=9008,in_port=p4,dl_vlan=200,dl_dst=ff:ff:ff:ff:ff:ff actions=pop_vlan,output:p5 | |
538 | table=8, priority=9008,in_port=p5,dl_vlan=200,dl_dst=ff:ff:ff:ff:ff:ff actions=pop_vlan,output:p4 | |
539 | table=8, priority=9000,in_port=p1,dl_vlan=100 actions=pop_vlan,output:p2,output:p3 | |
540 | table=8, priority=9000,in_port=p2,dl_vlan=100 actions=pop_vlan,output:p1,output:p3 | |
541 | table=8, priority=9000,in_port=p3,dl_vlan=100 actions=pop_vlan,output:p1,output:p2 | |
542 | table=8, priority=9000,in_port=p4,dl_vlan=200 actions=pop_vlan,output:p5 | |
543 | table=8, priority=9000,in_port=p5,dl_vlan=200 actions=pop_vlan,output:p4 | |
544 | ||
545 | .. note:: | |
546 | ||
547 | These flows could apparently be simpler because OpenFlow says that | |
548 | ``output:<port>`` is ignored if ``<port>`` is the input port. That | |
549 | means that the first three flows above could apparently be collapsed | |
550 | into just:: | |
551 | ||
552 | table=8, priority=9008,dl_vlan=100,dl_dst=ff:ff:ff:ff:ff:ff actions=pop_vlan,output:p1,output:p2,output:p3 | |
553 | ||
554 | There might be some reason why this won't work or isn't practical, | |
555 | but that isn't obvious from looking at the flow table. | |
556 | ||
557 | There are also some flows for handling some standard forms of | |
558 | multicast, and a fallback drop flow:: | |
559 | ||
560 | table=8, priority=9006,in_port=p1,dl_vlan=100,dl_dst=33:33:00:00:00:00/ff:ff:00:00:00:00 actions=pop_vlan,output:p2,output:p3 | |
561 | table=8, priority=9006,in_port=p2,dl_vlan=100,dl_dst=33:33:00:00:00:00/ff:ff:00:00:00:00 actions=pop_vlan,output:p1,output:p3 | |
562 | table=8, priority=9006,in_port=p3,dl_vlan=100,dl_dst=33:33:00:00:00:00/ff:ff:00:00:00:00 actions=pop_vlan,output:p1,output:p2 | |
563 | table=8, priority=9006,in_port=p4,dl_vlan=200,dl_dst=33:33:00:00:00:00/ff:ff:00:00:00:00 actions=pop_vlan,output:p5 | |
564 | table=8, priority=9006,in_port=p5,dl_vlan=200,dl_dst=33:33:00:00:00:00/ff:ff:00:00:00:00 actions=pop_vlan,output:p4 | |
565 | table=8, priority=9002,in_port=p1,dl_vlan=100,dl_dst=01:80:c2:00:00:00/ff:ff:ff:00:00:00 actions=pop_vlan,output:p2,output:p3 | |
566 | table=8, priority=9002,in_port=p2,dl_vlan=100,dl_dst=01:80:c2:00:00:00/ff:ff:ff:00:00:00 actions=pop_vlan,output:p1,output:p3 | |
567 | table=8, priority=9002,in_port=p3,dl_vlan=100,dl_dst=01:80:c2:00:00:00/ff:ff:ff:00:00:00 actions=pop_vlan,output:p1,output:p2 | |
568 | table=8, priority=9004,in_port=p1,dl_vlan=100,dl_dst=01:00:5e:00:00:00/ff:ff:ff:00:00:00 actions=pop_vlan,output:p2,output:p3 | |
569 | table=8, priority=9004,in_port=p2,dl_vlan=100,dl_dst=01:00:5e:00:00:00/ff:ff:ff:00:00:00 actions=pop_vlan,output:p1,output:p3 | |
570 | table=8, priority=9004,in_port=p3,dl_vlan=100,dl_dst=01:00:5e:00:00:00/ff:ff:ff:00:00:00 actions=pop_vlan,output:p1,output:p2 | |
571 | table=8, priority=9002,in_port=p4,dl_vlan=200,dl_dst=01:80:c2:00:00:00/ff:ff:ff:00:00:00 actions=pop_vlan,output:p5 | |
572 | table=8, priority=9002,in_port=p5,dl_vlan=200,dl_dst=01:80:c2:00:00:00/ff:ff:ff:00:00:00 actions=pop_vlan,output:p4 | |
573 | table=8, priority=9004,in_port=p4,dl_vlan=200,dl_dst=01:00:5e:00:00:00/ff:ff:ff:00:00:00 actions=pop_vlan,output:p5 | |
574 | table=8, priority=9004,in_port=p5,dl_vlan=200,dl_dst=01:00:5e:00:00:00/ff:ff:ff:00:00:00 actions=pop_vlan,output:p4 | |
575 | table=8, priority=0 actions=drop | |
576 | ||
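To get a quick summary of how many flows each table holds, you can pipe the
dump through a small ``awk`` filter, for example
``dump-flows br0 | count_flows_per_table``. This is a sketch only:
``count_flows_per_table`` is a hypothetical name, and it assumes the output
format shown in the listings above, where table 0 flows carry no ``table=``
prefix.

```shell
# Hypothetical helper: read "ovs-ofctl dump-flows" style output on stdin
# and print "<table> <flow count>" per table, sorted by table number.
# Lines without "actions=" (such as the reply header) are ignored; flows
# without a "table=N" field are counted as table 0.
count_flows_per_table() {
    awk '
        /actions=/ {
            t = 0
            if (match($0, /table=[0-9]+/))
                t = substr($0, RSTART + 6, RLENGTH - 6)
            n[t + 0]++
        }
        END { for (t in n) print t, n[t] }
    ' | sort -n
}
```
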
577 | Tracing | |
578 | ~~~~~~~ | |
579 | ||
Let's go a level deeper. So far, everything we've done has been
fairly general. We can also look at something more specific: the path
that a particular packet would take through Open vSwitch. We can use
the Open vSwitch ``ofproto/trace`` command to play "what-if?" games.
This command is one that we send directly to ``ovs-vswitchd``, using
the ``ovs-appctl`` utility.
586 | ||
587 | .. note:: | |
588 | ||
589 | ``ovs-appctl`` is actually a very simple-minded JSON-RPC client, so you could | |
590 | also use some other utility that speaks JSON-RPC, or access it from a program | |
591 | as an API. | |
592 | ||
593 | The ``ovs-vswitchd``\(8) manpage has a lot of detail on how to use | |
594 | ``ofproto/trace``, but let's just start by building up from a simple | |
595 | example. You can start with a command that just specifies the | |
596 | datapath (e.g. ``br0``), an input port, and nothing else; unspecified | |
597 | fields default to all-zeros. Let's look at the full output for this | |
598 | trivial example:: | |
599 | ||
600 | $ ovs-appctl ofproto/trace br0 in_port=p1 | |
601 | Flow: in_port=1,vlan_tci=0x0000,dl_src=00:00:00:00:00:00,dl_dst=00:00:00:00:00:00,dl_type=0x0000 | |
602 | ||
603 | bridge("br0") | |
604 | ------------- | |
605 | 0. in_port=1, priority 9099, cookie 0x5adc15c0 | |
606 | goto_table:1 | |
607 | 1. in_port=1,vlan_tci=0x0000/0x1fff, priority 9000, cookie 0x5adc15c0 | |
608 | push_vlan:0x8100 | |
609 | set_field:4196->vlan_vid | |
610 | goto_table:3 | |
611 | 3. priority 9000, cookie 0x5adc15c0 | |
612 | CONTROLLER:96 | |
613 | goto_table:7 | |
614 | 7. priority 9000, cookie 0x5adc15c0 | |
615 | goto_table:8 | |
616 | 8. in_port=1,dl_vlan=100, priority 9000, cookie 0x5adc15c0 | |
617 | pop_vlan | |
618 | output:2 | |
619 | output:3 | |
620 | ||
621 | Final flow: unchanged | |
622 | Megaflow: recirc_id=0,eth,in_port=1,vlan_tci=0x0000,dl_src=00:00:00:00:00:00,dl_dst=00:00:00:00:00:00,dl_type=0x0000 | |
    Datapath actions: push_vlan(vid=100,pcp=0),userspace(pid=0,controller(reason=1,flags=1,recirc_id=1,rule_cookie=0x5adc15c0,controller_id=0,max_len=96)),pop_vlan,2,3
624 | |
625 | The first line of output, beginning with ``Flow:``, just repeats our | |
626 | request in a more verbose form, including the L2 fields that were | |
627 | zeroed. | |
628 | ||
629 | Each of the numbered items under ``bridge("br0")`` shows what would | |
630 | happen to our hypothetical packet in the table with the given number. | |
631 | For example, we see in table 1 that the packet matches a flow that | |
632 | pushes on a VLAN header, sets the VLAN ID to 100, and goes on to further | |
633 | processing in table 3. In table 3, the packet gets sent to the | |
634 | controller to allow MAC learning to take place, and then table 8 | |
635 | floods the packet to the other ports in the same VLAN. | |
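
The value 4196 in ``set_field:4196->vlan_vid`` may look odd for VLAN 100. In OpenFlow 1.3, the ``vlan_vid`` field carries a "tag present" bit (``OFPVID_PRESENT``, 0x1000) in addition to the 12-bit VLAN ID. The following minimal Python sketch decodes such a value; the function name is ours and is not part of OVS or Faucet:

```python
OFPVID_PRESENT = 0x1000  # OpenFlow 1.3 flag bit: "a VLAN tag is present"
VID_MASK = 0x0FFF        # the low 12 bits hold the actual VLAN ID


def decode_vlan_vid(value):
    """Split an OpenFlow vlan_vid value into (tag_present, vlan_id)."""
    return bool(value & OFPVID_PRESENT), value & VID_MASK


print(decode_vlan_vid(4196))
```

For 4196 this yields a present tag with VLAN ID 100, which agrees with the ``push_vlan(vid=100,pcp=0)`` datapath action in the trace.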
636 | ||
637 | Summary information follows the numbered tables. The packet hasn't | |
638 | been changed (overall, even though a VLAN was pushed and then popped | |
639 | back off) since ingress, hence ``Final flow: unchanged``. We'll look | |
640 | at the ``Megaflow`` information later. The ``Datapath actions`` | |
d39ec23d | 641 | summarize what would actually happen to such a packet. |
98dc8dee BP |
642 | |
643 | Triggering MAC Learning | |
644 | ~~~~~~~~~~~~~~~~~~~~~~~ | |
645 | ||
646 | We just saw how a packet gets sent to the controller to trigger MAC | |
647 | learning. Let's actually send the packet and see what happens. But | |
648 | before we do that, let's save a copy of the current flow tables for | |
649 | later comparison:: | |
650 | ||
651 | $ save-flows br0 > flows1 | |
652 | ||
653 | Now use ``ofproto/trace``, as before, with a few new twists: we | |
654 | specify the source and destination Ethernet addresses and append the | |
655 | ``-generate`` option so that side effects like sending a packet to the | |
656 | controller actually happen:: | |
657 | ||
658 | $ ovs-appctl ofproto/trace br0 in_port=p1,dl_src=00:11:11:00:00:00,dl_dst=00:22:22:00:00:00 -generate | |
659 | ||
660 | The output is almost identical to that before, so it is not repeated | |
661 | here. But, take a look at ``inst/faucet.log`` now. It should now | |
662 | include a line at the end that says that it learned about our MAC | |
663 | 00:11:11:00:00:00, like this:: | |
664 | ||
dcc3e70b | 665 | Jan 06 15:56:02 faucet.valve INFO DPID 1 (0x1) L2 learned 00:11:11:00:00:00 (L2 type 0x0000, L3 src None) on Port 1 on VLAN 100 (1 hosts total) |
98dc8dee BP |
666 | |
667 | Now compare the flow tables that we saved to the current ones:: | |
668 | ||
669 | $ diff-flows flows1 br0 | |
670 | ||
671 | The result should look like this, showing new flows for the learned | |
672 | MACs:: | |
673 | ||
dcc3e70b BC |
674 | +table=3 priority=9098,in_port=1,dl_vlan=100,dl_src=00:11:11:00:00:00 hard_timeout=3601 actions=goto_table:7 |
675 | +table=7 priority=9099,dl_vlan=100,dl_dst=00:11:11:00:00:00 idle_timeout=3601 actions=pop_vlan,output:1 | |
98dc8dee BP |
676 | |
677 | To demonstrate the usefulness of the learned MAC, try tracing (with | |
678 | side effects) a packet arriving on ``p2`` (or ``p3``) and destined to | |
679 | the address learned on ``p1``, like this:: | |
680 | ||
681 | $ ovs-appctl ofproto/trace br0 in_port=p2,dl_src=00:22:22:00:00:00,dl_dst=00:11:11:00:00:00 -generate | |
682 | ||
683 | The first time you run this command, you will notice that it sends the | |
684 | packet to the controller, to learn ``p2``'s 00:22:22:00:00:00 source | |
685 | address:: | |
686 | ||
687 | bridge("br0") | |
688 | ------------- | |
689 | 0. in_port=2, priority 9099, cookie 0x5adc15c0 | |
690 | goto_table:1 | |
691 | 1. in_port=2,vlan_tci=0x0000/0x1fff, priority 9000, cookie 0x5adc15c0 | |
692 | push_vlan:0x8100 | |
693 | set_field:4196->vlan_vid | |
694 | goto_table:3 | |
695 | 3. priority 9000, cookie 0x5adc15c0 | |
696 | CONTROLLER:96 | |
697 | goto_table:7 | |
698 | 7. dl_vlan=100,dl_dst=00:11:11:00:00:00, priority 9099, cookie 0x5adc15c0 | |
699 | pop_vlan | |
700 | output:1 | |
701 | ||
702 | If you check ``inst/faucet.log``, you can see that ``p2``'s MAC has | |
703 | been learned too:: | |
704 | ||
dcc3e70b | 705 | Jan 06 15:58:09 faucet.valve INFO DPID 1 (0x1) L2 learned 00:22:22:00:00:00 (L2 type 0x0000, L3 src None) on Port 2 on VLAN 100 (2 hosts total) |
98dc8dee BP |
706 | |
707 | Similarly for ``diff-flows``:: | |
708 | ||
709 | $ diff-flows flows1 br0 | |
dcc3e70b BC |
710 | +table=3 priority=9098,in_port=1,dl_vlan=100,dl_src=00:11:11:00:00:00 hard_timeout=3601 actions=goto_table:7 |
711 | +table=3 priority=9098,in_port=2,dl_vlan=100,dl_src=00:22:22:00:00:00 hard_timeout=3604 actions=goto_table:7 | |
712 | +table=7 priority=9099,dl_vlan=100,dl_dst=00:11:11:00:00:00 idle_timeout=3601 actions=pop_vlan,output:1 | |
713 | +table=7 priority=9099,dl_vlan=100,dl_dst=00:22:22:00:00:00 idle_timeout=3604 actions=pop_vlan,output:2 | |
98dc8dee BP |
714 | |
715 | Then, if you re-run either of the ``ofproto/trace`` commands (with or | |
716 | without ``-generate``), you can see that the packets go back and forth | |
717 | without any further MAC learning, e.g.:: | |
718 | ||
719 | $ ovs-appctl ofproto/trace br0 in_port=p2,dl_src=00:22:22:00:00:00,dl_dst=00:11:11:00:00:00 -generate | |
720 | Flow: in_port=2,vlan_tci=0x0000,dl_src=00:22:22:00:00:00,dl_dst=00:11:11:00:00:00,dl_type=0x0000 | |
721 | ||
722 | bridge("br0") | |
723 | ------------- | |
724 | 0. in_port=2, priority 9099, cookie 0x5adc15c0 | |
725 | goto_table:1 | |
726 | 1. in_port=2,vlan_tci=0x0000/0x1fff, priority 9000, cookie 0x5adc15c0 | |
727 | push_vlan:0x8100 | |
728 | set_field:4196->vlan_vid | |
729 | goto_table:3 | |
730 | 3. in_port=2,dl_vlan=100,dl_src=00:22:22:00:00:00, priority 9098, cookie 0x5adc15c0 | |
731 | goto_table:7 | |
732 | 7. dl_vlan=100,dl_dst=00:11:11:00:00:00, priority 9099, cookie 0x5adc15c0 | |
733 | pop_vlan | |
734 | output:1 | |
735 | ||
736 | Final flow: unchanged | |
737 | Megaflow: recirc_id=0,eth,in_port=2,vlan_tci=0x0000/0x1fff,dl_src=00:22:22:00:00:00,dl_dst=00:11:11:00:00:00,dl_type=0x0000 | |
dcc3e70b | 738 | Datapath actions: 1 |
98dc8dee BP |
739 | |
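The learning behavior traced in this section can be summarized with a toy model. This is only a sketch under simplifying assumptions: it ignores VLANs and timeouts, it treats the controller's reaction as instantaneous, and the class and attribute names are invented for illustration rather than taken from Faucet:

```python
class LearningSwitchModel:
    """Toy model of the source check (table 3) and destination lookup (table 7)."""

    def __init__(self, ports):
        self.ports = set(ports)
        self.known_sources = set()   # (in_port, mac) pairs already learned
        self.destinations = {}       # mac -> output port

    def process(self, in_port, src, dst):
        """Return (sent_to_controller, output_ports) for one packet."""
        to_controller = (in_port, src) not in self.known_sources
        if to_controller:
            # The controller reacts by installing the two learned flows.
            self.known_sources.add((in_port, src))
            self.destinations[src] = in_port
        if dst in self.destinations:
            return to_controller, [self.destinations[dst]]
        # Unknown destination: flood to every other port.
        return to_controller, sorted(self.ports - {in_port})


switch = LearningSwitchModel([1, 2, 3])
print(switch.process(1, "00:11:11:00:00:00", "00:22:22:00:00:00"))
print(switch.process(2, "00:22:22:00:00:00", "00:11:11:00:00:00"))
print(switch.process(2, "00:22:22:00:00:00", "00:11:11:00:00:00"))
```

The three calls mirror the three traces in this section: the first packet is flooded and sent to the controller, the second is unicast but still sent to the controller, and the third is unicast with no controller involvement.
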
740 | Performance | |
741 | ~~~~~~~~~~~ | |
742 | ||
d39ec23d JP |
743 | Open vSwitch has a concept of a "fast path" and a "slow path"; ideally |
744 | all packets stay in the fast path. This distinction between slow path | |
745 | and fast path is the key to making sure that Open vSwitch performs as | |
746 | fast as possible. | |
747 | ||
748 | Some factors can force a flow or a packet to take the slow path. As one | |
749 | example, all CFM, BFD, LACP, STP, and LLDP processing takes place in the | |
750 | slow path, in the cases where Open vSwitch processes these protocols | |
751 | itself instead of delegating to controller-written flows. As a second | |
98dc8dee BP |
752 | example, any flow that modifies ARP fields is processed in the slow |
753 | path. These are corner cases that are unlikely to cause performance | |
754 | problems in practice because these protocols send packets at a | |
755 | relatively slow rate, and users and controller authors do not normally | |
756 | need to be concerned about them. | |
757 | ||
758 | To understand what cases users and controller authors should consider, | |
759 | we need to talk about how Open vSwitch optimizes for performance. The | |
760 | Open vSwitch code is divided into two major components which, as | |
761 | already mentioned, are called the "slow path" and "fast path" (aka | |
762 | "datapath"). The slow path is embedded in the ``ovs-vswitchd`` | |
763 | userspace program. It is the part of the Open vSwitch packet | |
764 | processing logic that understands OpenFlow. Its job is to take a | |
765 | packet and run it through the OpenFlow tables to determine what should | |
766 | happen to it. It outputs a list of actions in a form similar to | |
767 | OpenFlow actions but simpler, called "ODP actions" or "datapath | |
768 | actions". It then passes the ODP actions to the datapath, which | |
769 | applies them to the packet. | |
770 | ||
771 | .. note:: | |
772 | ||
773 | Open vSwitch contains a single slow path and multiple fast paths. | |
774 | The difference between using Open vSwitch with the Linux kernel | |
775 | versus with DPDK is the datapath. | |
776 | ||
777 | If every packet passed through the slow path and the fast path in this | |
778 | way, performance would be terrible. The key to getting high | |
779 | performance from this architecture is caching. Open vSwitch includes | |
780 | a multi-level cache. It works like this: | |
781 | ||
782 | 1. A packet initially arrives at the datapath. Some datapaths (such | |
783 | as DPDK and the in-tree version of the OVS kernel module) have a | |
784 | first-level cache called the "microflow cache". The microflow | |
785 | cache is the key to performance for relatively long-lived, high | |
786 | packet rate flows. If the datapath has a microflow cache, then it | |
787 | consults it and, if there is a cache hit, the datapath executes the | |
788 | associated actions. Otherwise, it proceeds to step 2. | |
789 | ||
790 | 2. The datapath consults its second-level cache, called the "megaflow | |
791 | cache". The megaflow cache is the key to performance for shorter | |
792 | or low packet rate flows. If there is a megaflow cache hit, the | |
793 | datapath executes the associated actions. Otherwise, it proceeds | |
794 | to step 3. | |
795 | ||
796 | 3. The datapath passes the packet to the slow path, which runs it | |
797 | through the OpenFlow table to yield ODP actions, a process that is | |
798 | often called "flow translation". It then passes the packet back to | |
799 | the datapath to execute the actions and to, if possible, install a | |
800 | megaflow cache entry so that subsequent similar packets can be | |
801 | handled directly by the fast path. (We already described above | |
802 | most of the cases where a cache entry cannot be installed.) | |
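
The three steps above can be sketched as a small Python model. It is an illustration only, not OVS code: the class, its attributes, and the choice to fill the microflow cache on a megaflow hit are assumptions made to keep the example short:

```python
class Datapath:
    """Toy fast path: microflow cache, megaflow cache, then the slow path."""

    def __init__(self, slow_path):
        self.slow_path = slow_path   # callable: packet -> (fields, actions)
        self.microflows = {}         # exact packet key -> actions
        self.megaflows = []          # list of (match dict, actions)

    def receive(self, packet):
        """Return (where_it_hit, actions) for a packet given as a dict."""
        key = tuple(sorted(packet.items()))
        if key in self.microflows:                        # step 1
            return "microflow", self.microflows[key]
        for match, actions in self.megaflows:             # step 2
            if all(packet.get(f) == v for f, v in match.items()):
                self.microflows[key] = actions
                return "megaflow", actions
        fields, actions = self.slow_path(packet)          # step 3
        # Install a megaflow that matches only the fields the slow path used.
        self.megaflows.append(({f: packet[f] for f in fields}, actions))
        return "slow path", actions


def translate(packet):
    """Hypothetical slow path that only looks at in_port and dl_dst."""
    return ["in_port", "dl_dst"], ["output:1"]


dp = Datapath(translate)
first = {"in_port": 2, "dl_src": "00:22:22:00:00:00", "dl_dst": "00:11:11:00:00:00"}
other = dict(first, dl_src="00:33:33:00:00:00")
for pkt in (first, first, first, other):
    print(dp.receive(pkt)[0])
```

The last packet differs from the first only in ``dl_src``, a field the hypothetical slow path did not consult, so it is served by the megaflow cache without another translation.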
803 | ||
804 | The megaflow cache is the key cache to consider for performance | |
805 | tuning. Open vSwitch provides tools for understanding and optimizing | |
806 | its behavior. The ``ofproto/trace`` command that we have already been | |
807 | using is the most common tool for this use. Let's take another look | |
808 | at the most recent ``ofproto/trace`` output:: | |
809 | ||
810 | $ ovs-appctl ofproto/trace br0 in_port=p2,dl_src=00:22:22:00:00:00,dl_dst=00:11:11:00:00:00 -generate | |
811 | Flow: in_port=2,vlan_tci=0x0000,dl_src=00:22:22:00:00:00,dl_dst=00:11:11:00:00:00,dl_type=0x0000 | |
812 | ||
813 | bridge("br0") | |
814 | ------------- | |
815 | 0. in_port=2, priority 9099, cookie 0x5adc15c0 | |
816 | goto_table:1 | |
817 | 1. in_port=2,vlan_tci=0x0000/0x1fff, priority 9000, cookie 0x5adc15c0 | |
818 | push_vlan:0x8100 | |
819 | set_field:4196->vlan_vid | |
820 | goto_table:3 | |
821 | 3. in_port=2,dl_vlan=100,dl_src=00:22:22:00:00:00, priority 9098, cookie 0x5adc15c0 | |
822 | goto_table:7 | |
823 | 7. dl_vlan=100,dl_dst=00:11:11:00:00:00, priority 9099, cookie 0x5adc15c0 | |
824 | pop_vlan | |
825 | output:1 | |
826 | ||
827 | Final flow: unchanged | |
828 | Megaflow: recirc_id=0,eth,in_port=2,vlan_tci=0x0000/0x1fff,dl_src=00:22:22:00:00:00,dl_dst=00:11:11:00:00:00,dl_type=0x0000 | |
dcc3e70b | 829 | Datapath actions: 1 |
98dc8dee BP |
830 | |
831 | This time, it's the ``Megaflow`` line that we're interested in. This line | |
832 | shows the entry that Open vSwitch would insert into the megaflow cache | |
833 | given the particular packet with the current flow tables. The | |
834 | megaflow entry includes: | |
835 | ||
836 | * ``recirc_id``. This is an implementation detail that users don't | |
837 | normally need to understand. | |
838 | ||
839 | * ``eth``. This just indicates that the cache entry matches only | |
840 | Ethernet packets; Open vSwitch also supports other types of packets, | |
841 | such as IP packets not encapsulated in Ethernet. | |
842 | ||
843 | * All of the fields matched by any of the flows that the packet | |
844 | visited: | |
845 | ||
846 | ``in_port`` | |
847 | In tables 0, 1, and 3. | |
848 | ||
849 | ``vlan_tci`` | |
850 | In tables 1, 3, and 7 (``vlan_tci`` includes the VLAN ID and PCP | |
851 | fields and ``dl_vlan`` is just the VLAN ID). | |
852 | ||
853 | ``dl_src`` | |
854 | In table 3. | |
855 | ||
856 | ``dl_dst`` | |
857 | In table 7. | |
858 | ||
859 | * All of the fields matched by flows that had to be ruled out to | |
860 | ensure that the ones that actually matched were the highest priority | |
861 | matching rules. | |
862 | ||
863 | The last one is important. Notice how the megaflow matches on | |
864 | ``dl_type=0x0000``, even though none of the tables matched on | |
865 | ``dl_type`` (the Ethernet type). One reason is this flow | |
866 | in OpenFlow table 1 (which shows up in ``dump-flows`` output):: | |
867 | ||
868 | table=1, priority=9099,dl_type=0x88cc actions=drop | |
869 | ||
870 | This flow has higher priority than the flow in table 1 that actually | |
871 | matched. This means that, to put it in the megaflow cache, | |
872 | ``ovs-vswitchd`` has to add a match on ``dl_type`` to ensure that the | |
873 | cache entry doesn't match LLDP packets (with Ethertype 0x88cc). | |
874 | ||
875 | .. note:: | |
876 | ||
877 | In fact, in some cases ``ovs-vswitchd`` matches on fields that | |
878 | aren't strictly required according to this description. ``dl_type`` | |
879 | is actually one of those, so deleting the LLDP flow probably would | |
880 | not have any effect on the megaflow. But the principle here is | |
881 | sound. | |
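
To make the principle concrete, here is a deliberately naive Python model of a single-table lookup that also records which fields the megaflow would have to match. It assumes flows are checked strictly in priority order and uses exact matches only; the real classifier in ``ovs-vswitchd`` is more sophisticated, and the function name is ours:

```python
def lookup_with_mask(table, packet):
    """Return (matched flow, set of fields the megaflow must match on).

    `table` is a list of (priority, match dict, actions) tuples.
    """
    fields = set()
    for priority, match, actions in sorted(table, key=lambda f: -f[0]):
        fields |= set(match)   # every flow examined contributes its fields
        if all(packet.get(k) == v for k, v in match.items()):
            return (priority, match, actions), fields
    return None, fields


# Simplified stand-ins for the two table 1 flows discussed in the text.
table1 = [
    (9099, {"dl_type": 0x88CC}, "drop"),
    (9000, {"in_port": 2, "vlan_tci": 0}, "push_vlan,goto_table:3"),
]
packet = {"in_port": 2, "vlan_tci": 0, "dl_type": 0}
flow, fields = lookup_with_mask(table1, packet)
print(flow[2], sorted(fields))
```

Even though the matching flow never mentions ``dl_type``, the field ends up in the result because the higher-priority LLDP flow had to be ruled out first.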
882 | ||
883 | So why does any of this matter? It's because the more specific a | |
884 | megaflow is, that is, the more fields or bits within fields that a | |
885 | megaflow matches, the less valuable it is from a caching viewpoint. A | |
886 | very specific megaflow might match on L2 and L3 addresses and L4 port | |
887 | numbers. When that happens, only packets in one (half-)connection | |
888 | match the megaflow. If that connection has only a few packets, as | |
889 | many connections do, then the high cost of the slow path translation | |
890 | is amortized over only a few packets, so the average cost of | |
891 | forwarding those packets is high. On the other hand, if a megaflow | |
892 | only matches a relatively small number of L2 and L3 fields, then the | |
893 | cache entry can potentially be used by many individual connections, | |
894 | and the average cost is low. | |
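
A little arithmetic shows the amortization effect. The cost figures in this Python sketch are made-up units chosen only for illustration, not measurements of Open vSwitch:

```python
def average_cost(slow_path_cost, fast_path_cost, packets_per_megaflow):
    """Average per-packet cost when one translation serves N packets.

    Every packet pays the fast path cost; the first one additionally
    pays for the slow path translation that installs the megaflow.
    """
    total = slow_path_cost + packets_per_megaflow * fast_path_cost
    return total / packets_per_megaflow


# Hypothetical costs: a translation is 100 units, a cache hit is 1 unit.
print(average_cost(100, 1, 4))      # a very specific megaflow, few packets
print(average_cost(100, 1, 1000))   # a broad megaflow shared by many connections
```

With these assumed numbers, a megaflow that serves only 4 packets averages 26 units per packet, while one that serves 1000 packets averages about 1.1.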
895 | ||
896 | For more information on how Open vSwitch constructs megaflows, | |
897 | including about ways that it can make megaflow entries less specific | |
898 | than one would infer from the discussion here, please refer to the | |
899 | 2015 NSDI paper, "The Design and Implementation of Open vSwitch", | |
900 | which focuses on this algorithm. | |
901 | ||
902 | Routing | |
903 | ------- | |
904 | ||
905 | We've looked at how Faucet implements switching in OpenFlow, and how | |
906 | Open vSwitch implements OpenFlow through its datapath architecture. | |
907 | Now let's start over, adding L3 routing into the picture. | |
908 | ||
909 | It's remarkably easy to enable routing. We just change our ``vlans`` | |
910 | section in ``inst/faucet.yaml`` to specify a router IP address for | |
1fb924b8 | 911 | each VLAN and define a router between them. The ``dps`` section is unchanged:: |
98dc8dee BP |
912 | |
913 |     dps: | |
914 |         switch-1: | |
915 |             dp_id: 0x1 | |
916 |             timeout: 3600 | |
917 |             arp_neighbor_timeout: 3600 | |
918 |             interfaces: | |
919 |                 1: | |
920 |                     native_vlan: 100 | |
921 |                 2: | |
922 |                     native_vlan: 100 | |
923 |                 3: | |
924 |                     native_vlan: 100 | |
925 |                 4: | |
926 |                     native_vlan: 200 | |
927 |                 5: | |
928 |                     native_vlan: 200 | |
929 |     vlans: | |
930 |         100: | |
931 |             faucet_vips: ["10.100.0.254/24"] | |
932 |         200: | |
933 |             faucet_vips: ["10.200.0.254/24"] | |
934 |     routers: | |
935 |         router-1: | |
936 |             vlans: [100, 200] | |
937 | ||
938 | Then we restart Faucet:: | |
939 | ||
940 | $ docker restart faucet | |
941 | ||
942 | .. note:: | |
943 | ||
944 | One should be able to tell Faucet to re-read its configuration file | |
945 | without restarting it. I sometimes saw anomalous behavior when I | |
946 | did this, although I didn't characterize it well enough to make a | |
947 | quality bug report. I found restarting the container to be | |
948 | reliable. | |
949 | ||
950 | OpenFlow Layer | |
951 | ~~~~~~~~~~~~~~ | |
952 | ||
953 | Back in the OVS sandbox, let's see how the flow table has changed, with:: | |
954 | ||
955 | $ diff-flows flows1 br0 | |
956 | ||
957 | First, table 3 has new flows to direct ARP packets to table 6 (the | |
958 | virtual IP processing table), presumably to handle ARP for the router | |
959 | IPs. New flows also send IP packets destined to a particular Ethernet | |
960 | address to table 4 (the L3 forwarding table); we can make the educated | |
961 | guess that the Ethernet address is the one used by the Faucet router:: | |
962 | ||
dcc3e70b BC |
963 | +table=3 priority=9131,arp,dl_vlan=100 actions=goto_table:6 |
964 | +table=3 priority=9131,arp,dl_vlan=200 actions=goto_table:6 | |
965 | +table=3 priority=9099,ip,dl_vlan=100,dl_dst=0e:00:00:00:00:01 actions=goto_table:4 | |
966 | +table=3 priority=9099,ip,dl_vlan=200,dl_dst=0e:00:00:00:00:01 actions=goto_table:4 | |
98dc8dee BP |
967 | |
968 | The new flows in table 4 appear to be verifying that the packets are | |
969 | indeed addressed to a network or IP address that Faucet knows how to | |
970 | route:: | |
971 | ||
dcc3e70b BC |
972 | +table=4 priority=9131,ip,dl_vlan=100,nw_dst=10.100.0.254 actions=goto_table:6 |
973 | +table=4 priority=9131,ip,dl_vlan=200,nw_dst=10.200.0.254 actions=goto_table:6 | |
974 | +table=4 priority=9123,ip,dl_vlan=100,nw_dst=10.100.0.0/24 actions=goto_table:6 | |
975 | +table=4 priority=9123,ip,dl_vlan=200,nw_dst=10.100.0.0/24 actions=goto_table:6 | |
976 | +table=4 priority=9123,ip,dl_vlan=100,nw_dst=10.200.0.0/24 actions=goto_table:6 | |
977 | +table=4 priority=9123,ip,dl_vlan=200,nw_dst=10.200.0.0/24 actions=goto_table:6 | |
98dc8dee BP |
978 | |
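The way these table 4 flows select a packet can be modeled in a few lines of Python using the standard ``ipaddress`` module. The table contents are copied from the flows above; the function name and the tuple layout are our own, and what happens to a packet that matches none of the new flows is outside the scope of this sketch:

```python
import ipaddress

TABLE4 = [
    # (priority, vlan, destination prefix, action)
    (9131, 100, "10.100.0.254/32", "goto_table:6"),
    (9131, 200, "10.200.0.254/32", "goto_table:6"),
    (9123, 100, "10.100.0.0/24", "goto_table:6"),
    (9123, 200, "10.100.0.0/24", "goto_table:6"),
    (9123, 100, "10.200.0.0/24", "goto_table:6"),
    (9123, 200, "10.200.0.0/24", "goto_table:6"),
]


def match_table4(vlan, nw_dst, table=TABLE4):
    """Return the highest-priority entry matching (vlan, nw_dst), or None."""
    addr = ipaddress.ip_address(nw_dst)
    hits = [entry for entry in table
            if entry[1] == vlan and addr in ipaddress.ip_network(entry[2])]
    return max(hits, key=lambda entry: entry[0], default=None)


print(match_table4(100, "10.200.0.1"))
print(match_table4(100, "10.100.0.254"))
print(match_table4(100, "192.0.2.1"))
```

A destination in a known subnet matches one of the /24 entries, the router's own address matches the more specific, higher-priority entry, and an address outside both subnets matches none of them.
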
979 | Table 6 has a few different things going on. It sends ARP requests | |
980 | for the router IPs to the controller; presumably the controller will | |
981 | generate replies and send them back to the requester. It switches | |
982 | other ARP packets, either broadcasting them if they have a broadcast | |
983 | destination or attempting to unicast them otherwise. It sends all | |
984 | other IP packets to the controller:: | |
985 | ||
dcc3e70b BC |
986 | +table=6 priority=9133,arp,arp_tpa=10.100.0.254 actions=CONTROLLER:128 |
987 | +table=6 priority=9133,arp,arp_tpa=10.200.0.254 actions=CONTROLLER:128 | |
988 | +table=6 priority=9132,arp,dl_dst=ff:ff:ff:ff:ff:ff actions=goto_table:8 | |
989 | +table=6 priority=9131,arp actions=goto_table:7 | |
990 | +table=6 priority=9130,ip actions=CONTROLLER:128 | |
98dc8dee BP |
991 | |
992 | Performance is clearly going to be poor if every packet that needs to | |
993 | be routed has to go to the controller, but it's unlikely that's the | |
994 | full story. In the next section, we'll take a closer look. | |
995 | ||
996 | Tracing | |
997 | ~~~~~~~ | |
998 | ||
999 | As in our switching example, we can play some "what-if?" games to | |
1000 | figure out how this works. Let's suppose that a machine with IP | |
1001 | 10.100.0.1, on port ``p1``, wants to send an IP packet to a machine | |
1002 | with IP 10.200.0.1 on port ``p4``. Assuming that these hosts have not | |
1003 | been in communication recently, the steps to accomplish this are | |
1004 | normally the following: | |
1005 | ||
1006 | 1. Host 10.100.0.1 sends an ARP request to router 10.100.0.254. | |
1007 | ||
1008 | 2. The router sends an ARP reply to the host. | |
1009 | ||
1010 | 3. Host 10.100.0.1 sends an IP packet to 10.200.0.1, via the router's | |
1011 | Ethernet address. | |
1012 | ||
1013 | 4. The router broadcasts an ARP request to ``p4`` and ``p5``, the | |
1014 | ports that carry the 10.200.0.<x> network. | |
1015 | ||
1016 | 5. Host 10.200.0.1 sends an ARP reply to the router. | |
1017 | ||
1018 | 6. Either the router sends the IP packet (which it buffered) to | |
1019 | 10.200.0.1, or eventually 10.100.0.1 times out and resends it. | |
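
The router side of steps 3 through 6 can be captured in a toy Python model. It is a sketch of a generic router, not of Faucet: the class and method names are invented, and the ``buffer_packets`` flag simply selects between the two alternatives named in step 6:

```python
class RouterModel:
    """Toy router: forwards when the next hop is resolved, else sends ARP."""

    def __init__(self, buffer_packets=False):
        self.neighbors = {}                # IP address -> Ethernet address
        self.buffer_packets = buffer_packets
        self.pending = []                  # (dst_ip, payload) awaiting ARP

    def ip_packet(self, dst_ip, payload):
        """Steps 3, 4 and 6: forward, or ask who has dst_ip."""
        if dst_ip in self.neighbors:
            return [("forward", self.neighbors[dst_ip], payload)]
        if self.buffer_packets:
            self.pending.append((dst_ip, payload))
        return [("arp_request", dst_ip)]

    def arp_reply(self, ip, mac):
        """Step 5: learn the neighbor and release any buffered packets."""
        self.neighbors[ip] = mac
        ready = [p for p in self.pending if p[0] == ip]
        self.pending = [p for p in self.pending if p[0] != ip]
        return [("forward", mac, payload) for _, payload in ready]


router = RouterModel(buffer_packets=False)
print(router.ip_packet("10.200.0.1", "udp-1"))           # unresolved: ARP goes out
print(router.arp_reply("10.200.0.1", "00:10:20:30:40:50"))
print(router.ip_packet("10.200.0.1", "udp-1"))           # the resend is forwarded
```

With ``buffer_packets=True``, the call to ``arp_reply`` would return the buffered packet instead of an empty list, so no resend would be needed.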
1020 | ||
1021 | Let's use ``ofproto/trace`` to see whether Faucet and OVS follow this | |
1022 | procedure. | |
1023 | ||
1024 | Before we start, save a new snapshot of the flow tables for later | |
1025 | comparison:: | |
1026 | ||
1027 | $ save-flows br0 > flows2 | |
1028 | ||
1029 | Step 1: Host ARP for Router | |
1030 | +++++++++++++++++++++++++++ | |
1031 | ||
1032 | Let's simulate the ARP from 10.100.0.1 to its gateway router | |
1033 | 10.100.0.254. This requires more detail than any of the packets we've | |
1034 | simulated previously:: | |
1035 | ||
1036 | $ ovs-appctl ofproto/trace br0 in_port=p1,dl_src=00:01:02:03:04:05,dl_dst=ff:ff:ff:ff:ff:ff,dl_type=0x806,arp_spa=10.100.0.1,arp_tpa=10.100.0.254,arp_sha=00:01:02:03:04:05,arp_tha=ff:ff:ff:ff:ff:ff,arp_op=1 -generate | |
1037 | ||
1038 | The important part of the output is where it shows that the packet was | |
1039 | recognized as an ARP request destined to the router gateway and | |
1040 | therefore sent to the controller:: | |
1041 | ||
1042 | 6. arp,arp_tpa=10.100.0.254, priority 9133, cookie 0x5adc15c0 | |
dcc3e70b | 1043 | CONTROLLER:128 |
98dc8dee BP |
1044 | |
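The flow specification passed to ``ofproto/trace`` in this step is long and easy to mistype. If you script these experiments, a small helper along the following lines can assemble it. The helper is hypothetical (it is not part of OVS); it only concatenates the same ``field=value`` pairs used in the command above:

```python
def arp_request_flow(in_port, sha, spa, tpa):
    """Build an ofproto/trace flow string for a broadcast ARP request."""
    broadcast = "ff:ff:ff:ff:ff:ff"
    fields = [
        ("in_port", in_port),
        ("dl_src", sha),
        ("dl_dst", broadcast),
        ("dl_type", "0x806"),
        ("arp_spa", spa),
        ("arp_tpa", tpa),
        ("arp_sha", sha),
        ("arp_tha", broadcast),
        ("arp_op", 1),          # 1 = ARP request
    ]
    return ",".join(f"{name}={value}" for name, value in fields)


flow = arp_request_flow("p1", "00:01:02:03:04:05", "10.100.0.1", "10.100.0.254")
print("ovs-appctl ofproto/trace br0", flow, "-generate")
```
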
1045 | The Faucet log shows that Faucet learned the host's MAC address and | |
1046 | its MAC-to-IP mapping, and responded to the ARP request:: | |
1047 | ||
dcc3e70b BC |
1048 | Jan 06 16:12:23 faucet.valve INFO DPID 1 (0x1) Adding new route 10.100.0.1/32 via 10.100.0.1 (00:01:02:03:04:05) on VLAN 100 |
1049 | Jan 06 16:12:23 faucet.valve INFO DPID 1 (0x1) Responded to ARP request for 10.100.0.254 from 10.100.0.1 (00:01:02:03:04:05) on VLAN 100 | |
1050 | Jan 06 16:12:23 faucet.valve INFO DPID 1 (0x1) L2 learned 00:01:02:03:04:05 (L2 type 0x0806, L3 src 10.100.0.1) on Port 1 on VLAN 100 (1 hosts total) | |
98dc8dee BP |
1051 | |
1052 | We can also look at the changes to the flow tables:: | |
1053 | ||
1054 | $ diff-flows flows2 br0 | |
1055 | +table=3 priority=9098,in_port=1,dl_vlan=100,dl_src=00:01:02:03:04:05 hard_timeout=3600 actions=goto_table:7 | |
1056 | +table=4 priority=9131,ip,dl_vlan=100,nw_dst=10.100.0.1 actions=set_field:4196->vlan_vid,set_field:0e:00:00:00:00:01->eth_src,set_field:00:01:02:03:04:05->eth_dst,dec_ttl,goto_table:7 | |
1057 | +table=4 priority=9131,ip,dl_vlan=200,nw_dst=10.100.0.1 actions=set_field:4196->vlan_vid,set_field:0e:00:00:00:00:01->eth_src,set_field:00:01:02:03:04:05->eth_dst,dec_ttl,goto_table:7 | |
1058 | +table=7 priority=9099,dl_vlan=100,dl_dst=00:01:02:03:04:05 idle_timeout=3600 actions=pop_vlan,output:1 | |
1059 | ||
1060 | The new flows include one in table 3 and one in table 7 for the | |
1061 | learned MAC, which have the same forms we saw before. The new flows | |
1062 | in table 4 are different. They match packets directed to 10.100.0.1 | |
1063 | (in two VLANs) and forward them to the host by updating the Ethernet | |
1064 | source and destination addresses appropriately, decrementing the TTL, | |
1065 | and skipping ahead to unicast output in table 7. This means that | |
1066 | packets sent **to** 10.100.0.1 should now get to their destination. | |
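
The effect of these table 4 actions on a packet can be illustrated with a short Python sketch that represents the packet as a dictionary. The function and key names are ours, and the sketch ignores what ``dec_ttl`` does when the TTL would expire:

```python
def apply_route(packet, vlan_vid, router_mac, next_hop_mac):
    """Apply the table 4 host-route actions to a copy of `packet`."""
    routed = dict(packet)
    routed["vlan_vid"] = vlan_vid       # set_field:...->vlan_vid
    routed["eth_src"] = router_mac      # set_field:...->eth_src
    routed["eth_dst"] = next_hop_mac    # set_field:...->eth_dst
    routed["nw_ttl"] -= 1               # dec_ttl
    return routed


# A hypothetical packet arriving from VLAN 200 and addressed to 10.100.0.1.
incoming = {
    "vlan_vid": 4296,
    "eth_src": "00:10:20:30:40:50",
    "eth_dst": "0e:00:00:00:00:01",
    "nw_dst": "10.100.0.1",
    "nw_ttl": 64,
}
print(apply_route(incoming, 4196, "0e:00:00:00:00:01", "00:01:02:03:04:05"))
```

The routed copy carries the VLAN 100 tag value, the router's address as its source, the host's address as its destination, and a TTL of 63.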
1067 | ||
1068 | Step 2: Router Sends ARP Reply | |
1069 | ++++++++++++++++++++++++++++++ | |
1070 | ||
1071 | ``inst/faucet.log`` said that the router sent an ARP reply. How can | |
1072 | we see it? Simulated packets just get dropped by default. One way is | |
1073 | to configure the dummy ports to write the packets they receive to a | |
1074 | file. Let's try that. First configure the port:: | |
1075 | ||
1076 | $ ovs-vsctl set interface p1 options:pcap=p1.pcap | |
1077 | ||
1078 | Then re-run the "trace" command:: | |
1079 | ||
1080 | $ ovs-appctl ofproto/trace br0 in_port=p1,dl_src=00:01:02:03:04:05,dl_dst=ff:ff:ff:ff:ff:ff,dl_type=0x806,arp_spa=10.100.0.1,arp_tpa=10.100.0.254,arp_sha=00:01:02:03:04:05,arp_tha=ff:ff:ff:ff:ff:ff,arp_op=1 -generate | |
1081 | ||
1082 | And dump the reply packet:: | |
1083 | ||
1084 | $ /usr/sbin/tcpdump -evvvr sandbox/p1.pcap | |
dcc3e70b BC |
1085 | reading from file sandbox/p1.pcap, link-type EN10MB (Ethernet) |
1086 | 16:14:47.670727 0e:00:00:00:00:01 (oui Unknown) > 00:01:02:03:04:05 (oui Unknown), ethertype ARP (0x0806), length 60: Ethernet (len 6), IPv4 (len 4), Reply 10.100.0.254 is-at 0e:00:00:00:00:01 (oui Unknown), length 46 | |
98dc8dee BP |
1087 | |
1088 | We clearly see the ARP reply, which tells us that the Faucet router's | |
1089 | Ethernet address is 0e:00:00:00:00:01 (as we guessed before from the | |
1090 | flow table). | |
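
If ``tcpdump`` is not at hand, the same fields can be decoded with a few lines of Python. The sketch below does not read the pcap file; it parses a hand-built Ethernet frame that uses the addresses from the capture above. The target fields of the reply are assumed to be the requesting host's addresses, since ``tcpdump`` does not print them here:

```python
import socket
import struct


def mac_bytes(mac):
    """Convert 'aa:bb:cc:dd:ee:ff' to 6 raw bytes."""
    return bytes.fromhex(mac.replace(":", ""))


def mac_str(raw):
    """Convert 6 raw bytes to 'aa:bb:cc:dd:ee:ff'."""
    return ":".join(f"{b:02x}" for b in raw)


def parse_arp(frame):
    """Decode an Ethernet frame carrying an IPv4-over-Ethernet ARP message."""
    dst, src, ethertype = struct.unpack("!6s6sH", frame[:14])
    if ethertype != 0x0806:
        raise ValueError("not an ARP frame")
    _htype, _ptype, _hlen, _plen, oper, sha, spa, tha, tpa = struct.unpack(
        "!HHBBH6s4s6s4s", frame[14:42])
    return {
        "eth_src": mac_str(src),
        "eth_dst": mac_str(dst),
        "op": {1: "request", 2: "reply"}.get(oper, oper),
        "sha": mac_str(sha),
        "spa": socket.inet_ntoa(spa),
        "tha": mac_str(tha),
        "tpa": socket.inet_ntoa(tpa),
    }


router, host = "0e:00:00:00:00:01", "00:01:02:03:04:05"
frame = (mac_bytes(host) + mac_bytes(router) + struct.pack("!H", 0x0806)
         + struct.pack("!HHBBH", 1, 0x0800, 6, 4, 2)     # Ethernet/IPv4, reply
         + mac_bytes(router) + socket.inet_aton("10.100.0.254")
         + mac_bytes(host) + socket.inet_aton("10.100.0.1"))
print(parse_arp(frame))
```

The ``sha`` and ``spa`` fields of the decoded reply correspond to the "10.100.0.254 is-at 0e:00:00:00:00:01" part of the ``tcpdump`` output.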
1091 | ||
1092 | Let's configure the rest of our ports to log their packets, too:: | |
1093 | ||
1094 | $ for i in 2 3 4 5; do ovs-vsctl set interface p$i options:pcap=p$i.pcap; done | |
1095 | ||
1096 | Step 3: Host Sends IP Packet | |
1097 | ++++++++++++++++++++++++++++ | |
1098 | ||
1099 | Now that host 10.100.0.1 has the MAC address for its router, it can | |
1100 | send an IP packet to 10.200.0.1 via the router's MAC address, like | |
1101 | this:: | |
1102 | ||
1103 | $ ovs-appctl ofproto/trace br0 in_port=p1,dl_src=00:01:02:03:04:05,dl_dst=0e:00:00:00:00:01,udp,nw_src=10.100.0.1,nw_dst=10.200.0.1,nw_ttl=64 -generate | |
dcc3e70b | 1104 | Flow: udp,in_port=1,vlan_tci=0x0000,dl_src=00:01:02:03:04:05,dl_dst=0e:00:00:00:00:01,nw_src=10.100.0.1,nw_dst=10.200.0.1,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=0 |
98dc8dee BP |
1105 | |
1106 | bridge("br0") | |
1107 | ------------- | |
1108 | 0. in_port=1, priority 9099, cookie 0x5adc15c0 | |
1109 | goto_table:1 | |
1110 | 1. in_port=1,vlan_tci=0x0000/0x1fff, priority 9000, cookie 0x5adc15c0 | |
1111 | push_vlan:0x8100 | |
1112 | set_field:4196->vlan_vid | |
1113 | goto_table:3 | |
1114 | 3. ip,dl_vlan=100,dl_dst=0e:00:00:00:00:01, priority 9099, cookie 0x5adc15c0 | |
1115 | goto_table:4 | |
1116 | 4. ip,dl_vlan=100,nw_dst=10.200.0.0/24, priority 9123, cookie 0x5adc15c0 | |
1117 | goto_table:6 | |
dcc3e70b BC |
1118 | 6. ip, priority 9130, cookie 0x5adc15c0 |
1119 | CONTROLLER:128 | |
98dc8dee | 1120 | |
dcc3e70b BC |
1121 | Final flow: udp,in_port=1,dl_vlan=100,dl_vlan_pcp=0,vlan_tci1=0x0000,dl_src=00:01:02:03:04:05,dl_dst=0e:00:00:00:00:01,nw_src=10.100.0.1,nw_dst=10.200.0.1,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=0 |
1122 | Megaflow: recirc_id=0,eth,ip,in_port=1,vlan_tci=0x0000/0x1fff,dl_src=00:01:02:03:04:05,dl_dst=0e:00:00:00:00:01,nw_dst=10.200.0.0/25,nw_frag=no | |
d39ec23d | 1123 | Datapath actions: push_vlan(vid=100,pcp=0),userspace(pid=0,controller(reason=1,flags=0,recirc_id=6,rule_cookie=0x5adc15c0,controller_id=0,max_len=128)) |
98dc8dee BP |
1124 | |
1125 | Observe that the packet gets recognized as destined to the router, in | |
1126 | table 3, and then as properly destined to the 10.200.0.0/24 network, | |
1127 | in table 4. In table 6, however, it gets sent to the controller. | |
1128 | Presumably, this is because Faucet has not yet resolved an Ethernet | |
1129 | address for the destination host 10.200.0.1. It probably sent out an | |
1130 | ARP request. Let's take a look in the next step. | |
1131 | ||
1132 | Step 4: Router Broadcasts ARP Request | |
1133 | +++++++++++++++++++++++++++++++++++++ | |
1134 | ||
1135 | The router needs to know the Ethernet address of 10.200.0.1. It knows | |
1136 | that, if this machine exists, it's on port ``p4`` or ``p5``, since we | |
1137 | configured those ports as VLAN 200. | |
1138 | ||
1139 | Let's make sure:: | |
1140 | ||
1141 | $ /usr/sbin/tcpdump -evvvr sandbox/p4.pcap | |
1142 | reading from file sandbox/p4.pcap, link-type EN10MB (Ethernet) | |
dcc3e70b | 1143 | 16:17:43.174006 0e:00:00:00:00:01 (oui Unknown) > Broadcast, ethertype ARP (0x0806), length 60: Ethernet (len 6), IPv4 (len 4), Request who-has 10.200.0.1 tell 10.200.0.254, length 46 |
98dc8dee BP |
1144 | |
1145 | and:: | |
1146 | ||
1147 | $ /usr/sbin/tcpdump -evvvr sandbox/p5.pcap | |
1148 | reading from file sandbox/p5.pcap, link-type EN10MB (Ethernet) | |
dcc3e70b | 1149 | 16:17:43.174268 0e:00:00:00:00:01 (oui Unknown) > Broadcast, ethertype ARP (0x0806), length 60: Ethernet (len 6), IPv4 (len 4), Request who-has 10.200.0.1 tell 10.200.0.254, length 46 |
98dc8dee BP |
1150 | |
1151 | For good measure, let's make sure that it wasn't sent to ``p3``:: | |
1152 | ||
1153 | $ /usr/sbin/tcpdump -evvvr sandbox/p3.pcap | |
1154 | reading from file sandbox/p3.pcap, link-type EN10MB (Ethernet) | |
1155 | ||
1156 | Step 5: Host 2 Sends ARP Reply | |
1157 | ++++++++++++++++++++++++++++++ | |
1158 | ||
1159 | The Faucet controller sent an ARP request, so we can send an ARP | |
1160 | reply:: | |
1161 | ||
1162 | $ ovs-appctl ofproto/trace br0 in_port=p4,dl_src=00:10:20:30:40:50,dl_dst=0e:00:00:00:00:01,dl_type=0x806,arp_spa=10.200.0.1,arp_tpa=10.200.0.254,arp_sha=00:10:20:30:40:50,arp_tha=0e:00:00:00:00:01,arp_op=2 -generate | |
1163 | Flow: arp,in_port=4,vlan_tci=0x0000,dl_src=00:10:20:30:40:50,dl_dst=0e:00:00:00:00:01,arp_spa=10.200.0.1,arp_tpa=10.200.0.254,arp_op=2,arp_sha=00:10:20:30:40:50,arp_tha=0e:00:00:00:00:01 | |
1164 | ||
1165 | bridge("br0") | |
1166 | ------------- | |
1167 | 0. in_port=4, priority 9099, cookie 0x5adc15c0 | |
1168 | goto_table:1 | |
1169 | 1. in_port=4,vlan_tci=0x0000/0x1fff, priority 9000, cookie 0x5adc15c0 | |
1170 | push_vlan:0x8100 | |
1171 | set_field:4296->vlan_vid | |
1172 | goto_table:3 | |
1173 | 3. arp,dl_vlan=200, priority 9131, cookie 0x5adc15c0 | |
1174 | goto_table:6 | |
1175 | 6. arp,arp_tpa=10.200.0.254, priority 9133, cookie 0x5adc15c0 | |
dcc3e70b | 1176 | CONTROLLER:128 |
98dc8dee BP |
1177 | |
1178 | Final flow: arp,in_port=4,dl_vlan=200,dl_vlan_pcp=0,vlan_tci1=0x0000,dl_src=00:10:20:30:40:50,dl_dst=0e:00:00:00:00:01,arp_spa=10.200.0.1,arp_tpa=10.200.0.254,arp_op=2,arp_sha=00:10:20:30:40:50,arp_tha=0e:00:00:00:00:01 | |
dcc3e70b | 1179 | Megaflow: recirc_id=0,eth,arp,in_port=4,vlan_tci=0x0000/0x1fff,dl_dst=0e:00:00:00:00:01,arp_tpa=10.200.0.254 |
d39ec23d | 1180 | Datapath actions: push_vlan(vid=200,pcp=0),userspace(pid=0,controller(reason=1,flags=0,recirc_id=7,rule_cookie=0x5adc15c0,controller_id=0,max_len=128)) |
98dc8dee BP |
1181 | |
1182 | It shows up in ``inst/faucet.log``:: | |
1183 | ||
dcc3e70b BC |
1184 | Jan 06 03:20:11 faucet.valve INFO DPID 1 (0x1) Adding new route 10.200.0.1/32 via 10.200.0.1 (00:10:20:30:40:50) on VLAN 200 |
1185 | Jan 06 03:20:11 faucet.valve INFO DPID 1 (0x1) ARP response 10.200.0.1 (00:10:20:30:40:50) on VLAN 200 | |
1186 | Jan 06 03:20:11 faucet.valve INFO DPID 1 (0x1) L2 learned 00:10:20:30:40:50 (L2 type 0x0806, L3 src 10.200.0.1) on Port 4 on VLAN 200 (1 hosts total) | |
98dc8dee BP |
1187 | |
1188 | and in the OVS flow tables:: | |
1189 | ||
1190 | $ diff-flows flows2 br0 | |
dcc3e70b | 1191 | +table=3 priority=9098,in_port=4,dl_vlan=200,dl_src=00:10:20:30:40:50 hard_timeout=3601 actions=goto_table:7 |
98dc8dee BP |
1192 | ... |
1193 | +table=4 priority=9131,ip,dl_vlan=200,nw_dst=10.200.0.1 actions=set_field:4296->vlan_vid,set_field:0e:00:00:00:00:01->eth_src,set_field:00:10:20:30:40:50->eth_dst,dec_ttl,goto_table:7 | |
1194 | +table=4 priority=9131,ip,dl_vlan=100,nw_dst=10.200.0.1 actions=set_field:4296->vlan_vid,set_field:0e:00:00:00:00:01->eth_src,set_field:00:10:20:30:40:50->eth_dst,dec_ttl,goto_table:7 | |
1195 | ... | |
1196 | +table=4 priority=9123,ip,dl_vlan=100,nw_dst=10.200.0.0/24 actions=goto_table:6 | |
dcc3e70b | 1197 | +table=7 priority=9099,dl_vlan=200,dl_dst=00:10:20:30:40:50 idle_timeout=3601 actions=pop_vlan,output:4 |
98dc8dee BP |
1198 | |
1199 | Step 6: IP Packet Delivery | |
1200 | ++++++++++++++++++++++++++ | |
1201 | ||
1202 | Now both the host and the router have everything they need to deliver | |
1203 | the packet. There are two ways it might happen. If Faucet's router | |
1204 | is smart enough to buffer the packet that triggered ARP resolution, then | |
1205 | it might have delivered it already. If so, then it should show up in | |
1206 | ``p4.pcap``. Let's take a look:: | |
1207 | ||
dcc3e70b | 1208 | $ /usr/sbin/tcpdump -evvvr sandbox/p4.pcap ip |
98dc8dee BP |
1209 | reading from file sandbox/p4.pcap, link-type EN10MB (Ethernet) |
1210 | ||
1211 | Nope. That leaves the other possibility, which is that Faucet waits | |
1212 | for the original sending host to re-send the packet. We can do that | |
1213 | by re-running the trace:: | |
1214 | ||
1215 | $ ovs-appctl ofproto/trace br0 in_port=p1,dl_src=00:01:02:03:04:05,dl_dst=0e:00:00:00:00:01,udp,nw_src=10.100.0.1,nw_dst=10.200.0.1,nw_ttl=64 -generate | |
1216 | Flow: udp,in_port=1,vlan_tci=0x0000,dl_src=00:01:02:03:04:05,dl_dst=0e:00:00:00:00:01,nw_src=10.100.0.1,nw_dst=10.200.0.1,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=0 | |
1217 | ||
1218 | bridge("br0") | |
1219 | ------------- | |
1220 | 0. in_port=1, priority 9099, cookie 0x5adc15c0 | |
1221 | goto_table:1 | |
1222 | 1. in_port=1,vlan_tci=0x0000/0x1fff, priority 9000, cookie 0x5adc15c0 | |
1223 | push_vlan:0x8100 | |
1224 | set_field:4196->vlan_vid | |
1225 | goto_table:3 | |
1226 | 3. ip,dl_vlan=100,dl_dst=0e:00:00:00:00:01, priority 9099, cookie 0x5adc15c0 | |
1227 | goto_table:4 | |
1228 | 4. ip,dl_vlan=100,nw_dst=10.200.0.1, priority 9131, cookie 0x5adc15c0 | |
1229 | set_field:4296->vlan_vid | |
1230 | set_field:0e:00:00:00:00:01->eth_src | |
1231 | set_field:00:10:20:30:40:50->eth_dst | |
1232 | dec_ttl | |
1233 | goto_table:7 | |
1234 | 7. dl_vlan=200,dl_dst=00:10:20:30:40:50, priority 9099, cookie 0x5adc15c0 | |
1235 | pop_vlan | |
1236 | output:4 | |
1237 | ||
1238 | Final flow: udp,in_port=1,vlan_tci=0x0000,dl_src=0e:00:00:00:00:01,dl_dst=00:10:20:30:40:50,nw_src=10.100.0.1,nw_dst=10.200.0.1,nw_tos=0,nw_ecn=0,nw_ttl=63,tp_src=0,tp_dst=0 | |
1239 | Megaflow: recirc_id=0,eth,ip,in_port=1,vlan_tci=0x0000/0x1fff,dl_src=00:01:02:03:04:05,dl_dst=0e:00:00:00:00:01,nw_dst=10.200.0.1,nw_ttl=64,nw_frag=no | |
1240 | Datapath actions: set(eth(src=0e:00:00:00:00:01,dst=00:10:20:30:40:50)),set(ipv4(dst=10.200.0.1,ttl=63)),4 | |
1241 | ||
1242 | Finally, we have working IP packet forwarding! | |
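
The ``set_field:4196->vlan_vid`` and ``set_field:4296->vlan_vid`` actions in the trace can look odd next to VLANs 100 and 200. In OpenFlow 1.3 the ``vlan_vid`` field carries a "tag present" flag (``OFPVID_PRESENT``, 0x1000) on top of the 12-bit VLAN ID. The following is a small sketch of that arithmetic; the helper name ``decode_vlan_vid`` is made up for illustration and is not part of OVS or Faucet.

```python
OFPVID_PRESENT = 0x1000  # OpenFlow 1.3 flag: a VLAN tag is present
VID_MASK = 0x0FFF        # the low 12 bits carry the VLAN ID


def decode_vlan_vid(value):
    """Split an OpenFlow vlan_vid value into (tag_present, vlan_id)."""
    return bool(value & OFPVID_PRESENT), value & VID_MASK


# The two values that appear in the trace above.
for raw in (4196, 4296):
    present, vid = decode_vlan_vid(raw)
    print(f"set_field:{raw}->vlan_vid  present={present} vlan={vid}")
```

So 4196 is VLAN 100 (the sending host's VLAN) and 4296 is VLAN 200 (the destination's VLAN), each with the "present" bit set.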
1243 | ||
1244 | Performance | |
1245 | ~~~~~~~~~~~ | |
1246 | ||
1247 | Take another look at the megaflow line above:: | |
1248 | ||
1249 | Megaflow: recirc_id=0,eth,ip,in_port=1,vlan_tci=0x0000/0x1fff,dl_src=00:01:02:03:04:05,dl_dst=0e:00:00:00:00:01,nw_dst=10.200.0.1,nw_ttl=64,nw_frag=no | |
1250 | ||
1251 | This means that (almost) any packet between these Ethernet source and | |
1252 | destination hosts, destined to the given IP host, will be handled by | |
1253 | this single megaflow cache entry. So regardless of the number of UDP | |
1254 | packets or TCP connections that these hosts exchange, Open vSwitch | |
1255 | packet processing won't need to fall back to the slow path. It is | |
1256 | quite efficient. | |
1257 | ||
1258 | .. note:: | |
1259 | ||
1260 | The exceptions are packets with a TTL other than 64, and fragmented | |
1261 | packets. Most hosts use a constant TTL for outgoing packets, and | |
1262 | fragments are rare. If either of those did change, then that would | |
1263 | simply result in a new megaflow cache entry. | |
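
To see at a glance which fields this megaflow pins down and which it leaves wildcarded, it can help to split the line into its components. The parser below is only an illustrative sketch written for this tutorial (``parse_megaflow`` is a hypothetical helper, not an OVS tool), and it assumes the simple comma-separated ``field=value/mask`` layout shown above.

```python
def parse_megaflow(text):
    """Return {field: (value, mask)}; mask is None for an exact match."""
    fields = {}
    for item in text.split(","):
        if "=" not in item:
            # Protocol shorthand such as "eth" or "ip" has no value part.
            fields[item] = (None, None)
            continue
        name, _, rhs = item.partition("=")
        value, _, mask = rhs.partition("/")
        fields[name] = (value, mask or None)
    return fields


MEGAFLOW = ("recirc_id=0,eth,ip,in_port=1,vlan_tci=0x0000/0x1fff,"
            "dl_src=00:01:02:03:04:05,dl_dst=0e:00:00:00:00:01,"
            "nw_dst=10.200.0.1,nw_ttl=64,nw_frag=no")

matched = parse_megaflow(MEGAFLOW)

# Fields that never appear in the megaflow are wildcarded entirely.
for name in ("nw_src", "tp_src", "tp_dst"):
    print(name, "wildcarded:", name not in matched)
```

The IP source address and both transport ports are absent from the match, which is why one cache entry can cover many UDP packets and TCP connections.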
1264 | ||
1265 | The datapath actions might also be worth a look:: | |
1266 | ||
1267 | Datapath actions: set(eth(src=0e:00:00:00:00:01,dst=00:10:20:30:40:50)),set(ipv4(dst=10.200.0.1,ttl=63)),4 | |
1268 | ||
1269 | This just means that, to process these packets, the datapath changes | |
1270 | the Ethernet source and destination addresses and the IP TTL, and then | |
1271 | transmits the packet to port ``p4`` (also numbered 4). Notice in | |
1272 | particular that, despite the OpenFlow actions that pushed, modified, | |
1273 | and popped back off a VLAN, there is nothing in the datapath actions | |
1274 | about VLANs. This is because the OVS flow translation code "optimizes | |
1275 | out" redundant or unneeded actions, which saves time when the cache | |
1276 | entry is executed later. | |
1277 | ||
1278 | .. note:: | |
1279 | ||
1280 | It's not clear why the actions also re-set the IP destination | |
1281 | address to its original value. Perhaps this is a minor performance | |
1282 | bug. | |
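
One way to picture why the VLAN actions disappear is to compare the packet as it arrived with the packet as it leaves, and emit datapath actions only for fields that differ. The sketch below is a toy model under that assumption, not the OVS translation code; the field names and the ``commit_changes`` helper are invented for illustration, and it does not reproduce the re-set of the IP destination mentioned in the note.

```python
def commit_changes(original, final):
    """Return 'set' actions only for fields whose value actually changed."""
    return [f"set({name}={final[name]})"
            for name in original
            if final[name] != original[name]]


# The packet as it arrived on port 1 (untagged, TTL 64).
original = {"vlan_vid": None,
            "eth_src": "00:01:02:03:04:05",
            "eth_dst": "0e:00:00:00:00:01",
            "nw_ttl": 64}

# Replay the OpenFlow actions from the trace on a working copy.
flow = dict(original)
flow["vlan_vid"] = 100                    # table 1: push_vlan, VLAN 100
flow["vlan_vid"] = 200                    # table 4: rewrite to VLAN 200
flow["eth_src"] = "0e:00:00:00:00:01"     # table 4: router MAC as source
flow["eth_dst"] = "00:10:20:30:40:50"     # table 4: next hop as destination
flow["nw_ttl"] -= 1                       # table 4: dec_ttl
flow["vlan_vid"] = None                   # table 7: pop_vlan

actions = commit_changes(original, flow)
print(actions)
```

Because the packet starts and ends untagged, the VLAN field shows no net change and no VLAN action is emitted in this model.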
1283 | ||
1284 | ACLs | |
1285 | ---- | |
1286 | ||
1287 | Let's try out some ACLs, since they do a good job of illustrating some of | |
1288 | the ways that OVS tries to optimize megaflows. Update | |
1289 | ``inst/faucet.yaml`` to the following:: | |
1290 | ||
1291 | dps: | |
1292 | switch-1: | |
5a0e4aec BP |
1293 | dp_id: 0x1 |
1294 | timeout: 3600 | |
1295 | arp_neighbor_timeout: 3600 | |
1296 | interfaces: | |
1297 | 1: | |
1298 | native_vlan: 100 | |
1299 | acl_in: 1 | |
1300 | 2: | |
1301 | native_vlan: 100 | |
1302 | 3: | |
1303 | native_vlan: 100 | |
1304 | 4: | |
1305 | native_vlan: 200 | |
1306 | 5: | |
1307 | native_vlan: 200 | |
98dc8dee BP |
1308 | vlans: |
1309 | 100: | |
5a0e4aec | 1310 | faucet_vips: ["10.100.0.254/24"] |
98dc8dee | 1311 | 200: |
5a0e4aec | 1312 | faucet_vips: ["10.200.0.254/24"] |
98dc8dee BP |
1313 | routers: |
1314 | router-1: | |
5a0e4aec | 1315 | vlans: [100, 200] |
98dc8dee BP |
1316 | acls: |
1317 | 1: | |
5a0e4aec BP |
1318 | - rule: |
1319 | dl_type: 0x800 | |
1320 | nw_proto: 6 | |
1321 | tcp_dst: 8080 | |
1322 | actions: | |
1323 | allow: 0 | |
1324 | - rule: | |
1325 | actions: | |
1326 | allow: 1 | |
98dc8dee BP |
1327 | |
1328 | Then restart Faucet:: | |
1329 | ||
1330 | $ docker restart faucet | |
1331 | ||
1332 | On port 1, this new configuration blocks all traffic to TCP port 8080 | |
1333 | and allows all other traffic. The resulting change in the flow table | |
1334 | shows this clearly too:: | |
1335 | ||
1336 | $ diff-flows flows2 br0 | |
1337 | -priority=9099,in_port=1 actions=goto_table:1 | |
1338 | +priority=9098,in_port=1 actions=goto_table:1 | |
1339 | +priority=9099,tcp,in_port=1,tp_dst=8080 actions=drop | |
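
The effect of the two ACL rules can be modeled as a first-match rule list. The following sketch mirrors the ``acls`` section of the configuration above; the ``acl_allows`` function and the packet dictionaries are hypothetical, and the fall-through verdict for "no rule matched" is an assumption that this particular ACL never exercises, since its last rule matches everything.

```python
# ACL 1 from inst/faucet.yaml, written as (match, verdict) pairs.
ACL_1 = [
    {"match": {"dl_type": 0x800, "nw_proto": 6, "tcp_dst": 8080},
     "allow": False},
    {"match": {}, "allow": True},   # empty match: applies to every packet
]


def acl_allows(packet, rules):
    """Return the verdict of the first rule whose fields all match."""
    for rule in rules:
        if all(packet.get(k) == v for k, v in rule["match"].items()):
            return rule["allow"]
    return False  # assumption: a packet that matches no rule is dropped


web_alt = {"dl_type": 0x800, "nw_proto": 6, "tcp_dst": 8080}
web = {"dl_type": 0x800, "nw_proto": 6, "tcp_dst": 80}
print(acl_allows(web_alt, ACL_1), acl_allows(web, ACL_1))
```

In the flow table the same ordering is expressed with priorities: the drop flow for TCP port 8080 sits at priority 9099, just above the catch-all ``in_port=1`` flow at 9098.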
1340 | ||
1341 | The most interesting question here is performance. If you recall the | |
1342 | earlier discussion, when a packet passing through the flow table encounters a | |
1343 | match on a given field, the resulting megaflow has to match on that | |
1344 | field, even if the flow didn't actually match. This is expensive. | |
1345 | ||
1346 | In particular, here you can see that any TCP packet is going to | |
1347 | encounter the ACL flow, even if it is directed to a port other than | |
1348 | 8080. If that means that every megaflow for a TCP packet is going to | |
1349 | have to match on the TCP destination, that's going to be bad for | |
1350 | caching performance because there will be a need for a separate | |
1351 | megaflow for every TCP destination port that actually appears in | |
1352 | traffic, which means a lot more megaflows than otherwise. (Really, in | |
1353 | practice, if such a simple ACL blew up performance, OVS wouldn't be a | |
1354 | very good switch!) | |
1355 | ||
1356 | Let's see what happens, by sending a packet to port 80 (instead of | |
1357 | 8080):: | |
1358 | ||
1359 | $ ovs-appctl ofproto/trace br0 in_port=p1,dl_src=00:01:02:03:04:05,dl_dst=0e:00:00:00:00:01,tcp,nw_src=10.100.0.1,nw_dst=10.200.0.1,nw_ttl=64,tp_dst=80 -generate | |
dcc3e70b | 1360 | Flow: tcp,in_port=1,vlan_tci=0x0000,dl_src=00:01:02:03:04:05,dl_dst=0e:00:00:00:00:01,nw_src=10.100.0.1,nw_dst=10.200.0.1,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80,tcp_flags=0 |
98dc8dee BP |
1361 | |
1362 | bridge("br0") | |
1363 | ------------- | |
1364 | 0. in_port=1, priority 9098, cookie 0x5adc15c0 | |
1365 | goto_table:1 | |
1366 | 1. in_port=1,vlan_tci=0x0000/0x1fff, priority 9000, cookie 0x5adc15c0 | |
1367 | push_vlan:0x8100 | |
1368 | set_field:4196->vlan_vid | |
1369 | goto_table:3 | |
1370 | 3. ip,dl_vlan=100,dl_dst=0e:00:00:00:00:01, priority 9099, cookie 0x5adc15c0 | |
1371 | goto_table:4 | |
1372 | 4. ip,dl_vlan=100,nw_dst=10.200.0.0/24, priority 9123, cookie 0x5adc15c0 | |
1373 | goto_table:6 | |
dcc3e70b BC |
1374 | 6. ip, priority 9130, cookie 0x5adc15c0 |
1375 | CONTROLLER:128 | |
98dc8dee BP |
1376 | |
1377 | Final flow: tcp,in_port=1,dl_vlan=100,dl_vlan_pcp=0,vlan_tci1=0x0000,dl_src=00:01:02:03:04:05,dl_dst=0e:00:00:00:00:01,nw_src=10.100.0.1,nw_dst=10.200.0.1,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80,tcp_flags=0 | |
1378 | Megaflow: recirc_id=0,eth,tcp,in_port=1,vlan_tci=0x0000/0x1fff,dl_src=00:01:02:03:04:05,dl_dst=0e:00:00:00:00:01,nw_dst=10.200.0.1,nw_frag=no,tp_dst=0x0/0xf000 | |
1379 | Datapath actions: push_vlan(vid=100,pcp=0) | |
1380 | ||
1381 | Take a look at the Megaflow line and in particular the match on | |
1382 | ``tp_dst``, which says ``tp_dst=0x0/0xf000``. What this means is that | |
1383 | the megaflow matches on only the top 4 bits of the TCP destination | |
1384 | port. That works because:: | |
1385 | ||
35b2520a | 1386 | 80 (base 10) == 0000,0000,0101,0000 (base 2) |
1387 | 8080 (base 10) == 0001,1111,1001,0000 (base 2) | |
98dc8dee BP |
1388 | |
1389 | and so by matching on only the top 4 bits, rather than all 16, the OVS | |
1390 | fast path can distinguish port 80 from port 8080. This allows this | |
1391 | megaflow to match one-sixteenth of the TCP destination port address | |
1392 | space, rather than just 1/65536th of it. | |
1393 | ||
1394 | .. note:: | |
1395 | ||
1396 | The algorithm OVS uses for this purpose isn't perfect. In this | |
1397 | case, a single-bit match would work (e.g. tp_dst=0x0/0x1000), and | |
1398 | would be superior since it would match half the port address | |
1399 | space instead of only one-sixteenth. | |
1400 | ||
1401 | For details of this algorithm, please refer to ``lib/classifier.c`` in | |
1402 | the Open vSwitch source tree, or our 2015 NSDI paper "The Design and | |
1403 | Implementation of Open vSwitch". | |
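
The ``0xf000`` mask can be reproduced with a little bit arithmetic, under the assumption that leading bits are examined in order, up to and including the first bit where the two port numbers differ. This is only a sketch of that reasoning, with hypothetical helper names; it is not taken from ``lib/classifier.c``.

```python
PORT_BITS = 16


def prefix_len_to_distinguish(a, b, width=PORT_BITS):
    """Leading bits to examine, up to and including the first differing bit."""
    diff = a ^ b
    if diff == 0:
        raise ValueError("values are identical")
    return width - diff.bit_length() + 1


def prefix_mask(nbits, width=PORT_BITS):
    """Mask covering the top nbits of a width-bit field."""
    return ((1 << nbits) - 1) << (width - nbits)


n = prefix_len_to_distinguish(80, 8080)
mask = prefix_mask(n)
print(f"tp_dst={80 & mask:#x}/{mask:#x}")   # the match seen in the trace

# The single differing bit on its own, as suggested in the note above.
single_bit = 1 << (PORT_BITS - n)
print(f"single-bit alternative: tp_dst=0x0/{single_bit:#x}")
```

A 4-bit prefix leaves 12 bits free, so the megaflow covers 4096 of the 65536 possible ports, while the single-bit mask ``0x1000`` would cover 32768 of them.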
1404 | ||
1405 | Finishing Up | |
1406 | ------------ | |
1407 | ||
1408 | When you're done, you probably want to exit the sandbox session, with | |
1409 | Control+D or ``exit``, and stop the Faucet controller with ``docker | |
1410 | stop faucet; docker rm faucet``. | |
1411 | ||
1412 | Further Directions | |
1413 | ------------------ | |
1414 | ||
1415 | We've looked a fair bit at how Faucet interacts with Open vSwitch. If | |
1416 | you still have some interest, you might want to explore some of these | |
1417 | directions: | |
1418 | ||
1419 | * Adding more than one switch. Faucet can control multiple switches | |
1420 | but we've only been simulating one of them. It's easy enough to | |
1421 | make a single OVS instance act as multiple switches (just | |
1422 | ``ovs-vsctl add-br`` another bridge), or you could use genuinely | |
1423 | separate OVS instances. | |
1424 | ||
1425 | * Additional features. Faucet has more features than we've | |
1426 | demonstrated, such as IPv6 routing and port mirroring. These should | |
1427 | also interact gracefully with Open vSwitch. | |
1428 | ||
1429 | * Real performance testing. We've looked at how flows and traces | |
1430 | **should** demonstrate good performance, but of course there's no | |
1431 | proof until it actually works in practice. We've also only tested | |
1432 | with trivial configurations. Open vSwitch can scale to millions of | |
1433 | OpenFlow flows, but the scaling in practice depends on the | |
1434 | particular flow tables and traffic patterns, so it's valuable to | |
1435 | test with large configurations, either in the way we've done it or | |
1436 | with real traffic. |