1 | Design Decisions In Open vSwitch |
2 | ================================ | |
3 | ||
4 | This document describes design decisions that went into implementing | |
5 | Open vSwitch. While we believe these to be reasonable decisions, it is | |
6 | impossible to predict how Open vSwitch will be used in all environments. | |
7 | Understanding assumptions made by Open vSwitch is critical to a | |
8 | successful deployment. The end of this document contains contact | |
9 | information that can be used to let us know how we can make Open vSwitch | |
10 | more generally useful. | |
11 | ||
12 | Asynchronous Messages |
13 | ===================== | |
14 | ||
15 | Over time, Open vSwitch has added many knobs that control whether a | |
16 | given controller receives OpenFlow asynchronous messages. This | |
17 | section describes how all of these features interact. | |
18 | ||
19 | First, a service controller never receives any asynchronous messages | |
20 | unless it explicitly configures a miss_send_len greater than zero with | |
21 | an OFPT_SET_CONFIG message. | |
22 | ||
23 | Second, OFPT_FLOW_REMOVED and NXT_FLOW_REMOVED messages are generated | |
24 | only if the flow that was removed had the OFPFF_SEND_FLOW_REM flag | |
25 | set. | |
26 | ||
27 | Third, OFPT_PACKET_IN and NXT_PACKET_IN messages are sent only to |
28 | OpenFlow controller connections that have the correct connection ID | |
29 | (see "struct nx_controller_id" and "struct nx_action_controller"): | |
30 | ||
31 | - For packet-in messages generated by a NXAST_CONTROLLER action, | |
32 | the controller ID specified in the action. | |
33 | ||
34 | - For other packet-in messages, controller ID zero. (This is the | |
35 | default ID when an OpenFlow controller does not configure one.) | |
36 | ||
37 | Finally, Open vSwitch consults a per-connection table indexed by the |
38 | message type, reason code, and current role. The following table | |
39 | shows how this table is initialized by default when an OpenFlow | |
40 | connection is made. An entry labeled "yes" means that the message is | |
41 | sent; an entry labeled "---" means that the message is suppressed. | |
42 | ||
43 |                                           master/ | |
44 | message and reason code                   other    slave | |
45 | ----------------------------------------  -------  ----- | |
46 | OFPT_PACKET_IN / NXT_PACKET_IN | |
47 |   OFPR_NO_MATCH                           yes      --- | |
48 |   OFPR_ACTION                             yes      --- | |
49 |   OFPR_INVALID_TTL                        ---      --- | |
50 | ||
51 | OFPT_FLOW_REMOVED / NXT_FLOW_REMOVED | |
52 |   OFPRR_IDLE_TIMEOUT                      yes      --- | |
53 |   OFPRR_HARD_TIMEOUT                      yes      --- | |
54 |   OFPRR_DELETE                            yes      --- | |
55 | ||
56 | OFPT_PORT_STATUS | |
57 |   OFPPR_ADD                               yes      yes | |
58 |   OFPPR_DELETE                            yes      yes | |
59 |   OFPPR_MODIFY                            yes      yes | |
60 | ||
61 | The NXT_SET_ASYNC_CONFIG message directly sets all of the values in | |
62 | this table for the current connection. The | |
63 | OFPC_INVALID_TTL_TO_CONTROLLER bit in the OFPT_SET_CONFIG message | |
64 | controls the setting for OFPR_INVALID_TTL for the "master" role. | |
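The final lookup described above can be modeled as a nested table keyed by message type, reason code, and role. The Python sketch below is illustrative only: the names `DEFAULT_ASYNC_CONFIG`, `new_connection_config`, `should_send`, and `set_invalid_ttl_to_controller` are hypothetical and do not correspond to identifiers in the Open vSwitch source. It models only the per-connection table, not the earlier checks (miss_send_len, OFPFF_SEND_FLOW_REM, controller ID).

```python
import copy

# Role groups, matching the two columns of the default table.
MASTER_OR_OTHER = "master/other"
SLAVE = "slave"

# Default per-connection table: True means "yes" (sent), False means "---".
DEFAULT_ASYNC_CONFIG = {
    "PACKET_IN": {
        "OFPR_NO_MATCH": {MASTER_OR_OTHER: True, SLAVE: False},
        "OFPR_ACTION": {MASTER_OR_OTHER: True, SLAVE: False},
        "OFPR_INVALID_TTL": {MASTER_OR_OTHER: False, SLAVE: False},
    },
    "FLOW_REMOVED": {
        "OFPRR_IDLE_TIMEOUT": {MASTER_OR_OTHER: True, SLAVE: False},
        "OFPRR_HARD_TIMEOUT": {MASTER_OR_OTHER: True, SLAVE: False},
        "OFPRR_DELETE": {MASTER_OR_OTHER: True, SLAVE: False},
    },
    "PORT_STATUS": {
        "OFPPR_ADD": {MASTER_OR_OTHER: True, SLAVE: True},
        "OFPPR_DELETE": {MASTER_OR_OTHER: True, SLAVE: True},
        "OFPPR_MODIFY": {MASTER_OR_OTHER: True, SLAVE: True},
    },
}


def new_connection_config():
    """Return a private copy of the defaults for a new connection."""
    return copy.deepcopy(DEFAULT_ASYNC_CONFIG)


def should_send(config, message, reason, role):
    """Look up whether a message with this reason is sent to this role."""
    return config[message][reason][role]


def set_invalid_ttl_to_controller(config, enabled):
    """Model the OFPC_INVALID_TTL_TO_CONTROLLER bit (assumed to map to
    the "master" column of the table)."""
    config["PACKET_IN"]["OFPR_INVALID_TTL"][MASTER_OR_OTHER] = enabled
```

In this model, an NXT_SET_ASYNC_CONFIG message would simply overwrite every boolean in the connection's copy of the table.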
65 | ||
66 | ||
67 | OFPAT_ENQUEUE | |
68 | ============= | |
69 | |
70 | The OpenFlow 1.0 specification requires the output port of the OFPAT_ENQUEUE | |
71 | action to "refer to a valid physical port (i.e. < OFPP_MAX) or OFPP_IN_PORT". | |
72 | Although OFPP_LOCAL is not less than OFPP_MAX, it is an 'internal' port which | |
73 | can have QoS applied to it in Linux. Since we allow the OFPAT_ENQUEUE to apply | |
74 | to 'internal' ports whose port numbers are less than OFPP_MAX, we interpret | |
75 | OFPP_LOCAL as a physical port and support OFPAT_ENQUEUE on it as well. | |
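The resulting port check can be written as a one-line predicate. This is a minimal sketch, not the Open vSwitch code: the function name `enqueue_port_is_acceptable` is hypothetical, and the port constants are the reserved values from the OpenFlow 1.0 specification.

```python
# Reserved port numbers from the OpenFlow 1.0 specification.
OFPP_MAX = 0xFF00
OFPP_IN_PORT = 0xFFF8
OFPP_LOCAL = 0xFFFE


def enqueue_port_is_acceptable(port):
    """Return True if 'port' may be the output port of OFPAT_ENQUEUE
    under the interpretation described above (OFPP_LOCAL is accepted
    in addition to what the specification requires)."""
    return 0 <= port < OFPP_MAX or port in (OFPP_IN_PORT, OFPP_LOCAL)
```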
76 | ||
77 | ||
78 | OFPT_FLOW_MOD |
79 | ============= | |
80 | ||
81 | The OpenFlow 1.0 specification for the behavior of OFPT_FLOW_MOD is | |
82 | confusing. The following table summarizes the Open vSwitch | |
83 | implementation of its behavior in the following categories: | |
84 | ||
85 | - "match on priority": Whether the flow_mod acts only on flows | |
86 | whose priority matches that included in the flow_mod message. | |
87 | ||
88 | - "match on out_port": Whether the flow_mod acts only on flows | |
89 | that output to the out_port included in the flow_mod message (if | |
90 | out_port is not OFPP_NONE). | |
91 | ||
92 | - "updates flow_cookie": Whether the flow_mod changes the | |
93 | flow_cookie of the flow or flows that it matches to the | |
94 | flow_cookie included in the flow_mod message. | |
95 | ||
96 | - "updates OFPFF_SEND_FLOW_REM": Whether the flow_mod changes the | |
97 | OFPFF_SEND_FLOW_REM flag of the flow or flows that it matches to | |
98 | the setting included in the flags of the flow_mod message. | |
99 | ||
100 | - "honors OFPFF_CHECK_OVERLAP": Whether the OFPFF_CHECK_OVERLAP | |
101 | flag in the flow_mod is significant. | |
102 | ||
103 | - "updates idle_timeout" and "updates hard_timeout": Whether the | |
104 | idle_timeout and hard_timeout in the flow_mod, respectively, | |
105 | have an effect on the flow or flows matched by the flow_mod. | |
106 | ||
107 | - "resets idle timer": Whether the flow_mod resets the per-flow | |
108 | timer that measures how long a flow has been idle. | |
109 | ||
110 | - "resets hard timer": Whether the flow_mod resets the per-flow | |
111 | timer that measures how long it has been since a flow was | |
112 | modified. | |
113 | ||
114 | - "zeros counters": Whether the flow_mod resets per-flow packet | |
115 | and byte counters to zero. | |
116 | ||
117 | - "sends flow_removed message": Whether the flow_mod generates a | |
118 | flow_removed message for the flow or flows that it affects. | |
119 | ||
120 | An entry labeled "yes" means that the flow mod type does have the | |
121 | indicated behavior, "---" means that it does not, an empty cell means | |
122 | that the property is not applicable, and other values are explained | |
123 | below the table. | |
124 | ||
125 |                                              MODIFY          DELETE | |
126 |                              ADD     MODIFY  STRICT  DELETE  STRICT | |
127 |                              ===     ======  ======  ======  ====== | |
128 | match on priority            ---     ---     yes     ---     yes | |
129 | match on out_port            ---     ---     ---     yes     yes | |
130 | updates flow_cookie          yes     yes     yes | |
131 | updates OFPFF_SEND_FLOW_REM  yes     +       + | |
132 | honors OFPFF_CHECK_OVERLAP   yes     +       + | |
133 | updates idle_timeout         yes     +       + | |
134 | updates hard_timeout         yes     +       + | |
135 | resets idle timer            yes     +       + | |
136 | resets hard timer            yes     yes     yes | |
137 | zeros counters               yes     +       + | |
138 | sends flow_removed message   ---     ---     ---     %       % | |
139 | ||
140 | (+) "modify" and "modify-strict" only take these actions when they | |
141 | create a new flow, not when they update an existing flow. | |
142 | ||
143 | (%) "delete" and "delete_strict" generate a flow_removed message if | |
144 | the deleted flow or flows have the OFPFF_SEND_FLOW_REM flag set. | |
145 | (Each controller can separately control whether it wants to | |
146 | receive the generated messages.) | |
147 | ||
148 | ||
149 | Flow Cookies |
150 | ============ | |
151 | ||
152 | OpenFlow 1.0 and later versions have the concept of a "flow cookie", | |
153 | which is a 64-bit integer value attached to each flow. The treatment | |
154 | of the flow cookie has varied greatly across OpenFlow versions, | |
155 | however. | |
156 | ||
157 | In OpenFlow 1.0: | |
158 | ||
159 | - OFPFC_ADD set the cookie in the flow that it added. | |
160 | ||
161 | - OFPFC_MODIFY and OFPFC_MODIFY_STRICT updated the cookie for | |
162 | the flow or flows that they modified. | |
163 | ||
164 | - OFPST_FLOW messages included the flow cookie. | |
165 | ||
166 | - OFPT_FLOW_REMOVED messages reported the cookie of the flow | |
167 | that was removed. | |
168 | ||
169 | OpenFlow 1.1 made the following changes: | |
170 | ||
171 | - Flow mod operations OFPFC_MODIFY, OFPFC_MODIFY_STRICT, | |
172 | OFPFC_DELETE, and OFPFC_DELETE_STRICT, plus flow stats | |
173 | requests and aggregate stats requests, gained the ability to | |
174 | match on flow cookies with an arbitrary mask. | |
175 | ||
176 | - OFPFC_MODIFY and OFPFC_MODIFY_STRICT were changed to add a | |
177 | new flow, in the case of no match, only if the flow table | |
178 | modification operation did not match on the cookie field. | |
179 | (In OpenFlow 1.0, modify operations always added a new flow | |
180 | when there was no match.) | |
181 | ||
182 | - OFPFC_MODIFY and OFPFC_MODIFY_STRICT no longer updated flow | |
183 | cookies. | |
184 | ||
185 | OpenFlow 1.2 made the following changes: | |
186 | ||
187 | - OFPFC_MODIFY and OFPFC_MODIFY_STRICT were changed to never | |
188 | add a new flow, regardless of whether the flow cookie was | |
189 | used for matching. | |
190 | ||
191 | Open vSwitch support for OpenFlow 1.0 implements the OpenFlow 1.0 | |
192 | behavior with the following extensions: | |
193 | ||
194 | - An NXM extension field NXM_NX_COOKIE(_W) allows the NXM | |
195 | versions of OFPFC_MODIFY, OFPFC_MODIFY_STRICT, OFPFC_DELETE, | |
196 | and OFPFC_DELETE_STRICT flow_mods, plus flow stats requests | |
197 | and aggregate stats requests, to match on flow cookies with | |
198 | arbitrary masks. This is much like the equivalent OpenFlow | |
199 | 1.1 feature. | |
200 | ||
201 | - Like OpenFlow 1.1, OFPFC_MODIFY and OFPFC_MODIFY_STRICT add a |
202 | new flow if there is no match and the mask is zero (or not | |
203 | given). | |
204 | ||
205 | - The "cookie" field in OFPT_FLOW_MOD and NXT_FLOW_MOD messages | |
206 | is used as the cookie value for OFPFC_ADD commands, as | |
207 | described in OpenFlow 1.0. For OFPFC_MODIFY and | |
208 | OFPFC_MODIFY_STRICT commands, the "cookie" field is used as a | |
209 | new cookie for flows that match unless it is UINT64_MAX, in | |
210 | which case the flow's cookie is not updated. | |
211 | |
212 | - NXT_PACKET_IN (the Nicira extended version of | |
213 | OFPT_PACKET_IN) reports the cookie of the rule that | |
214 | generated the packet, or all-1-bits if no rule generated the | |
215 | packet. (Older versions of OVS used all-0-bits instead of | |
216 | all-1-bits.) | |
217 | ||
218 | The following table shows the handling of different protocols when |
219 | receiving OFPFC_MODIFY and OFPFC_MODIFY_STRICT messages. A mask of 0 | |
220 | indicates either an explicit mask of zero or an implicit one by not | |
221 | specifying the NXM_NX_COOKIE(_W) field. | |
222 | ||
223 |               Match   Update  Add on miss  Add on miss | |
224 |               cookie  cookie  mask!=0      mask==0 | |
225 |               ======  ======  ===========  =========== | |
226 | OpenFlow 1.0  no      yes     <always add on miss> | |
227 | OpenFlow 1.1  yes     no      no           yes | |
228 | OpenFlow 1.2  yes     no      no           no | |
229 | NXM           yes     yes*    no           yes | |
230 | ||
231 | * Updates the flow's cookie unless the "cookie" field is UINT64_MAX. | |
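The table can be restated as two small decision functions. The sketch below follows the table rows literally; the function names and the protocol strings used as keys are hypothetical, chosen only for illustration, and the code is not taken from Open vSwitch.

```python
UINT64_MAX = 2**64 - 1


def adds_flow_on_miss(protocol, cookie_mask):
    """Whether a MODIFY or MODIFY_STRICT that matches no flow adds one."""
    if protocol == "OpenFlow 1.0":
        return True                   # always add on miss
    if protocol in ("OpenFlow 1.1", "NXM"):
        return cookie_mask == 0       # add only when not matching on cookie
    if protocol == "OpenFlow 1.2":
        return False                  # never add on miss
    raise ValueError("unknown protocol: %r" % (protocol,))


def cookie_after_modify(protocol, old_cookie, new_cookie):
    """Cookie of a matched flow after a MODIFY or MODIFY_STRICT."""
    if protocol == "OpenFlow 1.0":
        return new_cookie
    if protocol == "NXM":
        # The (*) footnote: UINT64_MAX means "leave the cookie alone".
        return old_cookie if new_cookie == UINT64_MAX else new_cookie
    if protocol in ("OpenFlow 1.1", "OpenFlow 1.2"):
        return old_cookie
    raise ValueError("unknown protocol: %r" % (protocol,))
```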
232 | ||
233 | ||
234 | Multiple Table Support |
235 | ====================== | |
236 | ||
237 | OpenFlow 1.0 has only rudimentary support for multiple flow tables. | |
238 | Notably, OpenFlow 1.0 does not allow the controller to specify the | |
239 | flow table to which a flow is to be added. Open vSwitch adds an | |
240 | extension for this purpose, which is enabled on a per-OpenFlow | |
241 | connection basis using the NXT_FLOW_MOD_TABLE_ID message. When the | |
242 | extension is enabled, the upper 8 bits of the 'command' member in an | |
243 | OFPT_FLOW_MOD or NXT_FLOW_MOD message designate the table to which a | |
244 | flow is to be added. | |
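As a rough illustration of this encoding, the 16-bit 'command' member can be split into a table ID and the ordinary command as follows. The helper names are hypothetical; the OFPFC_* values are those of the OpenFlow 1.0 specification.

```python
# Flow mod commands from the OpenFlow 1.0 specification.
OFPFC_ADD = 0
OFPFC_MODIFY = 1
OFPFC_MODIFY_STRICT = 2
OFPFC_DELETE = 3
OFPFC_DELETE_STRICT = 4


def split_command(command):
    """Split a 16-bit 'command' into (table_id, command), assuming the
    NXT_FLOW_MOD_TABLE_ID extension is enabled on the connection."""
    if not 0 <= command <= 0xFFFF:
        raise ValueError("command must fit in 16 bits")
    return command >> 8, command & 0xFF


def join_command(table_id, command):
    """Inverse of split_command()."""
    if not 0 <= table_id <= 0xFF or not 0 <= command <= 0xFF:
        raise ValueError("table_id and command must each fit in 8 bits")
    return (table_id << 8) | command
```

For example, `join_command(5, OFPFC_ADD)` yields 0x0500, which a switch with the extension enabled would read as "add to table 5".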
245 | ||
246 | The Open vSwitch software switch implementation offers 255 flow | |
247 | tables. On packet ingress, only the first flow table (table 0) is | |
248 | searched, and the contents of the remaining tables are not considered | |
249 | in any way. Tables other than table 0 only come into play when an | |
250 | NXAST_RESUBMIT_TABLE action specifies another table to search. | |
251 | ||
252 | Tables 128 and above are reserved for use by the switch itself. | |
253 | Controllers should use only tables 0 through 127. | |
254 | ||
255 | ||
256 | IPv6 |
257 | ==== | |
258 | ||
259 | Open vSwitch supports stateless handling of IPv6 packets. Flows can be | |
260 | written to support matching TCP, UDP, and ICMPv6 headers within an IPv6 | |
261 | packet. Deeper matching of some Neighbor Discovery messages is also |
262 | supported. | |
263 | |
264 | IPv6 was not designed to interact well with middle-boxes. This, | |
265 | combined with Open vSwitch's stateless nature, has affected the | |
266 | processing of IPv6 traffic, which is detailed below. | |
267 | ||
268 | Extension Headers | |
269 | ----------------- | |
270 | ||
271 | The base IPv6 header is incredibly simple with the intention of only | |
272 | containing information relevant for routing packets between two | |
273 | endpoints. IPv6 relies heavily on the use of extension headers to | |
274 | provide any other functionality. Unfortunately, the extension headers | |
275 | were designed in such a way that it is impossible to move to the next | |
276 | header (including the layer-4 payload) unless the current header is | |
277 | understood. | |
278 | ||
279 | Open vSwitch will process the following extension headers and continue | |
280 | to the next header: | |
281 | ||
282 | * Fragment (see the next section) | |
283 | * AH (Authentication Header) | |
284 | * Hop-by-Hop Options | |
285 | * Routing | |
286 | * Destination Options | |
287 | ||
288 | When a header is encountered that is not in that list, it is considered | |
289 | "terminal". A terminal header's IPv6 protocol value is stored in | |
290 | "nw_proto" for matching purposes. If a terminal header is TCP, UDP, or | |
291 | ICMPv6, the packet will be further processed in an attempt to extract | |
292 | layer-4 information. | |
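The header walk described above might look like the following sketch. It is not the Open vSwitch parser: the function name is hypothetical, the input is assumed to be the bytes that follow the fixed 40-byte IPv6 header, and fragment offsets are ignored (so it applies to unfragmented packets and first fragments). The protocol numbers are the IANA-assigned values, and the length rules are those of the respective header formats.

```python
IPPROTO_HOPOPTS = 0
IPPROTO_ROUTING = 43
IPPROTO_FRAGMENT = 44
IPPROTO_AH = 51
IPPROTO_DSTOPTS = 60

# Extension headers that are skipped rather than treated as terminal.
SKIPPED = {IPPROTO_HOPOPTS, IPPROTO_ROUTING, IPPROTO_FRAGMENT,
           IPPROTO_AH, IPPROTO_DSTOPTS}


def find_terminal_header(next_header, data):
    """Return (nw_proto, offset) for the terminal header, where 'offset'
    is its position in 'data', or None if 'data' is too short."""
    proto = next_header
    offset = 0
    while proto in SKIPPED:
        if len(data) - offset < 2:
            return None
        following = data[offset]             # Next Header field
        if proto == IPPROTO_FRAGMENT:
            length = 8                       # fixed-size header
        elif proto == IPPROTO_AH:
            length = (data[offset + 1] + 2) * 4   # units of 4 octets
        else:
            length = (data[offset + 1] + 1) * 8   # units of 8 octets
        if offset + length > len(data):
            return None
        proto = following
        offset += length
    return proto, offset
```

Any protocol value outside the skipped set, such as ESP (50), stops the walk immediately and becomes the value stored in "nw_proto".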
293 | ||
294 | Fragments | |
295 | --------- | |
296 | ||
297 | IPv6 requires that every link in the internet have an MTU of 1280 octets | |
298 | or greater (RFC 2460). As such, a terminal header (as described above in | |
299 | "Extension Headers") in the first fragment should generally be | |
300 | reachable. In this case, the terminal header's IPv6 protocol type is | |
301 | stored in the "nw_proto" field for matching purposes. If a terminal | |
302 | header cannot be found in the first fragment (one with a fragment offset | |
303 | of zero), the "nw_proto" field is set to 0. Subsequent fragments (those | |
304 | with a non-zero fragment offset) have the "nw_proto" field set to the | |
305 | IPv6 protocol type for fragments (44). | |
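These three cases reduce to a short decision function. The sketch below is illustrative; `nw_proto_for_fragment` is a hypothetical name, and `terminal_proto` stands for the result of whatever header walk located (or failed to locate) the terminal header.

```python
IPPROTO_FRAGMENT = 44


def nw_proto_for_fragment(fragment_offset, terminal_proto):
    """Choose the "nw_proto" value for a fragmented IPv6 packet.

    'terminal_proto' is the protocol number of the terminal header found
    in this fragment, or None if it could not be found.
    """
    if fragment_offset != 0:
        return IPPROTO_FRAGMENT   # later fragment
    if terminal_proto is None:
        return 0                  # first fragment, terminal header missing
    return terminal_proto         # first fragment, terminal header found
```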
306 | ||
307 | Jumbograms | |
308 | ---------- | |
309 | ||
310 | An IPv6 jumbogram (RFC 2675) is a packet containing a payload longer | |
311 | than 65,535 octets. A jumbogram is only relevant in subnets with a link | |
312 | MTU greater than 65,575 octets, and are not required to be supported on | |
313 | nodes that do not connect to link with such large MTUs. Currently, Open | |
314 | vSwitch doesn't process jumbograms. | |
315 | ||
316 | ||
317 | In-Band Control |
318 | =============== | |
319 | ||
320 | Motivation |
321 | ---------- | |
322 | ||
323 | An OpenFlow switch must establish and maintain a TCP network | |
324 | connection to its controller. There are two basic ways to categorize | |
325 | the network that this connection traverses: either it is completely | |
326 | separate from the one that the switch is otherwise controlling, or its | |
327 | path may overlap the network that the switch controls. We call the | |
328 | former case "out-of-band control", the latter case "in-band control". | |
329 | ||
330 | Out-of-band control has the following benefits: | |
331 | ||
332 | - Simplicity: Out-of-band control slightly simplifies the switch | |
333 | implementation. | |
334 | ||
335 | - Reliability: Excessive switch traffic volume cannot interfere | |
336 | with control traffic. | |
337 | ||
338 | - Integrity: Machines not on the control network cannot | |
339 | impersonate a switch or a controller. | |
340 | ||
341 | - Confidentiality: Machines not on the control network cannot | |
342 | snoop on control traffic. | |
343 | ||
344 | In-band control, on the other hand, has the following advantages: | |
345 | ||
346 | - No dedicated port: There is no need to dedicate a physical | |
347 | switch port to control, which is important on switches that have | |
348 | few ports (e.g. wireless routers, low-end embedded platforms). | |
349 | ||
350 | - No dedicated network: There is no need to build and maintain a | |
351 | separate control network. This is important in many | |
352 | environments because it reduces proliferation of switches and | |
353 | wiring. | |
354 | ||
355 | Open vSwitch supports both out-of-band and in-band control. This | |
356 | section describes the principles behind in-band control. See the | |
357 | description of the Controller table in ovs-vswitchd.conf.db(5) to | |
358 | configure OVS for in-band control. | |
359 | ||
360 | Principles | |
361 | ---------- | |
362 | ||
363 | The fundamental principle of in-band control is that an OpenFlow | |
364 | switch must recognize and switch control traffic without involving the | |
365 | OpenFlow controller. All the details of implementing in-band control | |
366 | are special cases of this principle. | |
367 | ||
368 | The rationale for this principle is simple. If the switch does not | |
369 | handle in-band control traffic itself, then it will be caught in a | |
370 | contradiction: it must contact the controller, but it cannot, because | |
371 | only the controller can set up the flows that are needed to contact | |
372 | the controller. | |
373 | ||
374 | The following points describe important special cases of this | |
375 | principle. | |
376 | ||
377 | - In-band control must be implemented regardless of whether the | |
378 | switch is connected. | |
379 | ||
380 | It is tempting to implement the in-band control rules only when | |
381 | the switch is not connected to the controller, using the | |
382 | reasoning that the controller should have complete control once | |
383 | it has established a connection with the switch. | |
384 | ||
385 | This does not work in practice. Consider the case where the | |
386 | switch is connected to the controller. Occasionally it can | |
387 | happen that the controller forgets or otherwise needs to obtain | |
388 | the MAC address of the switch. To do so, the controller sends a | |
389 | broadcast ARP request. A switch that implements the in-band | |
390 | control rules only when it is disconnected will then send an | |
391 | OFPT_PACKET_IN message up to the controller. The controller will | |
392 | be unable to respond, because it does not know the MAC address of | |
393 | the switch. This is a deadlock situation that can only be | |
394 | resolved by the switch noticing that its connection to the | |
395 | controller has hung and reconnecting. | |
396 | ||
397 | - In-band control must override flows set up by the controller. | |
398 | ||
399 | It is reasonable to assume that flows set up by the OpenFlow | |
400 | controller should take precedence over in-band control, on the | |
401 | basis that the controller should be in charge of the switch. | |
402 | ||
403 | Again, this does not work in practice. Reasonable controller | |
404 | implementations may set up a "last resort" fallback rule that | |
405 | wildcards every field and, e.g., sends it up to the controller or | |
406 | discards it. If a controller does that, then it will isolate | |
407 | itself from the switch. | |
408 | ||
409 | - The switch must recognize all control traffic. | |
410 | ||
411 | The fundamental principle of in-band control states, in part, | |
412 | that a switch must recognize control traffic without involving | |
413 | the OpenFlow controller. More specifically, the switch must | |
414 | recognize *all* control traffic. "False negatives", that is, | |
415 | packets that constitute control traffic but that the switch does | |
416 | not recognize as control traffic, lead to control traffic storms. | |
417 | ||
418 | Consider an OpenFlow switch that only recognizes control packets | |
419 | sent to or from that switch. Now suppose that two switches of | |
420 | this type, named A and B, are connected to ports on an Ethernet | |
421 | hub (not a switch) and that an OpenFlow controller is connected | |
422 | to a third hub port. In this setup, control traffic sent by | |
423 | switch A will be seen by switch B, which will send it to the | |
424 | controller as part of an OFPT_PACKET_IN message. Switch A will | |
425 | then see the OFPT_PACKET_IN message's packet, re-encapsulate it | |
426 | in another OFPT_PACKET_IN, and send it to the controller. Switch | |
427 | B will then see that OFPT_PACKET_IN, and so on in an infinite | |
428 | loop. | |
429 | ||
430 | Incidentally, the consequences of "false positives", where | |
431 | packets that are not control traffic are nevertheless recognized | |
432 | as control traffic, are much less severe. The controller will | |
433 | not be able to control their behavior, but the network will | |
434 | remain in working order. False positives do, however, constitute a | |
435 | security problem. | |
436 | ||
437 | - The switch should use echo-requests to detect disconnection. | |
438 | ||
439 | TCP will notice that a connection has hung, but this can take a | |
440 | considerable amount of time. For example, with default settings | |
441 | the Linux kernel TCP implementation will retransmit for between | |
442 | 13 and 30 minutes, depending on the connection's retransmission | |
443 | timeout, according to kernel documentation. This is far too long | |
444 | for a switch to be disconnected, so an OpenFlow switch should | |
445 | implement its own connection timeout. OpenFlow OFPT_ECHO_REQUEST | |
446 | messages are the best way to do this, since they test the | |
447 | OpenFlow connection itself. | |
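One way to structure such a timeout is a small state machine driven by a clock. This is a minimal sketch under stated assumptions, not the Open vSwitch implementation: the class name, the state strings, and the choice to reuse one interval for both "idle before probing" and "wait for the reply" are all hypothetical.

```python
class EchoProbe:
    """Track liveness of one OpenFlow connection using echo requests."""

    def __init__(self, probe_interval, now):
        self.probe_interval = probe_interval
        self.last_received = now     # time of the last message received
        self.probe_sent_at = None    # time the outstanding echo was sent

    def on_receive(self, now):
        """Any received message, including an echo reply, proves liveness."""
        self.last_received = now
        self.probe_sent_at = None

    def poll(self, now):
        """Return 'idle', 'send_echo', 'waiting', or 'disconnect'."""
        if self.probe_sent_at is None:
            if now - self.last_received >= self.probe_interval:
                self.probe_sent_at = now
                return "send_echo"   # caller sends OFPT_ECHO_REQUEST
            return "idle"
        if now - self.probe_sent_at >= self.probe_interval:
            return "disconnect"      # no reply in time: drop and reconnect
        return "waiting"
```

With a 5-second interval, a silent peer is detected within roughly 10 seconds, compared with the many minutes TCP retransmission alone can take.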
448 | ||
449 | Implementation | |
450 | -------------- | |
451 | ||
452 | This section describes how Open vSwitch implements in-band control. | |
453 | Correctly implementing in-band control has proven difficult due to its | |
454 | many subtleties, and has thus gone through many iterations. Please | |
455 | read through and understand the reasoning behind the chosen rules | |
456 | before making modifications. | |
457 | ||
458 | Open vSwitch implements in-band control as "hidden" flows, that is, | |
459 | flows that are not visible through OpenFlow and that have a higher | |
460 | priority than any wildcarded flow that can be set up through OpenFlow. This is done so | |
461 | that the OpenFlow controller cannot interfere with them and possibly | |
462 | break connectivity with its switches. It is possible to see all | |
463 | flows, including in-band ones, with the ovs-appctl "bridge/dump-flows" | |
464 | command. | |
465 | |
466 | The Open vSwitch implementation of in-band control can hide traffic to | |
467 | arbitrary "remotes", where each remote is one TCP port on one IP address. | |
468 | Currently the remotes are automatically configured as the in-band OpenFlow | |
469 | controllers plus the OVSDB managers, if any. (The latter is a requirement | |
470 | because OVSDB managers are responsible for configuring OpenFlow controllers, | |
471 | so if the manager cannot be reached then OpenFlow cannot be reconfigured.) | |
472 | ||
473 | The following rules (with the OFPP_NORMAL action) are set up on any bridge | |
474 | that has any remotes: | |
475 | ||
476 | (a) DHCP requests sent from the local port. | |
477 | (b) ARP replies to the local port's MAC address. | |
478 | (c) ARP requests from the local port's MAC address. | |
479 | ||
480 | In-band also sets up the following rules for each unique next-hop MAC | |
481 | address for the remotes' IPs (the "next hop" is either the remote | |
482 | itself, if it is on a local subnet, or the gateway to reach the remote): | |
483 | ||
484 | (d) ARP replies to the next hop's MAC address. | |
485 | (e) ARP requests from the next hop's MAC address. | |
486 | ||
487 | In-band also sets up the following rules for each unique remote IP address: | |
488 | ||
489 | (f) ARP replies containing the remote's IP address as a target. | |
490 | (g) ARP requests containing the remote's IP address as a source. | |
491 | ||
492 | In-band also sets up the following rules for each unique remote (IP,port) | |
493 | pair: | |
494 | ||
495 | (h) TCP traffic to the remote's IP and port. | |
496 | (i) TCP traffic from the remote's IP and port. | |
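Rules (a) through (i) can be enumerated mechanically from the local port's MAC address, the next-hop MAC addresses, and the list of remotes. The sketch below is illustrative only: `in_band_rules` and the match-field names in the dictionaries are hypothetical, and the way "from the local port" is expressed as a source MAC match is an assumption. The DHCP ports (client 68, server 67) and ARP opcodes (1 for request, 2 for reply) are the standard values.

```python
ARP_REQUEST = 1
ARP_REPLY = 2


def in_band_rules(local_mac, next_hop_macs, remotes):
    """Return a list of (letter, match) pairs for a bridge.

    'remotes' is an iterable of (ip, tcp_port) pairs.  Duplicates in
    'next_hop_macs' and 'remotes' are collapsed, as the rules are set up
    per unique MAC, per unique IP, and per unique (IP, port) pair.
    """
    pairs = sorted(set(remotes))
    rules = []
    if not pairs:
        return rules                       # no remotes: no rules at all

    rules.append(("a", {"udp_src": 68, "udp_dst": 67, "dl_src": local_mac}))
    rules.append(("b", {"arp_op": ARP_REPLY, "dl_dst": local_mac}))
    rules.append(("c", {"arp_op": ARP_REQUEST, "dl_src": local_mac}))

    for mac in sorted(set(next_hop_macs)):
        rules.append(("d", {"arp_op": ARP_REPLY, "dl_dst": mac}))
        rules.append(("e", {"arp_op": ARP_REQUEST, "dl_src": mac}))

    for ip in sorted({ip for ip, _ in pairs}):
        rules.append(("f", {"arp_op": ARP_REPLY, "arp_tpa": ip}))
        rules.append(("g", {"arp_op": ARP_REQUEST, "arp_spa": ip}))

    for ip, port in pairs:
        rules.append(("h", {"nw_dst": ip, "tcp_dst": port}))
        rules.append(("i", {"nw_src": ip, "tcp_src": port}))
    return rules
```

Two remotes on the same IP address (for example, a controller and an OVSDB manager on one host) therefore share rules (f) and (g) but each get their own (h) and (i).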
497 | ||
498 | The goal of these rules is to be as narrow as possible to allow a | |
499 | switch to join a network and be able to communicate with the | |
500 | remotes. As mentioned earlier, these rules have higher priority | |
501 | than the controller's rules, so if they are too broad, they may | |
502 | prevent the controller from implementing its policy. As such, | |
503 | in-band actively monitors some aspects of flow and packet processing | |
504 | so that the rules can be made more precise. | |
505 | ||
506 | In-band control monitors attempts to add flows into the datapath that | |
507 | could interfere with its duties. The datapath only allows exact | |
508 | match entries, so in-band control is able to be very precise about | |
509 | the flows it prevents. Flows that miss in the datapath are sent to | |
510 | userspace to be processed, so preventing these flows from being | |
511 | cached in the "fast path" does not affect correctness. The only type | |
512 | of flow that is currently prevented is one that would prevent DHCP | |
513 | replies from being seen by the local port. For example, a rule that | |
514 | forwarded all DHCP traffic to the controller would not be allowed, | |
515 | but one that forwarded to all ports (including the local port) would. | |
516 | ||
517 | As mentioned earlier, packets that miss in the datapath are sent to | |
518 | the userspace for processing. The userspace has its own flow table, | |
519 | the "classifier", so in-band checks whether any special processing | |
520 | is needed before the classifier is consulted. If a packet is a DHCP | |
521 | response to a request from the local port, the packet is forwarded to | |
522 | the local port, regardless of the flow table. Note that this requires | |
523 | L7 processing of DHCP replies to determine whether the 'chaddr' field | |
524 | matches the MAC address of the local port. | |
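The 'chaddr' comparison can be sketched as follows. This is an assumption-laden illustration rather than the Open vSwitch code: the function name is hypothetical, and the input is assumed to be the UDP payload of the packet. The field offsets come from the fixed BOOTP/DHCP message layout (op at byte 0, hlen at byte 2, chaddr at bytes 28 through 43).

```python
BOOTREPLY = 2
CHADDR_OFFSET = 28
CHADDR_LEN = 16
ETH_ADDR_LEN = 6


def is_dhcp_reply_for(bootp, local_mac):
    """Return True if 'bootp' (a UDP payload) is a BOOTP/DHCP reply whose
    client hardware address equals 'local_mac' (6 bytes)."""
    if len(bootp) < CHADDR_OFFSET + CHADDR_LEN:
        return False                       # too short to hold chaddr
    op, hlen = bootp[0], bootp[2]
    if op != BOOTREPLY or hlen != ETH_ADDR_LEN:
        return False
    chaddr = bootp[CHADDR_OFFSET:CHADDR_OFFSET + ETH_ADDR_LEN]
    return chaddr == bytes(local_mac)
```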
525 | ||
526 | It is interesting to note that for an L3-based in-band control | |
527 | mechanism, the majority of rules are devoted to ARP traffic. At first | |
528 | glance, some of these rules appear redundant. However, each serves an | |
529 | important role. First, in order to determine the MAC address of the | |
530 | remote side (controller or gateway) for other ARP rules, we must allow | |
531 | ARP traffic for our local port with rules (b) and (c). If we are | |
532 | between a switch and its connection to the remote, we have to | |
533 | allow the other switch's ARP traffic through. This is done with | |
534 | rules (d) and (e), since we do not know the addresses of the other | |
535 | switches a priori, but do know the remote's or gateway's. Finally, | |
536 | if the remote is running in a local guest VM that is not reached | |
537 | through the local port, the switch that is connected to the VM must | |
538 | allow ARP traffic based on the remote's IP address, since it will | |
539 | not know the MAC address of the local port that is sending the traffic | |
540 | or the MAC address of the remote in the guest VM. | |
541 | ||
542 | With a few notable exceptions below, in-band should work in most | |
543 | network setups. The following are considered "supported" in the | |
544 | current implementation: | |
545 | ||
546 | - Locally Connected. The switch and remote are on the same | |
547 | subnet. This uses rules (a), (b), (c), (h), and (i). | |
548 | ||
549 | - Reached through Gateway. The switch and remote are on | |
550 | different subnets and must go through a gateway. This uses | |
551 | rules (a), (b), (c), (h), and (i). | |
552 | ||
553 | - Between Switch and Remote. This switch is between another | |
554 | switch and the remote, and we want to allow the other | |
555 | switch's traffic through. This uses rules (d), (e), (h), and | |
556 | (i). It uses (b) and (c) indirectly in order to know the MAC | |
557 | address for rules (d) and (e). Note that DHCP for the other | |
558 | switch will not work unless an OpenFlow controller explicitly lets this | |
559 | switch pass the traffic. | |
560 | ||
561 | - Between Switch and Gateway. This switch is between another | |
562 | switch and the gateway, and we want to allow the other switch's | |
563 | traffic through. This uses the same rules and logic as the | |
564 | "Between Switch and Remote" configuration described earlier. | |
565 | ||
566 | - Remote on Local VM. The remote is a guest VM on the | |
567 | system running in-band control. This uses rules (a), (b), (c), | |
568 | (h), and (i). | |
569 | ||
570 | - Remote on Local VM with Different Networks. The remote | |
571 | is a guest VM on the system running in-band control, but the | |
572 | local port is not used to connect to the remote. For | |
573 | example, an IP address is configured on eth0 of the switch. The | |
574 | remote's VM is connected through eth1 of the switch, but an | |
575 | IP address has not been configured for that port on the switch. | |
576 | As such, the switch will use eth0 to connect to the remote, | |
577 | and eth1's rules about the local port will not work. In the | |
578 | example, the switch attached to eth0 would use rules (a), (b), | |
579 | (c), (h), and (i) on eth0. The switch attached to eth1 would use | |
580 | rules (f), (g), (h), and (i). | |
581 | ||
582 | The following are explicitly *not* supported by in-band control: | |
583 | ||
584 | - Specify Remote by Name. Currently, the remote must be | |
585 | identified by IP address. A naive approach would be to permit | |
586 | all DNS traffic. Unfortunately, this would prevent the | |
587 | controller from defining any policy over DNS. Since switches | |
588 | that are located behind us need to connect to the remote, | |
589 | in-band cannot simply add a rule that allows DNS traffic from | |
590 | the local port. The "correct" way to support this is to parse | |
591 | DNS requests to allow all traffic related to a request for the | |
592 | remote's name through. Due to the potential security | |
593 | problems and amount of processing, we decided to hold off for | |
594 | the time being. | |
595 | ||
596 | - Differing Remotes for Switches. All switches must know | |
597 | the L3 addresses for all the remotes that other switches | |
598 | may use, since rules need to be set up to allow traffic related | |
599 | to those remotes through. See rules (f), (g), (h), and (i). | |
600 | ||
601 | - Differing Routes for Switches. In order for the switch to | |
602 | allow other switches to connect to a remote through a | |
603 | gateway, it allows the gateway's traffic through with rules (d) | |
604 | and (e). If the routes to the remote differ for the two | |
605 | switches, we will not know the MAC address of the alternate | |
606 | gateway. | |
607 | ||
608 | ||
609 | Suggestions |
610 | =========== | |
611 | ||
612 | Suggestions to improve Open vSwitch are welcome at discuss@openvswitch.org. |