..
      Licensed under the Apache License, Version 2.0 (the "License"); you may
      not use this file except in compliance with the License. You may obtain
      a copy of the License at

          http://www.apache.org/licenses/LICENSE-2.0

      Unless required by applicable law or agreed to in writing, software
      distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
      WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
      License for the specific language governing permissions and limitations
      under the License.

      Convention for heading levels in Open vSwitch documentation:

      =======  Heading 0 (reserved for the title in a document)
      -------  Heading 1
      ~~~~~~~  Heading 2
      +++++++  Heading 3
      '''''''  Heading 4

      Avoid deeper levels because they do not render well.

================================
Design Decisions In Open vSwitch
================================

This document describes design decisions that went into implementing Open
vSwitch. While we believe these to be reasonable decisions, it is impossible
to predict how Open vSwitch will be used in all environments. Understanding
assumptions made by Open vSwitch is critical to a successful deployment. The
end of this document contains contact information that can be used to let us
know how we can make Open vSwitch more generally useful.

Asynchronous Messages
---------------------

Over time, Open vSwitch has added many knobs that control whether a given
controller receives OpenFlow asynchronous messages. This section describes how
all of these features interact.

First, a service controller never receives any asynchronous messages unless it
changes its ``miss_send_len`` from the service controller default of zero in
one of the following ways:

- Sending an ``OFPT_SET_CONFIG`` message with nonzero ``miss_send_len``.

- Sending any ``NXT_SET_ASYNC_CONFIG`` message: as a side effect, this message
  changes the ``miss_send_len`` to ``OFP_DEFAULT_MISS_SEND_LEN`` (128) for
  service controllers.

Second, ``OFPT_FLOW_REMOVED`` and ``NXT_FLOW_REMOVED`` messages are generated
only if the flow that was removed had the ``OFPFF_SEND_FLOW_REM`` flag set.

Third, ``OFPT_PACKET_IN`` and ``NXT_PACKET_IN`` messages are sent only to
OpenFlow controller connections that have the correct connection ID (see
``struct nx_controller_id`` and ``struct nx_action_controller``):

- For packet-in messages generated by an ``NXAST_CONTROLLER`` action, the
  controller ID specified in the action.

- For other packet-in messages, controller ID zero. (This is the default ID
  when an OpenFlow controller does not configure one.)

Finally, Open vSwitch consults a per-connection table indexed by the message
type, reason code, and current role. The following table shows how this table
is initialized by default when an OpenFlow connection is made. An entry
labeled ``yes`` means that the message is sent; an entry labeled ``---`` means
that the message is suppressed.

.. table:: ``OFPT_PACKET_IN`` / ``NXT_PACKET_IN``

    =========================================== ======= =====
    message and reason code                     master/ slave
                                                other
    =========================================== ======= =====
    ``OFPR_NO_MATCH``                           yes     ---
    ``OFPR_ACTION``                             yes     ---
    ``OFPR_INVALID_TTL``                        ---     ---
    ``OFPR_ACTION_SET`` (OF1.4+)                yes     ---
    ``OFPR_GROUP`` (OF1.4+)                     yes     ---
    =========================================== ======= =====

.. table:: ``OFPT_FLOW_REMOVED`` / ``NXT_FLOW_REMOVED``

    =========================================== ======= =====
    message and reason code                     master/ slave
                                                other
    =========================================== ======= =====
    ``OFPRR_IDLE_TIMEOUT``                      yes     ---
    ``OFPRR_HARD_TIMEOUT``                      yes     ---
    ``OFPRR_DELETE``                            yes     ---
    ``OFPRR_GROUP_DELETE`` (OF1.4+)             yes     ---
    ``OFPRR_METER_DELETE`` (OF1.4+)             yes     ---
    ``OFPRR_EVICTION`` (OF1.4+)                 yes     ---
    =========================================== ======= =====

.. table:: ``OFPT_PORT_STATUS``

    =========================================== ======= =====
    message and reason code                     master/ slave
                                                other
    =========================================== ======= =====
    ``OFPPR_ADD``                               yes     yes
    ``OFPPR_DELETE``                            yes     yes
    ``OFPPR_MODIFY``                            yes     yes
    =========================================== ======= =====

.. table:: ``OFPT_ROLE_REQUEST`` / ``OFPT_ROLE_REPLY`` (OF1.4+)

    =========================================== ======= =====
    message and reason code                     master/ slave
                                                other
    =========================================== ======= =====
    ``OFPCRR_MASTER_REQUEST``                   ---     ---
    ``OFPCRR_CONFIG``                           ---     ---
    ``OFPCRR_EXPERIMENTER``                     ---     ---
    =========================================== ======= =====

.. table:: ``OFPT_TABLE_STATUS`` (OF1.4+)

    =========================================== ======= =====
    message and reason code                     master/ slave
                                                other
    =========================================== ======= =====
    ``OFPTR_VACANCY_DOWN``                      ---     ---
    ``OFPTR_VACANCY_UP``                        ---     ---
    =========================================== ======= =====

.. table:: ``OFPT_REQUESTFORWARD`` (OF1.4+)

    =========================================== ======= =====
    message and reason code                     master/ slave
                                                other
    =========================================== ======= =====
    ``OFPRFR_GROUP_MOD``                        ---     ---
    ``OFPRFR_METER_MOD``                        ---     ---
    =========================================== ======= =====

The ``NXT_SET_ASYNC_CONFIG`` message directly sets all of the values in this
table for the current connection. The ``OFPC_INVALID_TTL_TO_CONTROLLER`` bit
in the ``OFPT_SET_CONFIG`` message controls the setting for
``OFPR_INVALID_TTL`` for the "master" role.
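
As a rough illustration of the final lookup step, the default table can be
modeled as a dictionary keyed by message type and reason code, with one flag
per role column. This is only a sketch: the names ``DEFAULT_ASYNC`` and
``async_msg_enabled`` are hypothetical and are not taken from the Open vSwitch
sources, and only the OpenFlow 1.0 reason codes from the first three tables are
reproduced.

```python
# Hypothetical model of the default per-connection asynchronous message table.
# Each entry maps (message, reason) -> (master_or_other, slave).
DEFAULT_ASYNC = {
    ("PACKET_IN", "OFPR_NO_MATCH"): (True, False),
    ("PACKET_IN", "OFPR_ACTION"): (True, False),
    ("PACKET_IN", "OFPR_INVALID_TTL"): (False, False),
    ("FLOW_REMOVED", "OFPRR_IDLE_TIMEOUT"): (True, False),
    ("FLOW_REMOVED", "OFPRR_HARD_TIMEOUT"): (True, False),
    ("FLOW_REMOVED", "OFPRR_DELETE"): (True, False),
    ("PORT_STATUS", "OFPPR_ADD"): (True, True),
    ("PORT_STATUS", "OFPPR_DELETE"): (True, True),
    ("PORT_STATUS", "OFPPR_MODIFY"): (True, True),
}


def async_msg_enabled(message, reason, role, table=DEFAULT_ASYNC):
    """Return True if `table` sends this message to a connection in `role`.

    `role` is "master", "other", or "slave"; the first two share a column.
    """
    master_or_other, slave = table[(message, reason)]
    return slave if role == "slave" else master_or_other
```

With these defaults, a slave connection still learns about port changes but
receives no packet-in or flow-removed messages until it reconfigures the table.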

``OFPAT_ENQUEUE``
-----------------

The OpenFlow 1.0 specification requires the output port of the
``OFPAT_ENQUEUE`` action to "refer to a valid physical port (i.e. <
``OFPP_MAX``) or ``OFPP_IN_PORT``". Although ``OFPP_LOCAL`` is not less than
``OFPP_MAX``, it is an 'internal' port which can have QoS applied to it in
Linux. Since we allow ``OFPAT_ENQUEUE`` to apply to 'internal' ports whose
port numbers are less than ``OFPP_MAX``, we interpret ``OFPP_LOCAL`` as a
physical port and support ``OFPAT_ENQUEUE`` on it as well.
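
The resulting acceptance rule can be sketched as a single predicate. The
function name ``enqueue_port_ok`` is hypothetical; the port constants are the
OpenFlow 1.0 values as recalled from the specification rather than quoted from
this document.

```python
# OpenFlow 1.0 reserved port numbers (16-bit port space).
OFPP_MAX = 0xFF00      # physical ports are numbered below this value
OFPP_IN_PORT = 0xFFF8  # reserved: send the packet out its input port
OFPP_LOCAL = 0xFFFE    # reserved: the local 'internal' port


def enqueue_port_ok(port):
    """Accept what the OpenFlow 1.0 text allows, plus OFPP_LOCAL."""
    return port < OFPP_MAX or port in (OFPP_IN_PORT, OFPP_LOCAL)
```

Other reserved ports, such as the controller port, remain invalid targets for
the action under this rule.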

``OFPT_FLOW_MOD``
-----------------

The OpenFlow specification for the behavior of ``OFPT_FLOW_MOD`` is confusing.
The following tables summarize the Open vSwitch implementation of its behavior
in the following categories:

"match on priority":
  Whether the ``flow_mod`` acts only on flows whose priority matches that
  included in the ``flow_mod`` message.

"match on out_port":
  Whether the ``flow_mod`` acts only on flows that output to the ``out_port``
  included in the ``flow_mod`` message (if ``out_port`` is not ``OFPP_NONE``).
  OpenFlow 1.1 and later have a similar feature (not listed separately here)
  for ``out_group``.

"match on flow_cookie":
  Whether the ``flow_mod`` acts only on flows whose ``flow_cookie`` matches an
  optional controller-specified value and mask.

"updates flow_cookie":
  Whether the ``flow_mod`` changes the ``flow_cookie`` of the flow or flows
  that it matches to the ``flow_cookie`` included in the ``flow_mod`` message.

"updates ``OFPFF_SEND_FLOW_REM``":
  Whether the ``flow_mod`` changes the ``OFPFF_SEND_FLOW_REM`` flag of the flow
  or flows that it matches to the setting included in the flags of the
  ``flow_mod`` message.

"honors ``OFPFF_CHECK_OVERLAP``":
  Whether the ``OFPFF_CHECK_OVERLAP`` flag in the ``flow_mod`` is significant.

"updates ``idle_timeout``" and "updates ``hard_timeout``":
  Whether the ``idle_timeout`` and ``hard_timeout`` in the ``flow_mod``,
  respectively, have an effect on the flow or flows matched by the
  ``flow_mod``.

"resets idle timer":
  Whether the ``flow_mod`` resets the per-flow timer that measures how long a
  flow has been idle.

"resets hard timer":
  Whether the ``flow_mod`` resets the per-flow timer that measures how long it
  has been since a flow was modified.

"zeros counters":
  Whether the ``flow_mod`` resets per-flow packet and byte counters to zero.

"may add a new flow":
  Whether the ``flow_mod`` may add a new flow to the flow table. (Obviously
  this is always true for "add" commands but in some OpenFlow versions "modify"
  and "modify-strict" can also add new flows.)

"sends ``flow_removed`` message":
  Whether the ``flow_mod`` generates a ``flow_removed`` message for the flow or
  flows that it affects.

An entry labeled ``yes`` means that the flow mod type does have the indicated
behavior, ``---`` means that it does not, an empty cell means that the property
is not applicable, and other values are explained below the table.

OpenFlow 1.0
~~~~~~~~~~~~

================================ === ====== ====== ====== ======
RULE                             ADD MODIFY MODIFY DELETE DELETE
                                            STRICT        STRICT
================================ === ====== ====== ====== ======
match on ``priority``            yes ---    yes    ---    yes
match on ``out_port``            --- ---    ---    yes    yes
match on ``flow_cookie``         --- ---    ---    ---    ---
match on ``table_id``            --- ---    ---    ---    ---
controller chooses ``table_id``  --- ---    ---
updates ``flow_cookie``          yes yes    yes
updates ``OFPFF_SEND_FLOW_REM``  yes +      +
honors ``OFPFF_CHECK_OVERLAP``   yes +      +
updates ``idle_timeout``         yes +      +
updates ``hard_timeout``         yes +      +
resets idle timer                yes +      +
resets hard timer                yes yes    yes
zeros counters                   yes +      +
may add a new flow               yes yes    yes
sends ``flow_removed`` message   --- ---    ---    %      %
================================ === ====== ====== ====== ======

where:

``+``
  "modify" and "modify-strict" only take these actions when they create a new
  flow, not when they update an existing flow.

``%``
  "delete" and "delete-strict" generate a ``flow_removed`` message if the
  deleted flow or flows have the ``OFPFF_SEND_FLOW_REM`` flag set. (Each
  controller can separately control whether it wants to receive the generated
  messages.)

OpenFlow 1.1
~~~~~~~~~~~~

OpenFlow 1.1 makes these changes:

- The controller must now specify the ``table_id`` of the table that is
  searched for matching flows and into which a flow may be inserted. Behavior
  for a ``table_id`` of 255 is undefined.

- A ``flow_mod``, except an "add", can now match on the ``flow_cookie``.

- When a ``flow_mod`` matches on the ``flow_cookie``, "modify" and
  "modify-strict" never insert a new flow.

================================ === ====== ====== ====== ======
RULE                             ADD MODIFY MODIFY DELETE DELETE
                                            STRICT        STRICT
================================ === ====== ====== ====== ======
match on ``priority``            yes ---    yes    ---    yes
match on ``out_port``            --- ---    ---    yes    yes
match on ``flow_cookie``         --- yes    yes    yes    yes
match on ``table_id``            yes yes    yes    yes    yes
controller chooses ``table_id``  yes yes    yes
updates ``flow_cookie``          yes ---    ---
updates ``OFPFF_SEND_FLOW_REM``  yes +      +
honors ``OFPFF_CHECK_OVERLAP``   yes +      +
updates ``idle_timeout``         yes +      +
updates ``hard_timeout``         yes +      +
resets idle timer                yes +      +
resets hard timer                yes yes    yes
zeros counters                   yes +      +
may add a new flow               yes #      #
sends ``flow_removed`` message   --- ---    ---    %      %
================================ === ====== ====== ====== ======

where:

``+``
  "modify" and "modify-strict" only take these actions when they create a new
  flow, not when they update an existing flow.

``%``
  "delete" and "delete-strict" generate a ``flow_removed`` message if the
  deleted flow or flows have the ``OFPFF_SEND_FLOW_REM`` flag set. (Each
  controller can separately control whether it wants to receive the generated
  messages.)

``#``
  "modify" and "modify-strict" only add a new flow if the ``flow_mod`` does not
  match on any bits of the flow cookie.

OpenFlow 1.2
~~~~~~~~~~~~

OpenFlow 1.2 makes these changes:

- Only "add" commands ever add flows; "modify" and "modify-strict" never do.

- A new flag ``OFPFF_RESET_COUNTS`` now controls whether "modify" and
  "modify-strict" reset counters, whereas previously they never reset counters
  (except when they inserted a new flow).

================================ === ====== ====== ====== ======
RULE                             ADD MODIFY MODIFY DELETE DELETE
                                            STRICT        STRICT
================================ === ====== ====== ====== ======
match on ``priority``            yes ---    yes    ---    yes
match on ``out_port``            --- ---    ---    yes    yes
match on ``flow_cookie``         --- yes    yes    yes    yes
match on ``table_id``            yes yes    yes    yes    yes
controller chooses ``table_id``  yes yes    yes
updates ``flow_cookie``          yes ---    ---
updates ``OFPFF_SEND_FLOW_REM``  yes ---    ---
honors ``OFPFF_CHECK_OVERLAP``   yes ---    ---
updates ``idle_timeout``         yes ---    ---
updates ``hard_timeout``         yes ---    ---
resets idle timer                yes ---    ---
resets hard timer                yes yes    yes
zeros counters                   yes &      &
may add a new flow               yes ---    ---
sends ``flow_removed`` message   --- ---    ---    %      %
================================ === ====== ====== ====== ======

where:

``%``
  "delete" and "delete-strict" generate a ``flow_removed`` message if the
  deleted flow or flows have the ``OFPFF_SEND_FLOW_REM`` flag set. (Each
  controller can separately control whether it wants to receive the generated
  messages.)

``&``
  "modify" and "modify-strict" reset counters if the ``OFPFF_RESET_COUNTS``
  flag is specified.
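
The "zeros counters" row for "modify" and "modify-strict" changes meaning
between versions, which the following sketch tries to make concrete. The
function name and the ``(major, minor)`` version tuples are hypothetical; the
value of ``OFPFF_RESET_COUNTS`` is the one recalled from the OpenFlow 1.2
specification and does not appear in this document.

```python
OFPFF_RESET_COUNTS = 1 << 2  # flow_mod flag introduced by OpenFlow 1.2


def modify_zeros_counters(version, flags, created_new_flow):
    """Whether a "modify" or "modify-strict" leaves the counters at zero.

    `version` is a (major, minor) tuple such as (1, 0) or (1, 2).
    """
    if version < (1, 2):
        # "+" in the 1.0 and 1.1 tables: only when the modify ends up
        # inserting a new flow, never when it updates an existing one.
        return created_new_flow
    # "&" in the 1.2 table: counters are reset only on request.  A modify
    # never inserts a flow in these versions, so created_new_flow is unused.
    return bool(flags & OFPFF_RESET_COUNTS)
```

The flag bit is interpreted only for OpenFlow 1.2 and later; in the sketch it
is deliberately ignored for the older versions.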

OpenFlow 1.3
~~~~~~~~~~~~

OpenFlow 1.3 makes these changes:

- Behavior for a ``table_id`` of 255 is now defined, for "delete" and
  "delete-strict" commands, as meaning to delete from all tables. A
  ``table_id`` of 255 is now explicitly invalid for other commands.

- New flags ``OFPFF_NO_PKT_COUNTS`` and ``OFPFF_NO_BYT_COUNTS`` for "add"
  operations.

The table for 1.3 is the same as the one shown above for 1.2.

OpenFlow 1.4
~~~~~~~~~~~~

OpenFlow 1.4 makes these changes:

- Adds the "importance" field to ``flow_mods``, but it does not explicitly
  specify which kinds of ``flow_mods`` set the importance. For consistency,
  Open vSwitch uses the same rule for importance as for ``idle_timeout`` and
  ``hard_timeout``, that is, only an "add" ``flow_mod`` sets the importance.
  (This issue has been filed with the ONF as EXT-496.)

  .. TODO(stephenfin) Link to EXT-496

- Adds an eviction mechanism that automatically deletes entries of lower
  importance to make space for newer entries.

OpenFlow 1.4 Bundles
--------------------

Open vSwitch makes all flow table modifications atomically, i.e., any datapath
packet only sees flow table configurations either before or after any change
made by any ``flow_mod``. For example, if a controller removes all flows with
a single OpenFlow ``flow_mod``, no packet sees an intermediate version of the
OpenFlow pipeline where only some of the flows have been deleted.

It should be noted that Open vSwitch caches datapath flows, and that the cached
flows are *NOT* flushed immediately when a flow table changes. Instead, the
datapath flows are revalidated against the new flow table as soon as possible,
and usually within one second of the modification. This design amortizes the
cost of datapath cache flushing across multiple flow table changes, and has a
significant performance effect during simultaneous heavy flow table churn and
high traffic load. This means that different cached datapath flows may have
been computed based on different flow table configurations, but each of the
datapath flows is guaranteed to have been computed over a coherent view of the
flow tables, as described above.

With OpenFlow 1.4 bundles this atomicity can be extended across an arbitrary
set of ``flow_mod`` messages. Bundles are supported for ``flow_mod`` and
``port_mod`` messages only. For ``flow_mod``, both ``atomic`` and ``ordered``
bundle flags are trivially supported, as all bundled messages are executed in
the order they were added and all flow table modifications are now atomic to
the datapath. Port mods may not appear in atomic bundles, as port status
modifications are not atomic.

To support bundles, ovs-ofctl has a ``--bundle`` option that makes the
flow mod commands (``add-flow``, ``add-flows``, ``mod-flows``, ``del-flows``,
and ``replace-flows``) use an OpenFlow 1.4 bundle to apply the
modifications as a single atomic transaction. If any of the flow mods
in a transaction fail, none of them are executed. All flow mods in a
bundle appear to datapath lookups simultaneously.

Furthermore, ovs-ofctl ``add-flow`` and ``add-flows`` commands now accept
arbitrary flow mods as an input by allowing the flow specification to
start with an explicit ``add``, ``modify``, ``modify_strict``, ``delete``, or
``delete_strict`` keyword. A missing keyword is treated as ``add``, so
this is fully backwards compatible. With the new ``--bundle`` option
all the flow mods are executed as a single atomic transaction using an
OpenFlow 1.4 bundle. Without the ``--bundle`` option the flow mods are
executed in order up to the first failing ``flow_mod``, and in case of an
error the earlier successful ``flow_mod`` calls are not rolled back.
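
The difference between the two modes can be pictured with a toy model in which
the flow table is a dictionary and each flow mod is a function that edits it.
This is not how ovs-ofctl or ovs-vswitchd implement bundles; every name below
(``apply_flow_mods``, ``add``, ``failing_mod``) is hypothetical and exists only
to show the all-or-nothing semantics next to the in-order, no-rollback
semantics.

```python
import copy


def add(key, actions):
    """Build a toy flow mod that installs `actions` under `key`."""
    def mod(table):
        table[key] = actions
    return mod


def failing_mod(table):
    """A toy flow mod that the switch rejects."""
    raise ValueError("bad flow_mod")


def apply_flow_mods(table, mods, bundle):
    """Apply `mods` in order; return the first error, or None on success.

    With bundle=True the mods run against a private copy that replaces
    `table` only if every mod succeeds.  With bundle=False each mod edits
    `table` directly, so mods that ran before a failure stay in effect.
    """
    working = copy.deepcopy(table) if bundle else table
    for mod in mods:
        try:
            mod(working)
        except ValueError as err:
            return err
    if bundle:
        table.clear()
        table.update(working)
    return None
```

In the bundled case a failure leaves the table exactly as it was, while in the
unbundled case the table keeps whatever the mods before the failure installed.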

``OFPT_PACKET_IN``
------------------

The OpenFlow 1.1 specification for ``OFPT_PACKET_IN`` is confusing. The
definition in OF1.1 ``openflow.h`` is[*]:

::

    /* Packet received on port (datapath -> controller). */
    struct ofp_packet_in {
        struct ofp_header header;
        uint32_t buffer_id;     /* ID assigned by datapath. */
        uint32_t in_port;       /* Port on which frame was received. */
        uint32_t in_phy_port;   /* Physical Port on which frame was received. */
        uint16_t total_len;     /* Full length of frame. */
        uint8_t reason;         /* Reason packet is being sent (one of OFPR_*) */
        uint8_t table_id;       /* ID of the table that was looked up */
        uint8_t data[0];        /* Ethernet frame, halfway through 32-bit word,
                                   so the IP header is 32-bit aligned. The
                                   amount of data is inferred from the length
                                   field in the header. Because of padding,
                                   offsetof(struct ofp_packet_in, data) ==
                                   sizeof(struct ofp_packet_in) - 2. */
    };
    OFP_ASSERT(sizeof(struct ofp_packet_in) == 24);

The confusing part is the comment on the ``data[]`` member. This comment is a
leftover from OF1.0 ``openflow.h``, in which the comment was correct:
``sizeof(struct ofp_packet_in)`` is 20 in OF1.0 and ``offsetof(struct
ofp_packet_in, data)`` is 18. When OF1.1 was written, the structure members
were changed but the comment was carelessly not updated, and the comment became
wrong: ``sizeof(struct ofp_packet_in)`` and ``offsetof(struct ofp_packet_in,
data)`` are both 24 in OF1.1.

That leaves the question of how to implement ``ofp_packet_in`` in OF1.1. The
OpenFlow reference implementation for OF1.1 does not include any padding, that
is, the first byte of the encapsulated frame immediately follows the
``table_id`` member without a gap. Open vSwitch therefore implements it the
same way for compatibility.

For an earlier discussion, please see the thread archived at:
https://mailman.stanford.edu/pipermail/openflow-discuss/2011-August/002604.html

[*] The quoted definition is directly from OF1.1. Definitions used inside OVS
omit the 8-byte ``ofp_header`` members, so the sizes in this discussion are
8 bytes larger than those declared in OVS header files.
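
The sizes and offsets quoted above follow from the usual C layout rules, which
the sketch below applies by hand. The helper ``c_layout`` is hypothetical, the
alignments assume a conventional ABI in which each integer is aligned to its
own size, and the OF1.0 member list is recalled from the OF1.0 header rather
than quoted in this document.

```python
def c_layout(fields):
    """Compute member offsets and sizeof under the usual C layout rules.

    `fields` is a list of (name, size, alignment) tuples in declaration order.
    """
    offset, max_align, offsets = 0, 1, {}
    for name, size, align in fields:
        offset = (offset + align - 1) // align * align  # pad up to alignment
        offsets[name] = offset
        offset += size
        max_align = max(max_align, align)
    sizeof = (offset + max_align - 1) // max_align * max_align  # tail padding
    return offsets, sizeof


# struct ofp_header is 8 bytes and contains a uint32_t, so it is 4-aligned.
OF10_PACKET_IN = [
    ("header", 8, 4), ("buffer_id", 4, 4), ("total_len", 2, 2),
    ("in_port", 2, 2), ("reason", 1, 1), ("pad", 1, 1), ("data", 0, 1),
]
OF11_PACKET_IN = [
    ("header", 8, 4), ("buffer_id", 4, 4), ("in_port", 4, 4),
    ("in_phy_port", 4, 4), ("total_len", 2, 2), ("reason", 1, 1),
    ("table_id", 1, 1), ("data", 0, 1),
]
```

For OF1.0 the members end at byte 18 and tail padding rounds the structure up
to 20, which is where the "minus 2" in the comment comes from; for OF1.1 the
members already end on a 4-byte boundary, so no padding appears.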

VLAN Matching
-------------

The 802.1Q VLAN header causes more trouble than any other 4 bytes in
networking. More specifically, three versions of OpenFlow and Open vSwitch
have among them four different ways to match the contents and presence of the
VLAN header. The following table describes how each version works.

======== ============= =============== =============== ================
Match    NXM           OF1.0           OF1.1           OF1.2
======== ============= =============== =============== ================
``[1]``  ``0000/0000`` ``????/1,??/?`` ``????/1,??/?`` ``0000/0000,--``
``[2]``  ``0000/ffff`` ``ffff/0,??/?`` ``ffff/0,??/?`` ``0000/ffff,--``
``[3]``  ``1xxx/1fff`` ``0xxx/0,??/1`` ``0xxx/0,??/1`` ``1xxx/ffff,--``
``[4]``  ``z000/f000`` ``????/1,0y/0`` ``fffe/0,0y/0`` ``1000/1000,0y``
``[5]``  ``zxxx/ffff`` ``0xxx/0,0y/0`` ``0xxx/0,0y/0`` ``1xxx/ffff,0y``
``[6]``  ``0000/0fff`` ``<none>``      ``<none>``      ``<none>``
``[7]``  ``0000/f000`` ``<none>``      ``<none>``      ``<none>``
``[8]``  ``0000/efff`` ``<none>``      ``<none>``      ``<none>``
``[9]``  ``1001/1001`` ``<none>``      ``<none>``      ``1001/1001,--``
``[10]`` ``3000/3000`` ``<none>``      ``<none>``      ``<none>``
``[11]`` ``1000/1000`` ``<none>``      ``fffe/0,??/1`` ``1000/1000,--``
======== ============= =============== =============== ================

where:

Match:
  See the list below.

NXM:
  ``xxxx/yyyy`` means ``NXM_OF_VLAN_TCI_W`` with value ``xxxx`` and mask
  ``yyyy``. A mask of ``0000`` is equivalent to omitting
  ``NXM_OF_VLAN_TCI(_W)``, a mask of ``ffff`` is equivalent to
  ``NXM_OF_VLAN_TCI``.

OF1.0, OF1.1:
  ``wwww/x,yy/z`` means ``dl_vlan`` ``wwww``, ``OFPFW_DL_VLAN`` ``x``,
  ``dl_vlan_pcp`` ``yy``, and ``OFPFW_DL_VLAN_PCP`` ``z``. If
  ``OFPFW_DL_VLAN`` or ``OFPFW_DL_VLAN_PCP`` is 1, the corresponding field
  value is wildcarded, otherwise it is matched. ``?`` means that the given
  bits are ignored (their conventional values are ``0000/x,00/0`` in OF1.0,
  ``0000/x,00/1`` in OF1.1; ``x`` is never ignored). ``<none>`` means that the
  given match is not supported.

OF1.2:
  ``xxxx/yyyy,zz`` means ``OXM_OF_VLAN_VID_W`` with value ``xxxx`` and mask
  ``yyyy``, and ``OXM_OF_VLAN_PCP`` (which is not maskable) with value ``zz``.
  A mask of ``0000`` is equivalent to omitting ``OXM_OF_VLAN_VID(_W)``, a mask
  of ``ffff`` is equivalent to ``OXM_OF_VLAN_VID``. ``--`` means that
  ``OXM_OF_VLAN_PCP`` is omitted. ``<none>`` means that the given match is not
  supported.

The matches are:

``[1]``
  Matches any packet, that is, one without an 802.1Q header or with an 802.1Q
  header with any TCI value.

``[2]``
  Matches only packets without an 802.1Q header.

  NXM:
    Any match with ``vlan_tci == 0`` and ``(vlan_tci_mask & 0x1000) != 0`` is
    equivalent to the one listed in the table.

  OF1.0:
    The spec doesn't define behavior if ``dl_vlan`` is set to ``0xffff`` and
    ``OFPFW_DL_VLAN_PCP`` is not set.

  OF1.1:
    The spec says explicitly to ignore ``dl_vlan_pcp`` when ``dl_vlan`` is set
    to ``0xffff``.

  OF1.2:
    The spec doesn't say what should happen if ``vlan_vid == 0`` and
    ``(vlan_vid_mask & 0x1000) != 0`` but ``vlan_vid_mask != 0x1000``, but it
    would be straightforward to also interpret as ``[2]``.

``[3]``
  Matches only packets that have an 802.1Q header with VID ``xxx`` (and any
  PCP).

``[4]``
  Matches only packets that have an 802.1Q header with PCP ``y`` (and any VID).

  NXM:
    ``z`` is ``(y << 1) | 1``.

  OF1.0:
    The spec isn't very clear, but OVS implements it this way.

  OF1.2:
    Presumably other masks such that ``(vlan_vid_mask & 0x1fff) == 0x1000``
    would also work, but the spec doesn't define their behavior.

``[5]``
  Matches only packets that have an 802.1Q header with VID ``xxx`` and PCP
  ``y``.

  NXM:
    ``z`` is ``(y << 1) | 1``.

  OF1.2:
    Presumably other masks such that ``(vlan_vid_mask & 0x1fff) == 0x1fff``
    would also work.

``[6]``
  Matches packets with no 802.1Q header or with an 802.1Q header with a VID of
  0. Only possible with NXM.

``[7]``
  Matches packets with no 802.1Q header or with an 802.1Q header with a PCP of
  0. Only possible with NXM.

``[8]``
  Matches packets with no 802.1Q header or with an 802.1Q header with both VID
  and PCP of 0. Only possible with NXM.

``[9]``
  Matches only packets that have an 802.1Q header with an odd-numbered VID (and
  any PCP). Only possible with NXM and OF1.2. (This is just an example; one
  can match on any desired VID bit pattern.)

``[10]``
  Matches only packets that have an 802.1Q header with an odd-numbered PCP (and
  any VID). Only possible with NXM. (This is just an example; one can match
  on any desired PCP bit pattern.)

``[11]``
  Matches any packet with an 802.1Q header, regardless of VID or PCP.

Additional notes:

OF1.2:
  The top three bits of ``OXM_OF_VLAN_VID`` are fixed to zero, so bits 13, 14,
  and 15 in the masks listed in the table may be set to arbitrary values, as
  long as the corresponding value bits are also zero. The suggested ``ffff``
  mask for ``[2]``, ``[3]``, and ``[5]`` allows a shorter OXM representation
  (the mask is omitted) than the minimal ``1fff`` mask.
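
The NXM column is the easiest to reason about because it is a plain
value-and-mask test on a 16-bit TCI. The sketch below assumes the convention
implied by the table: a packet without an 802.1Q header has a TCI of zero, and
a packet with one always has bit ``0x1000`` set. The helper names ``vlan_tci``
and ``nxm_match`` are hypothetical.

```python
VLAN_CFI = 0x1000  # set in the matched TCI whenever an 802.1Q header exists


def vlan_tci(vid=None, pcp=0):
    """TCI as seen by an NXM match: 0 without a header, else PCP|CFI|VID."""
    if vid is None:
        return 0
    return (pcp << 13) | VLAN_CFI | vid


def nxm_match(tci, value, mask):
    """Value/mask comparison in the style of NXM_OF_VLAN_TCI_W."""
    return (tci & mask) == (value & mask)
```

For example, match ``[2]`` (``0000/ffff``) accepts ``vlan_tci()`` but rejects
``vlan_tci(vid=0)``, while match ``[6]`` (``0000/0fff``) accepts both, which is
why ``[6]`` has no OpenFlow equivalent.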

Flow Cookies
------------

OpenFlow 1.0 and later versions have the concept of a "flow cookie", which is a
64-bit integer value attached to each flow. The treatment of the flow cookie
has varied greatly across OpenFlow versions, however.

In OpenFlow 1.0:

- ``OFPFC_ADD`` set the cookie in the flow that it added.

- ``OFPFC_MODIFY`` and ``OFPFC_MODIFY_STRICT`` updated the cookie for the flow
  or flows that it modified.

- ``OFPST_FLOW`` messages included the flow cookie.

- ``OFPT_FLOW_REMOVED`` messages reported the cookie of the flow that was
  removed.

OpenFlow 1.1 made the following changes:

- Flow mod operations ``OFPFC_MODIFY``, ``OFPFC_MODIFY_STRICT``,
  ``OFPFC_DELETE``, and ``OFPFC_DELETE_STRICT``, plus flow stats requests and
  aggregate stats requests, gained the ability to match on flow cookies with an
  arbitrary mask.

- ``OFPFC_MODIFY`` and ``OFPFC_MODIFY_STRICT`` were changed to add a new flow,
  in the case of no match, only if the flow table modification operation did
  not match on the cookie field. (In OpenFlow 1.0, modify operations always
  added a new flow when there was no match.)

- ``OFPFC_MODIFY`` and ``OFPFC_MODIFY_STRICT`` no longer updated flow cookies.

OpenFlow 1.2 made the following changes:

- ``OFPFC_MODIFY`` and ``OFPFC_MODIFY_STRICT`` were changed to never add a new
  flow, regardless of whether the flow cookie was used for matching.

Open vSwitch support for OpenFlow 1.0 implements the OpenFlow 1.0 behavior with
the following extensions:

- An NXM extension field ``NXM_NX_COOKIE(_W)`` allows the NXM versions of
  ``OFPFC_MODIFY``, ``OFPFC_MODIFY_STRICT``, ``OFPFC_DELETE``, and
  ``OFPFC_DELETE_STRICT`` ``flow_mod`` calls, plus flow stats requests and
  aggregate stats requests, to match on flow cookies with arbitrary masks.
  This is much like the equivalent OpenFlow 1.1 feature.

- Like OpenFlow 1.1, ``OFPFC_MODIFY`` and ``OFPFC_MODIFY_STRICT`` add a new
  flow if there is no match and the mask is zero (or not given).

- The ``cookie`` field in ``OFPT_FLOW_MOD`` and ``NXT_FLOW_MOD`` messages is
  used as the cookie value for ``OFPFC_ADD`` commands, as described in OpenFlow
  1.0. For ``OFPFC_MODIFY`` and ``OFPFC_MODIFY_STRICT`` commands, the
  ``cookie`` field is used as a new cookie for flows that match unless it is
  ``UINT64_MAX``, in which case the flow's cookie is not updated.

- ``NXT_PACKET_IN`` (the Nicira extended version of ``OFPT_PACKET_IN``) reports
  the cookie of the rule that generated the packet, or all-1-bits if no rule
  generated the packet. (Older versions of OVS used all-0-bits instead of
  all-1-bits.)

The following table shows the handling of different protocols when receiving
``OFPFC_MODIFY`` and ``OFPFC_MODIFY_STRICT`` messages. A mask of 0 indicates
either an explicit mask of zero or an implicit one by not specifying the
``NXM_NX_COOKIE(_W)`` field.

============== ====== ====== ============= =============
Protocol       Match  Update Add on miss   Add on miss
               cookie cookie mask!=0       mask==0
============== ====== ====== ============= =============
OpenFlow 1.0   no     yes    (add on miss) (add on miss)
OpenFlow 1.1   yes    no     no            yes
OpenFlow 1.2   yes    no     no            no
NXM            yes    yes\*  no            yes
============== ====== ====== ============= =============

\* Updates the flow's cookie unless the ``cookie`` field is ``UINT64_MAX``.
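
The "add on miss" columns and the footnote can be restated as two small
functions. This is a sketch of the table only, with hypothetical function names
and protocol labels; it is not code from Open vSwitch.

```python
UINT64_MAX = 2**64 - 1


def modify_adds_on_miss(protocol, cookie_mask=0):
    """Whether a modify that matches no flow inserts a new one."""
    if protocol == "OF1.0":
        return True              # cannot match on the cookie, always adds
    if protocol in ("OF1.1", "NXM"):
        return cookie_mask == 0  # adds only when not matching on the cookie
    if protocol == "OF1.2":
        return False             # modify never adds a flow
    raise ValueError("unknown protocol: %r" % (protocol,))


def nxm_updated_cookie(old_cookie, cookie_field):
    """Cookie of a flow after an NXM modify, per the table footnote."""
    return old_cookie if cookie_field == UINT64_MAX else cookie_field
```

A controller that wants NXM modify commands to leave existing cookies alone
would therefore send ``UINT64_MAX`` in the ``cookie`` field.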

Multiple Table Support
----------------------

OpenFlow 1.0 has only rudimentary support for multiple flow tables. Notably,
OpenFlow 1.0 does not allow the controller to specify the flow table to which a
flow is to be added. Open vSwitch adds an extension for this purpose, which is
enabled on a per-OpenFlow-connection basis using the ``NXT_FLOW_MOD_TABLE_ID``
message. When the extension is enabled, the upper 8 bits of the ``command``
member in an ``OFPT_FLOW_MOD`` or ``NXT_FLOW_MOD`` message designate the table
to which a flow is to be added.

The Open vSwitch software switch implementation offers 255 flow tables. On
packet ingress, only the first flow table (table 0) is searched, and the
contents of the remaining tables are not considered in any way. Tables other
than table 0 only come into play when an ``NXAST_RESUBMIT_TABLE`` action
specifies another table to search.

Tables 128 and above are reserved for use by the switch itself. Controllers
should use only tables 0 through 127.
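
The packing of the table number into the ``command`` member can be sketched as
follows. The helper names are hypothetical, the encoding applies only once the
``NXT_FLOW_MOD_TABLE_ID`` extension has been enabled on the connection, and the
``OFPFC_*`` values are the OpenFlow 1.0 command numbers as recalled from the
specification.

```python
# OpenFlow 1.0 flow_mod commands.
(OFPFC_ADD, OFPFC_MODIFY, OFPFC_MODIFY_STRICT,
 OFPFC_DELETE, OFPFC_DELETE_STRICT) = range(5)


def encode_command(command, table_id):
    """Pack `table_id` into the upper 8 bits of the 16-bit command member."""
    if not 0 <= command <= 0xFF or not 0 <= table_id <= 0xFF:
        raise ValueError("command and table_id must each fit in 8 bits")
    return (table_id << 8) | command


def decode_command(field):
    """Split a 16-bit command member into (command, table_id)."""
    return field & 0xFF, field >> 8
```

Without the extension the upper byte is zero, so an unmodified OpenFlow 1.0
controller keeps addressing table 0.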

``OFPTC_*`` Table Configuration
-------------------------------

This section covers the history of the ``OFPTC_*`` table configuration bits
across OpenFlow versions.

OpenFlow 1.0 flow tables had fixed configurations.

OpenFlow 1.1 enabled controllers to configure behavior upon flow table miss and
added the ``OFPTC_MISS_*`` constants for that purpose. ``OFPTC_*`` did not
control anything else but it was nevertheless conceptualized as a set of
bit-fields instead of an enum. OF1.1 added the ``OFPT_TABLE_MOD`` message to
set ``OFPTC_MISS_*`` for a flow table and added the ``config`` field to the
``OFPST_TABLE`` reply to report the current setting.

OpenFlow 1.2 did not change anything in this regard.

OpenFlow 1.3 switched to another means of changing flow table miss behavior and
deprecated ``OFPTC_MISS_*`` without adding any more ``OFPTC_*`` constants.
This meant that ``OFPT_TABLE_MOD`` now had no purpose at all, but OF1.3 kept it
around "for backward compatibility with older and newer versions of the
specification." At the same time, OF1.3 introduced a new message
``OFPMP_TABLE_FEATURES`` that included a field ``config`` documented as
reporting the ``OFPTC_*`` values set with ``OFPT_TABLE_MOD``; of course this
served no real purpose because no ``OFPTC_*`` values are defined. OF1.3 did
remove the ``OFPTC_*`` field from ``OFPMP_TABLE`` (previously named
``OFPST_TABLE``).

OpenFlow 1.4 defined two new ``OFPTC_*`` constants, ``OFPTC_EVICTION`` and
``OFPTC_VACANCY_EVENTS``, using bits that did not overlap with ``OFPTC_MISS_*``
even though those bits had not been defined since OF1.2. ``OFPT_TABLE_MOD``
still controlled these settings. The field for ``OFPTC_*`` values in
``OFPMP_TABLE_FEATURES`` was renamed from ``config`` to ``capabilities`` and
documented as reporting the flags that are supported in an ``OFPT_TABLE_MOD``
message. The ``OFPMP_TABLE_DESC`` message newly added in OF1.4 reported the
``OFPTC_*`` setting.

OpenFlow 1.5 did not change anything in this regard.
742
743 .. list-table:: Revisions
744 :header-rows: 1
745
746 * - OpenFlow
747 - ``OFPTC_*`` flags
748 - ``TABLE_MOD``
749 - Statistics
750 - ``TABLE_FEATURES``
751 - ``TABLE_DESC``
752 * - OF1.0
753 - none
754 - no (\*)(+)
755 - no (\*)
756 - nothing (\*)(+)
757 - no (\*)(+)
758 * - OF1.1/1.2
759 - ``MISS_*``
760 - yes
761 - yes
762 - nothing (+)
763 - no (+)
764 * - OF1.3
765 - none
766 - yes (\*)
767 - no (\*)
768 - config (\*)
769 - no (\*)(+)
770 * - OF1.4/1.5
771 - ``EVICTION``/``VACANCY_EVENTS``
772 - yes
773 - no
774 - capabilities
775 - yes
776
777 where:
778
779 OpenFlow:
780 The OpenFlow version(s).
781
782 ``OFPTC_*`` flags:
783 The ``OFPTC_*`` flags defined in those versions.
784
785 ``TABLE_MOD``:
786 Whether ``OFPT_TABLE_MOD`` can modify ``OFPTC_*`` flags.
787
788 Statistics:
789 Whether ``OFPST_TABLE/OFPMP_TABLE`` reports the ``OFPTC_*`` flags.
790
791 ``TABLE_FEATURES``:
792 What ``OFPMP_TABLE_FEATURES`` reports (if it exists): either the current
793 configuration or the switch's capabilities.
794
795 ``TABLE_DESC``:
796 Whether ``OFPMP_TABLE_DESC`` reports the current configuration.
797
798 (\*): Nothing to report/change anyway.
799
800 (+): No such message.
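
As a rough illustration of the history summarized in the table, the following
Python sketch records which ``OFPTC_*`` bits are meaningful in each OpenFlow
version and checks a ``config`` value against them. The bit values are taken
from the OpenFlow specifications rather than from this document, and the names
``DEFINED_FLAGS`` and ``config_is_valid`` are hypothetical; none of this is
Open vSwitch code.

```python
# Bit values as published in the OpenFlow specifications (an assumption here:
# this document names the constants but does not give their values).
OFPTC_TABLE_MISS_CONTINUE = 1 << 0   # defined in OF1.1/1.2 only
OFPTC_TABLE_MISS_DROP = 1 << 1       # defined in OF1.1/1.2 only
OFPTC_TABLE_MISS_MASK = 3            # defined in OF1.1/1.2 only
OFPTC_EVICTION = 1 << 2              # defined in OF1.4 and later
OFPTC_VACANCY_EVENTS = 1 << 3        # defined in OF1.4 and later

# Hypothetical summary of the "OFPTC_* flags" column of the Revisions table,
# keyed by (major, minor) OpenFlow version.
DEFINED_FLAGS = {
    (1, 0): 0,
    (1, 1): OFPTC_TABLE_MISS_MASK,
    (1, 2): OFPTC_TABLE_MISS_MASK,
    (1, 3): 0,
    (1, 4): OFPTC_EVICTION | OFPTC_VACANCY_EVENTS,
    (1, 5): OFPTC_EVICTION | OFPTC_VACANCY_EVENTS,
}


def config_is_valid(version, config):
    """Return True if every bit set in 'config' is defined for 'version'."""
    return config & ~DEFINED_FLAGS[version] == 0
```

For example, ``config_is_valid((1, 3), OFPTC_EVICTION)`` is false because
OF1.3 defines no ``OFPTC_*`` bits, while the same value is accepted for
``(1, 4)``.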

IPv6
----

Open vSwitch supports stateless handling of IPv6 packets. Flows can be written
to support matching TCP, UDP, and ICMPv6 headers within an IPv6 packet. Deeper
matching of some Neighbor Discovery messages is also supported.

IPv6 was not designed to interact well with middle-boxes. This, combined with
Open vSwitch's stateless nature, has affected the processing of IPv6 traffic,
which is detailed below.

Extension Headers
~~~~~~~~~~~~~~~~~

The base IPv6 header is incredibly simple with the intention of only containing
information relevant for routing packets between two endpoints. IPv6 relies
heavily on the use of extension headers to provide any other functionality.
Unfortunately, the extension headers were designed in such a way that it is
impossible to move to the next header (including the layer-4 payload) unless
the current header is understood.

Open vSwitch will process the following extension headers and continue to the
next header:

- Fragment (see the next section)
- AH (Authentication Header)
- Hop-by-Hop Options
- Routing
- Destination Options

When a header is encountered that is not in that list, it is considered
"terminal". A terminal header's IPv6 protocol value is stored in ``nw_proto``
for matching purposes. If a terminal header is TCP, UDP, or ICMPv6, the packet
will be further processed in an attempt to extract layer-4 information.

Fragments
~~~~~~~~~

IPv6 requires that every link in the internet have an MTU of 1280 octets or
greater (RFC 2460). As such, a terminal header (as described above in
"Extension Headers") in the first fragment should generally be reachable. In
this case, the terminal header's IPv6 protocol type is stored in the
``nw_proto`` field for matching purposes. If a terminal header cannot be found
in the first fragment (one with a fragment offset of zero), the ``nw_proto``
field is set to 0. Subsequent fragments (those with a non-zero fragment
offset) have the ``nw_proto`` field set to the IPv6 protocol type for fragments
(44).
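
The extension-header walk and the fragment rules described above can be
sketched together in Python. This is only an illustration, not the Open vSwitch
parser: the function name ``find_nw_proto`` is hypothetical, the header lengths
follow the standard IPv6 and AH encodings, and as a simplification the sketch
returns 0 whenever the header chain runs past the available bytes.

```python
# IANA protocol numbers for the extension headers that are skipped.
HOP_BY_HOP, ROUTING, FRAGMENT, AH, DEST_OPTS = 0, 43, 44, 51, 60
SKIPPABLE = {HOP_BY_HOP, ROUTING, FRAGMENT, AH, DEST_OPTS}


def find_nw_proto(first_next_header, data):
    """Return the value to store in nw_proto.

    'first_next_header' is the Next Header field of the base IPv6 header and
    'data' holds the bytes that follow the 40-byte base header.
    """
    nxt, off = first_next_header, 0
    while nxt in SKIPPABLE:
        if len(data) - off < 8:
            return 0  # Ran out of packet before reaching a terminal header.
        if nxt == FRAGMENT:
            # Bytes 2-3: 13-bit fragment offset, 2 reserved bits, M flag.
            frag_off = int.from_bytes(data[off + 2:off + 4], "big") >> 3
            if frag_off != 0:
                return FRAGMENT  # Later fragment: nothing more to parse.
            length = 8  # The Fragment header has a fixed size.
        elif nxt == AH:
            length = (data[off + 1] + 2) * 4  # AH counts 4-octet units.
        else:
            length = (data[off + 1] + 1) * 8  # Others count 8-octet units.
        if len(data) - off < length:
            return 0
        nxt = data[off]  # Every extension header starts with Next Header.
        off += length
    return nxt  # The terminal header's protocol number.
```

A caller would then attempt layer-4 parsing only when the result is TCP (6),
UDP (17), or ICMPv6 (58).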

Jumbograms
~~~~~~~~~~

An IPv6 jumbogram (RFC 2675) is a packet containing a payload longer than
65,535 octets. Jumbograms are only relevant in subnets with a link MTU greater
than 65,575 octets (65,535 octets of payload plus the 40-octet IPv6 header),
and nodes that do not connect to links with such large MTUs are not required to
support them. Currently, Open vSwitch does not process jumbograms.

In-Band Control
---------------

Motivation
~~~~~~~~~~

An OpenFlow switch must establish and maintain a TCP network connection to its
controller. There are two basic ways to categorize the network that this
connection traverses: either it is completely separate from the one that the
switch is otherwise controlling, or its path may overlap the network that the
switch controls. We call the former case "out-of-band control", the latter
case "in-band control".

Out-of-band control has the following benefits:

- Simplicity: Out-of-band control slightly simplifies the switch
  implementation.

- Reliability: Excessive switch traffic volume cannot interfere with control
  traffic.

- Integrity: Machines not on the control network cannot impersonate a switch or
  a controller.

- Confidentiality: Machines not on the control network cannot snoop on control
  traffic.

In-band control, on the other hand, has the following advantages:

- No dedicated port: There is no need to dedicate a physical switch port to
  control, which is important on switches that have few ports (e.g. wireless
  routers, low-end embedded platforms).

- No dedicated network: There is no need to build and maintain a separate
  control network. This is important in many environments because it reduces
  proliferation of switches and wiring.

Open vSwitch supports both out-of-band and in-band control. This section
describes the principles behind in-band control. See the description of the
Controller table in ovs-vswitchd.conf.db(5) to configure OVS for in-band
control.

Principles
~~~~~~~~~~

The fundamental principle of in-band control is that an OpenFlow switch must
recognize and switch control traffic without involving the OpenFlow controller.
All the details of implementing in-band control are special cases of this
principle.

The rationale for this principle is simple. If the switch does not handle
in-band control traffic itself, then it will be caught in a contradiction: it
must contact the controller, but it cannot, because only the controller can set
up the flows that are needed to contact the controller.

The following points describe important special cases of this principle.

- In-band control must be implemented regardless of whether the switch is
  connected.

  It is tempting to implement the in-band control rules only when the switch is
  not connected to the controller, using the reasoning that the controller
  should have complete control once it has established a connection with the
  switch.

  This does not work in practice. Consider the case where the switch is
  connected to the controller. Occasionally it can happen that the controller
  forgets or otherwise needs to obtain the MAC address of the switch. To do
  so, the controller sends a broadcast ARP request. A switch that implements
  the in-band control rules only when it is disconnected will then send an
  ``OFPT_PACKET_IN`` message up to the controller. The controller will be
  unable to respond, because it does not know the MAC address of the switch.
  This is a deadlock situation that can only be resolved by the switch noticing
  that its connection to the controller has hung and reconnecting.

- In-band control must override flows set up by the controller.

  It is reasonable to assume that flows set up by the OpenFlow controller
  should take precedence over in-band control, on the basis that the controller
  should be in charge of the switch.

  Again, this does not work in practice. Reasonable controller implementations
  may set up a "last resort" fallback rule that wildcards every field and,
  e.g., sends it up to the controller or discards it. If a controller does
  that, then it will isolate itself from the switch.

- The switch must recognize all control traffic.

  The fundamental principle of in-band control states, in part, that a switch
  must recognize control traffic without involving the OpenFlow controller.
  More specifically, the switch must recognize *all* control traffic. "False
  negatives", that is, packets that constitute control traffic but that the
  switch does not recognize as control traffic, lead to control traffic storms.

  Consider an OpenFlow switch that only recognizes control packets sent to or
  from that switch. Now suppose that two switches of this type, named A and B,
  are connected to ports on an Ethernet hub (not a switch) and that an OpenFlow
  controller is connected to a third hub port. In this setup, control traffic
  sent by switch A will be seen by switch B, which will send it to the
  controller as part of an ``OFPT_PACKET_IN`` message. Switch A will then see
  the ``OFPT_PACKET_IN`` message's packet, re-encapsulate it in another
  ``OFPT_PACKET_IN``, and send it to the controller. Switch B will then see
  that ``OFPT_PACKET_IN``, and so on in an infinite loop.

  Incidentally, the consequences of "false positives", where packets that are
  not control traffic are nevertheless recognized as control traffic, are much
  less severe. The controller will not be able to control their behavior, but
  the network will remain in working order. False positives do constitute a
  security problem.

- The switch should use echo-requests to detect disconnection.

  TCP will notice that a connection has hung, but this can take a considerable
  amount of time. For example, with default settings the Linux kernel TCP
  implementation will retransmit for between 13 and 30 minutes, depending on
  the connection's retransmission timeout, according to kernel documentation.
  This is far too long for a switch to be disconnected, so an OpenFlow switch
  should implement its own connection timeout. OpenFlow ``OFPT_ECHO_REQUEST``
  messages are the best way to do this, since they test the OpenFlow connection
  itself.
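
The last point, an application-level connection timeout driven by echo
requests, can be sketched as a small state machine. This is a minimal
illustration under stated assumptions and not the Open vSwitch implementation:
the class name ``EchoProbe``, the ``probe_interval`` parameter, and the returned
strings are all hypothetical, and the caller is assumed to supply timestamps in
seconds.

```python
class EchoProbe:
    """Tracks liveness of one OpenFlow connection using echo requests."""

    def __init__(self, probe_interval, now):
        self.probe_interval = probe_interval  # seconds; hypothetical knob
        self.last_rx = now       # when the last message of any kind arrived
        self.probe_sent = None   # when the outstanding echo request was sent

    def on_receive(self, now):
        # Any received message, not only an echo reply, proves liveness.
        self.last_rx = now
        self.probe_sent = None

    def poll(self, now):
        """Return 'ok', 'send_echo', or 'disconnect'."""
        if self.probe_sent is not None:
            # An echo request is outstanding: give up after one more interval.
            if now - self.probe_sent >= self.probe_interval:
                return "disconnect"
            return "ok"
        if now - self.last_rx >= self.probe_interval:
            # The connection has been idle: ask the peer to answer.
            self.probe_sent = now
            return "send_echo"
        return "ok"
```

With a 5-second interval, a silent peer is declared disconnected after about
10 seconds, rather than after the minutes that TCP retransmission alone would
take.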

Implementation
~~~~~~~~~~~~~~

This section describes how Open vSwitch implements in-band control. Correctly
implementing in-band control has proven difficult due to its many subtleties,
and has thus gone through many iterations. Please read through and understand
the reasoning behind the chosen rules before making modifications.

Open vSwitch implements in-band control as "hidden" flows, that is, flows that
are not visible through OpenFlow, and at a higher priority than any wildcarded
flow that can be set up through OpenFlow. This is done so that the OpenFlow
controller cannot interfere with them and possibly break connectivity with its
switches. It is possible to see all flows, including in-band ones, with the
``ovs-appctl bridge/dump-flows`` command.

The Open vSwitch implementation of in-band control can hide traffic to
arbitrary "remotes", where each remote is one TCP port on one IP address.
Currently the remotes are automatically configured as the in-band OpenFlow
controllers plus the OVSDB managers, if any. (The latter is a requirement
because OVSDB managers are responsible for configuring OpenFlow controllers, so
if the manager cannot be reached then OpenFlow cannot be reconfigured.)

The following rules (with the ``OFPP_NORMAL`` action) are set up on any bridge
that has any remotes:

(a)
  DHCP requests sent from the local port.
(b)
  ARP replies to the local port's MAC address.
(c)
  ARP requests from the local port's MAC address.

In-band also sets up the following rules for each unique next-hop MAC address
for the remotes' IPs (the "next hop" is either the remote itself, if it is on a
local subnet, or the gateway to reach the remote):

(d)
  ARP replies to the next hop's MAC address.
(e)
  ARP requests from the next hop's MAC address.

In-band also sets up the following rules for each unique remote IP address:

(f)
  ARP replies containing the remote's IP address as a target.
(g)
  ARP requests containing the remote's IP address as a source.

In-band also sets up the following rules for each unique remote (IP, port)
pair:

(h)
  TCP traffic to the remote's IP and port.
(i)
  TCP traffic from the remote's IP and port.
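
To make the grouping of rules (a) through (i) concrete, the sketch below
derives the rule set from a list of remotes. It is illustrative only: the
``Remote`` tuple, the ``in_band_rules`` function, and the match field names are
hypothetical and not taken from the Open vSwitch sources, and recognizing a
DHCP request as UDP from port 68 to port 67 is an assumption of this sketch.

```python
from collections import namedtuple

# Hypothetical description of a remote: its IP address, its TCP port, and the
# MAC address of the next hop used to reach it.
Remote = namedtuple("Remote", "ip port next_hop_mac")


def in_band_rules(local_mac, remotes):
    """Return (label, match) pairs; every rule's action is NORMAL."""
    if not remotes:
        return []  # Rules are only set up on bridges that have remotes.
    rules = [
        ("a", {"in_port": "LOCAL", "proto": "udp",
               "tp_src": 68, "tp_dst": 67}),
        ("b", {"proto": "arp", "arp_op": "reply", "dl_dst": local_mac}),
        ("c", {"proto": "arp", "arp_op": "request", "dl_src": local_mac}),
    ]
    # One pair of rules per unique next-hop MAC address.
    for mac in sorted({r.next_hop_mac for r in remotes}):
        rules.append(("d", {"proto": "arp", "arp_op": "reply",
                            "dl_dst": mac}))
        rules.append(("e", {"proto": "arp", "arp_op": "request",
                            "dl_src": mac}))
    # One pair of rules per unique remote IP address.
    for ip in sorted({r.ip for r in remotes}):
        rules.append(("f", {"proto": "arp", "arp_op": "reply",
                            "arp_target_ip": ip}))
        rules.append(("g", {"proto": "arp", "arp_op": "request",
                            "arp_source_ip": ip}))
    # One pair of rules per unique remote (IP, port) pair.
    for ip, port in sorted({(r.ip, r.port) for r in remotes}):
        rules.append(("h", {"proto": "tcp", "nw_dst": ip, "tp_dst": port}))
        rules.append(("i", {"proto": "tcp", "nw_src": ip, "tp_src": port}))
    return rules
```

Two remotes that share an IP address and next hop but use different TCP ports
therefore yield one copy each of rules (d) through (g) and two copies each of
rules (h) and (i).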

The goal of these rules is to be as narrow as possible to allow a switch to
join a network and be able to communicate with the remotes. As mentioned
earlier, these rules have higher priority than the controller's rules, so if
they are too broad, they may prevent the controller from implementing its
policy. As such, in-band actively monitors some aspects of flow and packet
processing so that the rules can be made more precise.

In-band control monitors attempts to add flows into the datapath that could
interfere with its duties. The datapath only allows exact match entries, so
in-band control is able to be very precise about the flows it prevents. Flows
that miss in the datapath are sent to userspace to be processed, so preventing
these flows from being cached in the "fast path" does not affect correctness.
The only type of flow that is currently prevented is one that would prevent
DHCP replies from being seen by the local port. For example, a rule that
forwarded all DHCP traffic to the controller would not be allowed, but one that
forwarded to all ports (including the local port) would.

As mentioned earlier, packets that miss in the datapath are sent to the
userspace for processing. The userspace has its own flow table, the
"classifier", so in-band checks whether any special processing is needed before
the classifier is consulted. If a packet is a DHCP response to a request from
the local port, the packet is forwarded to the local port, regardless of the
flow table. Note that this requires L7 processing of DHCP replies to determine
whether the 'chaddr' field matches the MAC address of the local port.
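
The L7 check on DHCP replies might look roughly like the following. This is a
sketch rather than the Open vSwitch code: the function name ``is_reply_for`` is
hypothetical, and the offsets come from the standard BOOTP/DHCP message layout,
where ``op`` is the first byte, ``hlen`` the third, and ``chaddr`` a 16-byte
field starting at offset 28.

```python
BOOTREPLY = 2          # 'op' value for server-to-client messages
CHADDR_OFFSET = 28     # after op, htype, hlen, hops, xid, secs, flags,
                       # ciaddr, yiaddr, siaddr, and giaddr
CHADDR_FIELD_LEN = 16  # chaddr is padded to 16 bytes


def is_reply_for(dhcp_payload, local_mac):
    """Return True if 'dhcp_payload' (the UDP payload, as bytes) is a reply
    whose chaddr field holds 'local_mac' (a 6-byte bytes object)."""
    if len(dhcp_payload) < CHADDR_OFFSET + CHADDR_FIELD_LEN:
        return False  # Too short to contain the fixed header fields.
    op, hlen = dhcp_payload[0], dhcp_payload[2]
    if op != BOOTREPLY or hlen != len(local_mac):
        return False
    chaddr = dhcp_payload[CHADDR_OFFSET:CHADDR_OFFSET + hlen]
    return chaddr == local_mac
```

Only when this check succeeds would the packet be delivered to the local port
regardless of the flow table.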

It is interesting to note that for an L3-based in-band control mechanism, the
majority of rules are devoted to ARP traffic. At first glance, some of these
rules appear redundant. However, each serves an important role. First, in
order to determine the MAC address of the remote side (controller or gateway)
for other ARP rules, we must allow ARP traffic for our local port with rules
(b) and (c). If we are between a switch and its connection to the remote, we
have to allow the other switch's ARP traffic through. This is done with
rules (d) and (e), since we do not know the addresses of the other switches a
priori, but do know the remote's or gateway's. Finally, if the remote is
running in a local guest VM that is not reached through the local port, the
switch that is connected to the VM must allow ARP traffic based on the remote's
IP address, since it will not know the MAC address of the local port that is
sending the traffic or the MAC address of the remote in the guest VM.

With a few notable exceptions below, in-band should work in most network
setups. The following are considered "supported" in the current
implementation:

- Locally Connected. The switch and remote are on the same subnet. This uses
  rules (a), (b), (c), (h), and (i).

- Reached through Gateway. The switch and remote are on different subnets and
  must go through a gateway. This uses rules (a), (b), (c), (h), and (i).

- Between Switch and Remote. This switch is between another switch and the
  remote, and we want to allow the other switch's traffic through. This uses
  rules (d), (e), (h), and (i). It uses (b) and (c) indirectly in order to
  know the MAC address for rules (d) and (e). Note that DHCP for the other
  switch will not work unless an OpenFlow controller explicitly lets this
  switch pass the traffic.

- Between Switch and Gateway. This switch is between another switch and the
  gateway, and we want to allow the other switch's traffic through. This uses
  the same rules and logic as the "Between Switch and Remote" configuration
  described earlier.

- Remote on Local VM. The remote is a guest VM on the system running in-band
  control. This uses rules (a), (b), (c), (h), and (i).

- Remote on Local VM with Different Networks. The remote is a guest VM on the
  system running in-band control, but the local port is not used to connect to
  the remote. For example, an IP address is configured on eth0 of the switch.
  The remote's VM is connected through eth1 of the switch, but an IP address
  has not been configured for that port on the switch. As such, the switch
  will use eth0 to connect to the remote, and eth1's rules about the local port
  will not work. In the example, the switch attached to eth0 would use rules
  (a), (b), (c), (h), and (i) on eth0. The switch attached to eth1 would use
  rules (f), (g), (h), and (i).

The following are explicitly *not* supported by in-band control:

- Specify Remote by Name. Currently, the remote must be identified by IP
  address. A naive approach would be to permit all DNS traffic.
  Unfortunately, this would prevent the controller from defining any policy
  over DNS. Since switches that are located behind us need to connect to the
  remote, in-band cannot simply add a rule that allows DNS traffic from the
  local port. The "correct" way to support this is to parse DNS requests to
  allow all traffic related to a request for the remote's name through. Due to
  the potential security problems and amount of processing, we decided to hold
  off for the time being.

- Differing Remotes for Switches. All switches must know the L3 addresses for
  all the remotes that other switches may use, since rules need to be set up to
  allow traffic related to those remotes through. See rules (f), (g), (h), and
  (i).

- Differing Routes for Switches. In order for the switch to allow other
  switches to connect to a remote through a gateway, it allows the gateway's
  traffic through with rules (d) and (e). If the routes to the remote differ
  for the two switches, we will not know the MAC address of the alternate
  gateway.

Action Reproduction
-------------------

It seems likely that many controllers, at least at startup, use the OpenFlow
"flow statistics" request to obtain existing flows, then compare the flows'
actions against the actions that they expect to find. Before version 1.8.0,
Open vSwitch always returned exact, byte-for-byte copies of the actions that
had been added to the flow table. The current version of Open vSwitch does not
do this in some exceptional cases. This section lists the exceptions that
controller authors must keep in mind if they compare actual actions against
desired actions in a bytewise fashion:

- Open vSwitch zeros padding bytes in action structures, regardless of their
  values when the flows were added.

- Open vSwitch "normalizes" the instructions in OpenFlow 1.1 (and later) in the
  following way:

  * OVS sorts the instructions into the following order: Apply-Actions,
    Clear-Actions, Write-Actions, Write-Metadata, Goto-Table.

  * OVS drops Apply-Actions instructions that have empty action lists.

  * OVS drops Write-Actions instructions that have empty action sets.
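
A controller that wants to compare its desired instructions against what the
switch reports could apply the same normalization to its own copy first. The
sketch below shows one way to do that under simplifying assumptions: the
function name ``normalize`` is hypothetical, instructions are modeled as
``(name, payload)`` pairs rather than wire-format structures, and the zeroing
of padding bytes is not modeled.

```python
# The order into which instructions are sorted, as listed above.
ORDER = ["Apply-Actions", "Clear-Actions", "Write-Actions",
         "Write-Metadata", "Goto-Table"]


def normalize(instructions):
    """Normalize a list of (name, payload) instruction pairs.

    For Apply-Actions and Write-Actions the payload is a list of actions;
    for the other instructions it is whatever argument they carry.
    """
    kept = [
        (name, payload)
        for name, payload in instructions
        # Drop Apply-Actions and Write-Actions that carry no actions.
        if not (name in ("Apply-Actions", "Write-Actions") and not payload)
    ]
    return sorted(kept, key=lambda ins: ORDER.index(ins[0]))
```

Comparing ``normalize(desired)`` with the reported instructions avoids false
mismatches caused only by ordering or by empty action lists.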

Please report other discrepancies, if you notice any, so that we can fix or
document them.

Suggestions
-----------

Suggestions to improve Open vSwitch are welcome at discuss@openvswitch.org.