1 <?xml version="1.0" encoding="utf-8"?>
2 <manpage program="ovn-architecture" section="7" title="OVN Architecture">
3 <h1>Name</h1>
4 <p>ovn-architecture -- Open Virtual Network architecture</p>
5
6 <h1>Description</h1>
7
8 <p>
9 OVN, the Open Virtual Network, is a system to support virtual network
10 abstraction. OVN complements the existing capabilities of OVS to add
11 native support for virtual network abstractions, such as virtual L2 and L3
12 overlays and security groups. Services such as DHCP are also desirable
13 features. Just like OVS, OVN's design goal is to have a production-quality
14 implementation that can operate at significant scale.
15 </p>
16
17 <p>
18 An OVN deployment consists of several components:
19 </p>
20
21 <ul>
22 <li>
23 <p>
24 A <dfn>Cloud Management System</dfn> (<dfn>CMS</dfn>), which is
25 OVN's ultimate client (via its users and administrators). OVN
26 integration requires installing a CMS-specific plugin and
27 related software (see below). OVN initially targets OpenStack
28 as its CMS.
29 </p>
30
31 <p>
32 We generally speak of ``the'' CMS, but one can imagine scenarios in
33 which multiple CMSes manage different parts of an OVN deployment.
34 </p>
35 </li>
36
37 <li>
38 An OVN Database physical or virtual node (or, eventually, cluster)
39 installed in a central location.
40 </li>
41
42 <li>
43 One or more (usually many) <dfn>hypervisors</dfn>. Hypervisors must run
44 Open vSwitch and implement the interface described in
45 <code>IntegrationGuide.md</code> in the OVS source tree. Any hypervisor
46 platform supported by Open vSwitch is acceptable.
47 </li>
48
49 <li>
50 <p>
51 Zero or more <dfn>gateways</dfn>. A gateway extends a tunnel-based
52 logical network into a physical network by bidirectionally forwarding
53 packets between tunnels and a physical Ethernet port. This allows
54 non-virtualized machines to participate in logical networks. A gateway
55 may be a physical host, a virtual machine, or an ASIC-based hardware
56 switch that supports the <code>vtep</code>(5) schema. (Support for the
57 latter will come later in OVN implementation.)
58 </p>
59
60 <p>
61 Hypervisors and gateways are together called <dfn>transport nodes</dfn>
62 or <dfn>chassis</dfn>.
63 </p>
64 </li>
65 </ul>
66
67 <p>
68 The diagram below shows how the major components of OVN and related
69 software interact. Starting at the top of the diagram, we have:
70 </p>
71
72 <ul>
73 <li>
74 The Cloud Management System, as defined above.
75 </li>
76
77 <li>
78 <p>
79 The <dfn>OVN/CMS Plugin</dfn> is the component of the CMS that
80 interfaces to OVN. In OpenStack, this is a Neutron plugin.
81 The plugin's main purpose is to translate the CMS's notion of logical
82 network configuration, stored in the CMS's configuration database in a
83 CMS-specific format, into an intermediate representation understood by
84 OVN.
85 </p>
86
87 <p>
88 This component is necessarily CMS-specific, so a new plugin needs to be
89 developed for each CMS that is integrated with OVN. All of the
90 components below this one in the diagram are CMS-independent.
91 </p>
92 </li>
93
94 <li>
95 <p>
96 The <dfn>OVN Northbound Database</dfn> receives the intermediate
97 representation of logical network configuration passed down by the
98 OVN/CMS Plugin. The database schema is meant to be ``impedance
99 matched'' with the concepts used in a CMS, so that it directly supports
100 notions of logical switches, routers, ACLs, and so on. See
101 <code>ovn-nb</code>(5) for details.
102 </p>
103
104 <p>
105 The OVN Northbound Database has only two clients: the OVN/CMS Plugin
106 above it and <code>ovn-northd</code> below it.
107 </p>
108 </li>
109
110 <li>
111 <code>ovn-northd</code>(8) connects to the OVN Northbound Database
112 above it and the OVN Southbound Database below it. It translates the
113 logical network configuration in terms of conventional network
114 concepts, taken from the OVN Northbound Database, into logical
115 datapath flows in the OVN Southbound Database below it.
116 </li>
117
118 <li>
119 <p>
120 The <dfn>OVN Southbound Database</dfn> is the center of the system.
121 Its clients are <code>ovn-northd</code>(8) above it and
122 <code>ovn-controller</code>(8) on every transport node below it.
123 </p>
124
125 <p>
126 The OVN Southbound Database contains three kinds of data: <dfn>Physical
127 Network</dfn> (PN) tables that specify how to reach hypervisor and
128 other nodes, <dfn>Logical Network</dfn> (LN) tables that describe the
129 logical network in terms of ``logical datapath flows,'' and
130 <dfn>Binding</dfn> tables that link logical network components'
131 locations to the physical network. The hypervisors populate the PN and
132 Port_Binding tables, whereas <code>ovn-northd</code>(8) populates the
133 LN tables.
134 </p>
135
136 <p>
137 OVN Southbound Database performance must scale with the number of
138 transport nodes. This will likely require some work on
139 <code>ovsdb-server</code>(1) as we encounter bottlenecks.
140 Clustering for availability may be needed.
141 </p>
142 </li>
143 </ul>
144
145 <p>
146 The remaining components are replicated onto each hypervisor:
147 </p>
148
149 <ul>
150 <li>
151 <code>ovn-controller</code>(8) is OVN's agent on each hypervisor and
152 software gateway. Northbound, it connects to the OVN Southbound
153 Database to learn about OVN configuration and status and to
154 populate the PN table and the <code>chassis</code> column in the
155 <code>Binding</code> table with the hypervisor's status.
156 Southbound, it connects to <code>ovs-vswitchd</code>(8) as an
157 OpenFlow controller, for control over network traffic, and to the
158 local <code>ovsdb-server</code>(1) to allow it to monitor and
159 control Open vSwitch configuration.
160 </li>
161
162 <li>
163 <code>ovs-vswitchd</code>(8) and <code>ovsdb-server</code>(1) are
164 conventional components of Open vSwitch.
165 </li>
166 </ul>
167
168 <pre fixed="yes">
169 CMS
170 |
171 |
172 +-----------|-----------+
173 | | |
174 | OVN/CMS Plugin |
175 | | |
176 | | |
177 | OVN Northbound DB |
178 | | |
179 | | |
180 | ovn-northd |
181 | | |
182 +-----------|-----------+
183 |
184 |
185 +-------------------+
186 | OVN Southbound DB |
187 +-------------------+
188 |
189 |
190 +------------------+------------------+
191 | | |
192 HV 1 | | HV n |
193 +---------------|---------------+ . +---------------|---------------+
194 | | | . | | |
195 | ovn-controller | . | ovn-controller |
196 | | | | . | | | |
197 | | | | | | | |
198 | ovs-vswitchd ovsdb-server | | ovs-vswitchd ovsdb-server |
199 | | | |
200 +-------------------------------+ +-------------------------------+
201 </pre>
202
203 <h2>Chassis Setup</h2>
204
205 <p>
206 Each chassis in an OVN deployment must be configured with an Open vSwitch
207 bridge dedicated for OVN's use, called the <dfn>integration bridge</dfn>.
208 System startup scripts may create this bridge prior to starting
209 <code>ovn-controller</code> if desired. If this bridge does not exist when
210 <code>ovn-controller</code> starts, it will be created automatically with the default
211 configuration suggested below. The ports on the integration bridge include:
212 </p>
213
214 <ul>
215 <li>
216 On any chassis, tunnel ports that OVN uses to maintain logical network
217 connectivity. <code>ovn-controller</code> adds, updates, and removes
218 these tunnel ports.
219 </li>
220
221 <li>
222 On a hypervisor, any VIFs that are to be attached to logical networks.
223 The hypervisor itself, or the integration between Open vSwitch and the
224 hypervisor (described in <code>IntegrationGuide.md</code>), takes care of
225 this. (This is not part of OVN or new to OVN; this is pre-existing
226 integration work that has already been done on hypervisors that support
227 OVS.)
228 </li>
229
230 <li>
231 On a gateway, the physical port used for logical network connectivity.
232 System startup scripts add this port to the bridge prior to starting
233 <code>ovn-controller</code>. This can be a patch port to another bridge,
234 instead of a physical port, in more sophisticated setups.
235 </li>
236 </ul>
237
238 <p>
239 Other ports should not be attached to the integration bridge. In
240 particular, physical ports attached to the underlay network (as opposed to
241 gateway ports, which are physical ports attached to logical networks) must
242 not be attached to the integration bridge. Underlay physical ports should
243 instead be attached to a separate Open vSwitch bridge (they need not be
244 attached to any bridge at all, in fact).
245 </p>
246
247 <p>
248 The integration bridge should be configured as described below.
249 The effect of each of these settings is documented in
250 <code>ovs-vswitchd.conf.db</code>(5):
251 </p>
252
253 <!-- Keep the following in sync with create_br_int() in
254 ovn/controller/ovn-controller.c. -->
255 <dl>
256 <dt><code>fail-mode=secure</code></dt>
257 <dd>
258 Avoids switching packets between isolated logical networks before
259 <code>ovn-controller</code> starts up. See <code>Controller Failure
260 Settings</code> in <code>ovs-vsctl</code>(8) for more information.
261 </dd>
262
263 <dt><code>other-config:disable-in-band=true</code></dt>
264 <dd>
265 Suppresses in-band control flows for the integration bridge. It would be
266 unusual for such flows to show up anyway, because OVN uses a local
267 controller (over a Unix domain socket) instead of a remote controller.
268 It's possible, however, for some other bridge in the same system to have
269 an in-band remote controller, and in that case this suppresses the flows
270 that in-band control would ordinarily set up. See <code>In-Band
271 Control</code> in <code>DESIGN.md</code> for more information.
272 </dd>
273 </dl>
274
275 <p>
276 The customary name for the integration bridge is <code>br-int</code>, but
277 another name may be used.
278 </p>
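  <p>
    As an illustration only (not a substitute for packaged startup scripts),
    the following Python sketch creates an integration bridge with the
    settings recommended above by invoking <code>ovs-vsctl</code>.  The
    bridge name <code>br-int</code> is the customary default; everything
    else about the snippet is a hypothetical example, and
    <code>ovn-controller</code> applies an equivalent default configuration
    itself when the bridge is missing at startup.
  </p>

  <pre fixed="yes">
import subprocess

# Create the integration bridge (if it does not already exist) with the
# recommended settings documented above.
subprocess.check_call([
    "ovs-vsctl", "--may-exist", "add-br", "br-int",
    "--", "set-fail-mode", "br-int", "secure",
    "--", "set", "Bridge", "br-int", "other-config:disable-in-band=true",
])
  </pre>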
279
280 <h2>Logical Networks</h2>
281
282 <p>
283 A <dfn>logical network</dfn> implements the same concepts as a physical
284 network, but it is insulated from the physical network by tunnels or
285 other encapsulations. This allows logical networks to have separate IP and
286 other address spaces that overlap, without conflicting, with those used for
287 physical networks. Logical network topologies can be arranged without
288 regard for the topologies of the physical networks on which they run.
289 </p>
290
291 <p>
292 Logical network concepts in OVN include:
293 </p>
294
295 <ul>
296 <li>
297 <dfn>Logical switches</dfn>, the logical version of Ethernet switches.
298 </li>
299
300 <li>
301 <dfn>Logical routers</dfn>, the logical version of IP routers. Logical
302 switches and routers can be connected into sophisticated topologies.
303 </li>
304
305 <li>
306 <dfn>Logical datapaths</dfn> are the logical version of an OpenFlow
307 switch. Logical switches and routers are both implemented as logical
308 datapaths.
309 </li>
310 </ul>
311
312 <h2>Life Cycle of a VIF</h2>
313
314 <p>
315 Tables and their schemas presented in isolation are difficult to
316 understand. Here's an example.
317 </p>
318
319 <p>
320 A VIF on a hypervisor is a virtual network interface attached either
321 to a VM or to a container running directly on that hypervisor (this is
322 different from the interface of a container running inside a VM).
323 </p>
324
325 <p>
326 The steps in this example often refer to details of the OVN Southbound and
327 OVN Northbound database schemas. Please see <code>ovn-sb</code>(5) and
328 <code>ovn-nb</code>(5), respectively, for the full story on these
329 databases.
330 </p>
331
332 <ol>
333 <li>
334 A VIF's life cycle begins when a CMS administrator creates a new VIF
335 using the CMS user interface or API and adds it to a switch (one
336 implemented by OVN as a logical switch). The CMS updates its own
337 configuration. This includes associating a unique, persistent identifier
338 <var>vif-id</var> and an Ethernet address <var>mac</var> with the VIF.
339 </li>
340
341 <li>
342 The CMS plugin updates the OVN Northbound database to include the new
343 VIF, by adding a row to the <code>Logical_Port</code> table. In the new
344 row, <code>name</code> is <var>vif-id</var>, <code>mac</code> is
345 <var>mac</var>, <code>switch</code> points to the OVN logical switch's
346 Logical_Switch record, and other columns are initialized appropriately.
347 </li>
348
349 <li>
350 <code>ovn-northd</code> receives the OVN Northbound database update. In
351 turn, it makes the corresponding updates to the OVN Southbound database,
352 by adding rows to the OVN Southbound database <code>Logical_Flow</code>
353 table to reflect the new port, e.g., adding a flow to recognize that packets
354 destined to the new port's MAC address should be delivered to it, and
355 updating the flow that delivers broadcast and multicast packets to include
356 the new port. It also creates a record in the <code>Binding</code> table
357 and populates all its columns except the column that identifies the
358 <code>chassis</code>.
359 </li>
360
361 <li>
362 On every hypervisor, <code>ovn-controller</code> receives the
363 <code>Logical_Flow</code> table updates that <code>ovn-northd</code> made
364 in the previous step. As long as the VM that owns the VIF is powered
365 off, <code>ovn-controller</code> cannot do much; it cannot, for example,
366 arrange to send packets to or receive packets from the VIF, because the
367 VIF does not actually exist anywhere.
368 </li>
369
370 <li>
371 Eventually, a user powers on the VM that owns the VIF. On the hypervisor
372 where the VM is powered on, the integration between the hypervisor and
373 Open vSwitch (described in <code>IntegrationGuide.md</code>) adds the VIF
374 to the OVN integration bridge and stores <var>vif-id</var> in
375 <code>external-ids</code>:<code>iface-id</code> to indicate that the
376 interface is an instantiation of the new VIF. (None of this code is new
377 in OVN; this is pre-existing integration work that has already been done
378 on hypervisors that support OVS.) A sketch of this step appears after this list.
379 </li>
380
381 <li>
382 On the hypervisor where the VM is powered on, <code>ovn-controller</code>
383 notices <code>external-ids</code>:<code>iface-id</code> in the new
384 Interface. In response, it updates the local hypervisor's OpenFlow
385 tables so that packets to and from the VIF are properly handled.
386 Afterward, in the OVN Southbound DB, it updates the
387 <code>Binding</code> table's <code>chassis</code> column for the
388 row that links the logical port from
389 <code>external-ids</code>:<code>iface-id</code> to the hypervisor.
390 </li>
391
392 <li>
393 Some CMS systems, including OpenStack, fully start a VM only when its
394 networking is ready. To support this, <code>ovn-northd</code> notices
395 the <code>chassis</code> column updated for the row in the
396 <code>Binding</code> table and pushes this upward by updating the
397 <ref column="up" table="Logical_Port" db="OVN_NB"/> column in the OVN
398 Northbound database's <ref table="Logical_Port" db="OVN_NB"/> table to
399 indicate that the VIF is now up. The CMS, if it uses this feature, can
400 then react by allowing the VM's execution to proceed.
402 </li>
403
404 <li>
405 On every hypervisor but the one where the VIF resides,
406 <code>ovn-controller</code> notices the completely populated row in the
407 <code>Binding</code> table. This provides <code>ovn-controller</code> with
408 the physical location of the logical port, so each instance updates the
409 OpenFlow tables of its switch (based on logical datapath flows in the OVN
410 DB <code>Logical_Flow</code> table) so that packets to and from the VIF
411 can be properly handled via tunnels.
412 </li>
413
414 <li>
415 Eventually, a user powers off the VM that owns the VIF. On the
416 hypervisor where the VM was powered off, the VIF is deleted from the OVN
417 integration bridge.
418 </li>
419
420 <li>
421 On the hypervisor where the VM was powered off,
422 <code>ovn-controller</code> notices that the VIF was deleted. In
423 response, it clears the <code>chassis</code> column in the
424 <code>Binding</code> table for the logical port.
425 </li>
426
427 <li>
428 On every hypervisor, <code>ovn-controller</code> notices the empty
429 <code>chassis</code> column in the <code>Binding</code> table's row
430 for the logical port. This means that <code>ovn-controller</code> no
431 longer knows the physical location of the logical port, so each instance
432 updates its OpenFlow table to reflect that.
433 </li>
434
435 <li>
436 Eventually, when the VIF (or its entire VM) is no longer needed by
437 anyone, an administrator deletes the VIF using the CMS user interface or
438 API. The CMS updates its own configuration.
439 </li>
440
441 <li>
442 The CMS plugin removes the VIF from the OVN Northbound database,
443 by deleting its row in the <code>Logical_Port</code> table.
444 </li>
445
446 <li>
447 <code>ovn-northd</code> receives the OVN Northbound update and in turn
448 updates the OVN Southbound database accordingly, by removing or updating
449 the rows from the OVN Southbound database <code>Logical_Flow</code> table
450 and <code>Binding</code> table that were related to the now-destroyed
451 VIF.
452 </li>
453
454 <li>
455 On every hypervisor, <code>ovn-controller</code> receives the
456 <code>Logical_Flow</code> table updates that <code>ovn-northd</code> made
457 in the previous step. <code>ovn-controller</code> updates OpenFlow
458 tables to reflect the update, although there may not be much to do, since
459 the VIF had already become unreachable when it was removed from the
460 <code>Binding</code> table in a previous step.
461 </li>
462 </ol>
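  <p>
    The sketch below illustrates step 5 above: the hypervisor integration
    (not OVN itself) adds the new VIF to the integration bridge and records
    the <var>vif-id</var> in <code>external-ids</code>:<code>iface-id</code>.
    The interface name and <var>vif-id</var> here are hypothetical examples,
    and a real hypervisor integration would do this through its own
    machinery rather than a script like this one.
  </p>

  <pre fixed="yes">
import subprocess

VIF_ID = "0d53b2f0-example-vif-id"   # hypothetical vif-id chosen by the CMS
IFACE = "tap0"                       # hypothetical local interface name

# Attach the VIF to the integration bridge and label it with the vif-id,
# so that ovn-controller can recognize it as an instantiation of the
# logical port with the same name.
subprocess.check_call([
    "ovs-vsctl", "add-port", "br-int", IFACE, "--",
    "set", "Interface", IFACE, "external-ids:iface-id=" + VIF_ID,
])
  </pre>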
463
464 <h2>Life Cycle of a Container Interface Inside a VM</h2>
465
466 <p>
467 OVN provides virtual network abstractions by converting information
468 written in the OVN_NB database to OpenFlow flows on each hypervisor. Secure
469 virtual networking for multiple tenants can only be provided if the OVN
470 controller is the only entity that can modify flows in Open vSwitch. When
471 the Open vSwitch integration bridge resides in the hypervisor, it is a
472 fair assumption that tenant workloads running inside VMs cannot
473 make any changes to Open vSwitch flows.
474 </p>
475
476 <p>
477 If the infrastructure provider trusts the applications inside the
478 containers not to break out and modify the Open vSwitch flows, then
479 containers can be run directly on hypervisors. This is also the case when
480 containers are run inside the VMs and the Open vSwitch integration bridge
481 with flows added by the OVN controller resides in the same VM. In both
482 of these cases, the workflow is the same as the example explained
483 in the previous section ("Life Cycle of a VIF").
484 </p>
485
486 <p>
487 This section talks about the life cycle of a container interface (CIF)
488 when containers are created in the VMs and the Open vSwitch integration
489 bridge resides inside the hypervisor. In this case, even if a container
490 application breaks out, other tenants are not affected because the
491 containers running inside the VMs cannot modify the flows in the
492 Open vSwitch integration bridge.
493 </p>
494
495 <p>
496 When multiple containers are created inside a VM, there are multiple
497 CIFs associated with them. The network traffic associated with these
498 CIFs needs to reach the Open vSwitch integration bridge running in the
499 hypervisor for OVN to support virtual network abstractions. OVN should
500 also be able to distinguish network traffic coming from different CIFs.
501 There are two ways to distinguish the network traffic of CIFs.
502 </p>
503
504 <p>
505 One way is to provide one VIF for every CIF (1:1 model). This means that
506 there could be a lot of network devices in the hypervisor. This would slow
507 down OVS because of all the additional CPU cycles needed for the management
508 of all the VIFs. It would also mean that the entity creating the
509 containers in a VM should also be able to create the corresponding VIFs in
510 the hypervisor.
511 </p>
512
513 <p>
514 The second way is to provide a single VIF for all the CIFs (1:many model).
515 OVN could then distinguish network traffic coming from different CIFs via
516 a tag written in every packet. OVN uses this model, with VLAN as
517 the tagging mechanism.
518 </p>
519
520 <ol>
521 <li>
522 A CIF's life cycle begins when a container is spawned inside a VM by
523 either the same CMS that created the VM, a tenant that owns that VM,
524 or even a container orchestration system different from the CMS
525 that initially created the VM. Whoever the entity is, it will need to
526 know the <var>vif-id</var> that is associated with the network interface
527 of the VM through which the container interface's network traffic is
528 expected to go. The entity that creates the container interface
529 will also need to choose an unused VLAN inside that VM.
530 </li>
531
532 <li>
533 The container spawning entity (either directly or through the CMS that
534 manages the underlying infrastructure) updates the OVN Northbound
535 database to include the new CIF, by adding a row to the
536 <code>Logical_Port</code> table. In the new row, <code>name</code> is
537 any unique identifier, <code>parent_name</code> is the <var>vif-id</var>
538 of the VM through which the CIF's network traffic is expected to go,
539 and the <code>tag</code> is the VLAN tag that identifies the
540 network traffic of that CIF. (A sketch of such a row appears after this list.)
541 </li>
542
543 <li>
544 <code>ovn-northd</code> receives the OVN Northbound database update. In
545 turn, it makes the corresponding updates to the OVN Southbound database,
546 by adding rows to the OVN Southbound database's <code>Logical_Flow</code>
547 table to reflect the new port and also by creating a new row in the
548 <code>Binding</code> table and populating all its columns except the
549 column that identifies the <code>chassis</code>.
550 </li>
551
552 <li>
553 On every hypervisor, <code>ovn-controller</code> subscribes to the
554 changes in the <code>Binding</code> table. When a new row is created
555 by <code>ovn-northd</code> that includes a value in
556 the <code>parent_port</code> column of the <code>Binding</code> table, the
557 <code>ovn-controller</code> on the hypervisor whose OVN integration bridge
558 has an interface with that same value in
559 <code>external-ids</code>:<code>iface-id</code>
560 updates the local hypervisor's OpenFlow tables so that packets to and
561 from the VIF with the particular VLAN <code>tag</code> are properly
562 handled. Afterward, it updates the <code>chassis</code> column of
563 the <code>Binding</code> row to reflect the physical location.
564 </li>
565
566 <li>
567 One can only start the application inside the container after the
568 underlying network is ready. To support this, <code>ovn-northd</code>
569 notices the updated <code>chassis</code> column in <code>Binding</code>
570 table and updates the <ref column="up" table="Logical_Port"
571 db="OVN_NB"/> column in the OVN Northbound database's
572 <ref table="Logical_Port" db="OVN_NB"/> table to indicate that the
573 CIF is now up. The entity responsible for starting the container application
574 queries this value and starts the application.
575 </li>
576
577 <li>
578 Eventually, the entity that created and started the container stops it.
579 The entity, through the CMS or directly, deletes its row in the
580 <code>Logical_Port</code> table.
581 </li>
582
583 <li>
584 <code>ovn-northd</code> receives the OVN Northbound update and in turn
585 updates the OVN Southbound database accordingly, by removing or updating
586 the rows from the OVN Southbound database <code>Logical_Flow</code> table
587 that were related to the now-destroyed CIF. It also deletes the row in
588 the <code>Binding</code> table for that CIF.
589 </li>
590
591 <li>
592 On every hypervisor, <code>ovn-controller</code> receives the
593 <code>Logical_Flow</code> table updates that <code>ovn-northd</code> made
594 in the previous step. <code>ovn-controller</code> updates OpenFlow
595 tables to reflect the update.
596 </li>
597 </ol>
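  <p>
    To make step 2 concrete, the sketch below expresses the
    <code>Logical_Port</code> row described above as an OVSDB transaction
    submitted with <code>ovsdb-client</code>.  The database endpoint, the
    identifiers, and the VLAN tag are hypothetical, and a real CMS plugin
    would set additional columns; the point is only the relationship among
    <code>name</code>, <code>parent_name</code>, and <code>tag</code>.
  </p>

  <pre fixed="yes">
import json
import subprocess

NB_DB = "tcp:127.0.0.1:6641"   # hypothetical OVN Northbound ovsdb-server

# The CIF's Logical_Port row: "name" is any unique identifier,
# "parent_name" is the vif-id of the VM's own VIF, and "tag" is the VLAN
# tag chosen for this container's traffic inside the VM.
cif_row = {
    "name": "cif-1",
    "parent_name": "vif-id-of-the-vm",
    "tag": 42,
}

txn = ["OVN_Northbound",
       {"op": "insert", "table": "Logical_Port", "row": cif_row}]
subprocess.check_call(["ovsdb-client", "transact", NB_DB, json.dumps(txn)])
  </pre>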
598
599 <h2>Architectural Physical Life Cycle of a Packet</h2>
600
601 <p>
602 This section describes how a packet travels from one virtual machine or
603 container to another through OVN. This description focuses on the physical
604 treatment of a packet; for a description of the logical life cycle of a
605 packet, please refer to the <code>Logical_Flow</code> table in
606 <code>ovn-sb</code>(5).
607 </p>
608
609 <p>
610 This section mentions several data and metadata fields, summarized here
611 for clarity:
612 </p>
613
614 <dl>
615 <dt>tunnel key</dt>
616 <dd>
617 When OVN encapsulates a packet in Geneve or another tunnel, it attaches
618 extra data to it to allow the receiving OVN instance to process it
619 correctly. This takes different forms depending on the particular
620 encapsulation, but in each case we refer to it here as the ``tunnel
621 key.'' See <code>Tunnel Encapsulations</code>, below, for details.
622 </dd>
623
624 <dt>logical datapath field</dt>
625 <dd>
626 A field that denotes the logical datapath through which a packet is being
627 processed.
628 <!-- Keep the following in sync with MFF_LOG_DATAPATH in
629 ovn/lib/logical-fields.h. -->
630 OVN uses the field that OpenFlow 1.1+ simply (and confusingly) calls
631 ``metadata'' to store the logical datapath. (This field is passed across
632 tunnels as part of the tunnel key.)
633 </dd>
634
635 <dt>logical input port field</dt>
636 <dd>
637 <p>
638 A field that denotes the logical port from which the packet
639 entered the logical datapath.
640 <!-- Keep the following in sync with MFF_LOG_INPORT in
641 ovn/lib/logical-fields.h. -->
642 OVN stores this in Nicira extension register number 6.
643 </p>
644
645 <p>
646 Geneve and STT tunnels pass this field as part of the tunnel key.
647 Although VXLAN tunnels do not explicitly carry a logical input port,
648 OVN only uses VXLAN to communicate with gateways that from OVN's
649 perspective consist of only a single logical port, so that OVN can set
650 the logical input port field to this one on ingress to the OVN logical
651 pipeline.
652 </p>
653 </dd>
654
655 <dt>logical output port field</dt>
656 <dd>
657 <p>
658 A field that denotes the logical port from which the packet will
659 leave the logical datapath. This is initialized to 0 at the
660 beginning of the logical ingress pipeline.
661 <!-- Keep the following in sync with MFF_LOG_OUTPORT in
662 ovn/lib/logical-fields.h. -->
663 OVN stores this in Nicira extension register number 7.
664 </p>
665
666 <p>
667 Geneve and STT tunnels pass this field as part of the tunnel key.
668 VXLAN tunnels do not transmit the logical output port field.
669 </p>
670 </dd>
671
672 <dt>conntrack zone field</dt>
673 <dd>
674 A field that denotes the connection tracking zone. The value only
675 has local significance and is not meaningful between chassis.
676 This is initialized to 0 at the beginning of the logical ingress
677 pipeline. OVN stores this in Nicira extension register number 5.
678 </dd>
679
680 <dt>VLAN ID</dt>
681 <dd>
682 The VLAN ID is used as an interface between OVN and containers nested
683 inside a VM (see <code>Life Cycle of a Container Interface Inside a
684 VM</code>, above, for more information).
685 </dd>
686 </dl>
687
688 <p>
689 Initially, a VM or container on the ingress hypervisor sends a packet on a
690 port attached to the OVN integration bridge. Then:
691 </p>
692
693 <ol>
694 <li>
695 <p>
696 OpenFlow table 0 performs physical-to-logical translation. It matches
697 the packet's ingress port. Its actions annotate the packet with
698 logical metadata, by setting the logical datapath field to identify the
699 logical datapath that the packet is traversing and the logical input
700 port field to identify the ingress port. Then it resubmits to table 16
701 to enter the logical ingress pipeline. (An illustrative sketch of such flows appears after this list.)
702 </p>
703
704 <p>
705 It's possible that a single ingress physical port maps to multiple
706 logical ports with a type of <code>localnet</code>. The logical datapath
707 and logical input port fields will be reset and the packet will be
708 resubmitted to table 16 multiple times.
709 </p>
710
711 <p>
712 Packets that originate from a container nested within a VM are treated
713 in a slightly different way. The originating container can be
714 distinguished based on the VIF-specific VLAN ID, so the
715 physical-to-logical translation flows additionally match on VLAN ID and
716 the actions strip the VLAN header. Following this step, OVN treats
717 packets from containers just like any other packets.
718 </p>
719
720 <p>
721 Table 0 also processes packets that arrive from other chassis. It
722 distinguishes them from other packets by ingress port, which is a
723 tunnel. As with packets just entering the OVN pipeline, the actions
724 annotate these packets with logical datapath and logical ingress port
725 metadata. In addition, the actions set the logical output port field,
726 which is available because in OVN tunneling occurs after the logical
727 output port is known. These three pieces of information are obtained
728 from the tunnel encapsulation metadata (see <code>Tunnel
729 Encapsulations</code> for encoding details). Then the actions resubmit
730 to table 33 to enter the logical egress pipeline.
731 </p>
732 </li>
733
734 <li>
735 <p>
736 OpenFlow tables 16 through 31 execute the logical ingress pipeline from
737 the <code>Logical_Flow</code> table in the OVN Southbound database.
738 These tables are expressed entirely in terms of logical concepts like
739 logical ports and logical datapaths. A big part of
740 <code>ovn-controller</code>'s job is to translate them into equivalent
741 OpenFlow (in particular it translates the table numbers:
742 <code>Logical_Flow</code> tables 0 through 15 become OpenFlow tables 16
743 through 31). For a given packet, the logical ingress pipeline
744 eventually executes zero or more <code>output</code> actions:
745 </p>
746
747 <ul>
748 <li>
749 If the pipeline executes no <code>output</code> actions at all, the
750 packet is effectively dropped.
751 </li>
752
753 <li>
754 Most commonly, the pipeline executes one <code>output</code> action,
755 which <code>ovn-controller</code> implements by resubmitting the
756 packet to table 32.
757 </li>
758
759 <li>
760 If the pipeline can execute more than one <code>output</code> action,
761 then each one is separately resubmitted to table 32. This can be
762 used to send multiple copies of the packet to multiple ports. (If
763 the packet was not modified between the <code>output</code> actions,
764 and some of the copies are destined to the same hypervisor, then
765 using a logical multicast output port would save bandwidth between
766 hypervisors.)
767 </li>
768 </ul>
769 </li>
770
771 <li>
772 <p>
773 OpenFlow tables 32 through 47 implement the <code>output</code> action
774 in the logical ingress pipeline. Specifically, table 32 handles
775 packets to remote hypervisors, table 33 handles packets to the local
776 hypervisor, and table 34 discards packets whose logical ingress and
777 egress port are the same.
778 </p>
779
780 <p>
781 Logical patch ports are a special case. Logical patch ports do not
782 have a physical location and effectively reside on every hypervisor.
783 Thus, flow table 33, for output to ports on the local hypervisor,
784 naturally implements output to unicast logical patch ports too.
785 However, applying the same logic to a logical patch port that is part
786 of a logical multicast group yields packet duplication, because each
787 hypervisor that contains a logical port in the multicast group will
788 also output the packet to the logical patch port. Thus, multicast
789 groups implement output to logical patch ports in table 32.
790 </p>
791
792 <p>
793 Each flow in table 32 matches on a logical output port for unicast or
794 multicast logical ports that include a logical port on a remote
795 hypervisor. Each flow's actions implement sending a packet to the port
796 it matches. For unicast logical output ports on remote hypervisors,
797 the actions set the tunnel key to the correct value, then send the
798 packet on the tunnel port to the correct hypervisor. (When the remote
799 hypervisor receives the packet, table 0 there will recognize it as a
800 tunneled packet and pass it along to table 33.) For multicast logical
801 output ports, the actions send one copy of the packet to each remote
802 hypervisor, in the same way as for unicast destinations. If a
803 multicast group includes a logical port or ports on the local
804 hypervisor, then its actions also resubmit to table 33. Table 32 also
805 includes a fallback flow that resubmits to table 33 if there is no
806 other match.
807 </p>
808
809 <p>
810 Flows in table 33 resemble those in table 32 but for logical ports that
811 reside locally rather than remotely. For unicast logical output ports
812 on the local hypervisor, the actions just resubmit to table 34. For
813 multicast output ports that include one or more logical ports on the
814 local hypervisor, for each such logical port <var>P</var>, the actions
815 change the logical output port to <var>P</var>, then resubmit to table
816 34.
817 </p>
818
819 <p>
820 Table 34 matches and drops packets for which the logical input and
821 output ports are the same. It resubmits other packets to table 48.
822 </p>
823 </li>
824
825 <li>
826 <p>
827 OpenFlow tables 48 through 63 execute the logical egress pipeline from
828 the <code>Logical_Flow</code> table in the OVN Southbound database.
829 The egress pipeline can perform a final stage of validation before
830 packet delivery. Eventually, it may execute an <code>output</code>
831 action, which <code>ovn-controller</code> implements by resubmitting to
832 table 64. A packet for which the pipeline never executes
833 <code>output</code> is effectively dropped (although it may have been
834 transmitted through a tunnel across a physical network).
835 </p>
836
837 <p>
838 The egress pipeline cannot change the logical output port or cause
839 further tunneling.
840 </p>
841 </li>
842
843 <li>
844 <p>
845 OpenFlow table 64 performs logical-to-physical translation, the
846 opposite of table 0. It matches the packet's logical egress port. Its
847 actions output the packet to the port attached to the OVN integration
848 bridge that represents that logical port. If the logical egress port
849 is a container nested within a VM, then before sending the packet the
850 actions push on a VLAN header with an appropriate VLAN ID.
851 </p>
852
853 <p>
854 If the logical egress port is a logical patch port, then table 64
855 outputs to an OVS patch port that represents the logical patch port.
856 The packet re-enters the OpenFlow flow table from the OVS patch port's
857 peer in table 0, which identifies the logical datapath and logical
858 input port based on the OVS patch port's OpenFlow port number.
859 </p>
860 </li>
861 </ol>
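  <p>
    The sketch below ties the table numbers above to the metadata fields
    described earlier.  It prints a few illustrative flows in
    <code>ovs-ofctl</code>-style syntax for a hypothetical chassis: a VIF on
    OpenFlow port 5, a tunnel on port 9, logical datapath tunnel key 3, and
    logical port tunnel keys 6 (local) and 7 (remote).  These are
    simplified examples of the kinds of flows <code>ovn-controller</code>
    installs, not its actual output, and the elided actions are marked with
    "...".
  </p>

  <pre fixed="yes">
# Illustrative flows for a hypothetical chassis.  OVN stores the logical
# datapath in "metadata", the logical input port in reg6, and the logical
# output port in reg7.
FLOWS = [
    # Table 0: physical-to-logical translation for the local VIF, then
    # enter the logical ingress pipeline at table 16.
    "table=0,in_port=5,"
    "actions=set_field:0x3->metadata,set_field:0x6->reg6,resubmit(,16)",

    # Table 0: packets from a tunnel already carry datapath, input port,
    # and output port in the tunnel key, so they go to table 33.
    "table=0,in_port=9,actions=...,resubmit(,33)",

    # Table 32: output to the remote logical port goes out the tunnel.
    "table=32,metadata=0x3,reg7=0x7,actions=...,output:9",

    # Table 33: output to the local logical port stays on this chassis.
    "table=33,metadata=0x3,reg7=0x6,actions=resubmit(,34)",

    # Table 64: logical-to-physical translation back to OpenFlow port 5.
    "table=64,metadata=0x3,reg7=0x6,actions=output:5",
]

for flow in FLOWS:
    print(flow)
  </pre>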
862
863 <h2>Life Cycle of a VTEP Gateway</h2>
864
865 <p>
866 A gateway is a chassis that forwards traffic between the OVN-managed
867 part of a logical network and a physical VLAN, extending a
868 tunnel-based logical network into a physical network.
869 </p>
870
871 <p>
872 The steps below often refer to details of the OVN Southbound, OVN
873 Northbound, and VTEP database schemas. Please see <code>ovn-sb</code>(5),
874 <code>ovn-nb</code>(5), and <code>vtep</code>(5), respectively, for the full story on these
875 databases.
876 </p>
877
878 <ol>
879 <li>
880 A VTEP gateway's life cycle begins with the administrator registering
881 the VTEP gateway as a <code>Physical_Switch</code> table entry in the
882 <code>VTEP</code> database. The <code>ovn-controller-vtep</code>
883 connected to this VTEP database will recognize the new VTEP gateway
884 and create a new <code>Chassis</code> table entry for it in the
885 <code>OVN_Southbound</code> database.
886 </li>
887
888 <li>
889 The administrator can then create a new <code>Logical_Switch</code>
890 table entry, and bind a particular VLAN on a VTEP gateway's port to
891 any VTEP logical switch. Once a VTEP logical switch is bound to
892 a VTEP gateway, the <code>ovn-controller-vtep</code> will detect
893 it and add its name to the <var>vtep_logical_switches</var>
894 column of the <code>Chassis</code> table in the <code>
895 OVN_Southbound</code> database. Note that the <var>tunnel_key</var>
896 column of the VTEP logical switch is not filled at creation. The
897 <code>ovn-controller-vtep</code> will set the column when the
898 corresponding VTEP logical switch is bound to an OVN logical network.
899 </li>
900
901 <li>
902 Now, the administrator can use the CMS to add a VTEP logical switch
903 to the OVN logical network. To do that, the CMS must first create a
904 new <code>Logical_Port</code> table entry in the <code>
905 OVN_Northbound</code> database. Then, the <var>type</var> column
906 of this entry must be set to "vtep". Next, the <var>
907 vtep-logical-switch</var> and <var>vtep-physical-switch</var> keys
908 in the <var>options</var> column must also be specified, since
909 multiple VTEP gateways can attach to the same VTEP logical switch. (A sketch of such an update appears after this list.)
910 </li>
911
912 <li>
913 The newly created logical port in the <code>OVN_Northbound</code>
914 database and its configuration will be passed down to the <code>
915 OVN_Southbound</code> database as a new <code>Port_Binding</code>
916 table entry. The <code>ovn-controller-vtep</code> will recognize the
917 change and bind the logical port to the corresponding VTEP gateway
918 chassis. Binding the same VTEP logical switch to
919 different OVN logical networks is not allowed, and a warning will be
920 generated in the log.
921 </li>
922
923 <li>
924 Besides binding to the VTEP gateway chassis, the <code>
925 ovn-controller-vtep</code> will update the <var>tunnel_key</var>
926 column of the VTEP logical switch to the corresponding <code>
927 Datapath_Binding</code> table entry's <var>tunnel_key</var> for the
928 bound OVN logical network.
929 </li>
930
931 <li>
932 Next, the <code>ovn-controller-vtep</code> will keep reacting to the
933 configuration change in the <code>Port_Binding</code> table in the
934 <code>OVN_Southbound</code> database, and updating the
935 <code>Ucast_Macs_Remote</code> table in the <code>VTEP</code> database.
936 This allows the VTEP gateway to understand where to forward the unicast
937 traffic coming from the extended external network.
938 </li>
939
940 <li>
941 Eventually, the VTEP gateway's life cycle ends when the administrator
942 unregisters the VTEP gateway from the <code>VTEP</code> database.
943 The <code>ovn-controller-vtep</code> will recognize the event and
944 remove all related configurations (<code>Chassis</code> table entry
945 and port bindings) in the <code>OVN_Southbound</code> database.
946 </li>
947
948 <li>
949 When the <code>ovn-controller-vtep</code> is terminated, all related
950 configurations in the <code>OVN_Southbound</code> database and
951 the <code>VTEP</code> database will be cleaned up, including
952 <code>Chassis</code> table entries for all registered VTEP gateways
953 and their port bindings, and all <code>Ucast_Macs_Remote</code> table
954 entries and the <code>Logical_Switch</code> tunnel keys.
955 </li>
956 </ol>
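  <p>
    Continuing the OVSDB-transaction style used earlier, the sketch below
    illustrates step 3 above: marking an existing <code>Logical_Port</code>
    as a VTEP attachment point by setting its <code>type</code> and
    <code>options</code>.  The database endpoint, the port name, and the
    switch names are hypothetical; a CMS would normally do this through its
    OVN plugin rather than a script.
  </p>

  <pre fixed="yes">
import json
import subprocess

NB_DB = "tcp:127.0.0.1:6641"   # hypothetical OVN Northbound ovsdb-server

# Set type to "vtep" and name the VTEP physical and logical switches in
# the options column, as described in step 3 of the list above.
txn = ["OVN_Northbound",
       {"op": "update", "table": "Logical_Port",
        "where": [["name", "==", "lport-vtep-1"]],
        "row": {"type": "vtep",
                "options": ["map", [["vtep-physical-switch", "ps1"],
                                    ["vtep-logical-switch", "vtep-ls1"]]]}}]
subprocess.check_call(["ovsdb-client", "transact", NB_DB, json.dumps(txn)])
  </pre>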
957
958 <h1>Design Decisions</h1>
959
960 <h2>Tunnel Encapsulations</h2>
961
962 <p>
963 OVN annotates logical network packets that it sends from one hypervisor to
964 another with the following three pieces of metadata, which are encoded in
965 an encapsulation-specific fashion:
966 </p>
967
968 <ul>
969 <li>
970 24-bit logical datapath identifier, from the <code>tunnel_key</code>
971 column in the OVN Southbound <code>Datapath_Binding</code> table.
972 </li>
973
974 <li>
975 15-bit logical ingress port identifier. ID 0 is reserved for internal
976 use within OVN. IDs 1 through 32767, inclusive, may be assigned to
977 logical ports (see the <code>tunnel_key</code> column in the OVN
978 Southbound <code>Port_Binding</code> table).
979 </li>
980
981 <li>
982 16-bit logical egress port identifier. IDs 0 through 32767 have the same
983 meaning as for logical ingress ports. IDs 32768 through 65535,
984 inclusive, may be assigned to logical multicast groups (see the
985 <code>tunnel_key</code> column in the OVN Southbound
986 <code>Multicast_Group</code> table).
987 </li>
988 </ul>
989
990 <p>
991 For hypervisor-to-hypervisor traffic, OVN supports only Geneve and STT
992 encapsulations, for the following reasons:
993 </p>
994
995 <ul>
996 <li>
997 Only STT and Geneve support the large amounts of metadata (over 32 bits
998 per packet) that OVN uses (as described above).
999 </li>
1000
1001 <li>
1002 STT and Geneve use randomized UDP or TCP source ports, allowing
1003 efficient distribution among multiple paths in environments that use ECMP
1004 in their underlay.
1005 </li>
1006
1007 <li>
1008 NICs are available to offload STT and Geneve encapsulation and
1009 decapsulation.
1010 </li>
1011 </ul>
1012
1013 <p>
1014 Due to its flexibility, the preferred encapsulation between hypervisors is
1015 Geneve. For Geneve encapsulation, OVN transmits the logical datapath
1016 identifier in the Geneve VNI.
1017
1018 <!-- Keep the following in sync with ovn/controller/physical.h. -->
1019 OVN transmits the logical ingress and logical egress ports in a TLV with
1020 class 0xffff, type 0, and a 32-bit value encoded as follows, from MSB to
1021 LSB:
1022 </p>
1023
1024 <diagram>
1025 <header name="">
1026 <bits name="rsv" above="1" below="0" width=".25"/>
1027 <bits name="ingress port" above="15" width=".75"/>
1028 <bits name="egress port" above="16" width=".75"/>
1029 </header>
1030 </diagram>
1031
1032 <p>
1033 Environments whose NICs lack Geneve offload may prefer STT encapsulation
1034 for performance reasons. For STT encapsulation, OVN encodes all three
1035 pieces of logical metadata in the STT 64-bit tunnel ID as follows, from MSB
1036 to LSB:
1037 </p>
1038
1039 <diagram>
1040 <header name="">
1041 <bits name="reserved" above="9" below="0" width=".5"/>
1042 <bits name="ingress port" above="15" width=".75"/>
1043 <bits name="egress port" above="16" width=".75"/>
1044 <bits name="datapath" above="24" width="1.25"/>
1045 </header>
1046 </diagram>
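  <p>
    As a minimal sketch of the two layouts above, the following Python
    functions pack the 32-bit value carried in OVN's Geneve option and the
    64-bit STT tunnel ID from the three logical identifiers.  The example
    values are arbitrary.
  </p>

  <pre fixed="yes">
def encode_geneve_option(ingress_port, egress_port):
    """Pack the 32-bit value of the Geneve option (class 0xffff, type 0):
    1 reserved bit, 15-bit logical ingress port, 16-bit logical egress
    port, MSB to LSB.  The 24-bit logical datapath travels in the VNI."""
    if ingress_port not in range(2**15):
        raise ValueError("logical ingress port is a 15-bit field")
    if egress_port not in range(2**16):
        raise ValueError("logical egress port is a 16-bit field")
    return ingress_port * 2**16 + egress_port

def encode_stt_key(datapath, ingress_port, egress_port):
    """Pack the 64-bit STT tunnel ID: 9 reserved bits, 15-bit ingress
    port, 16-bit egress port, 24-bit logical datapath, MSB to LSB."""
    return ingress_port * 2**40 + egress_port * 2**24 + datapath

print(hex(encode_geneve_option(6, 7)))     # 0x60007
print(hex(encode_stt_key(3, 6, 7)))        # 0x60007000003
  </pre>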
1047
1048 <p>
1049 For connecting to gateways, in addition to Geneve and STT, OVN supports
1050 VXLAN, because only VXLAN support is common on top-of-rack (ToR) switches.
1051 Currently, gateways have a feature set that matches the capabilities as
1052 defined by the VTEP schema, so fewer bits of metadata are necessary. In
1053 the future, gateways that do not support encapsulations with large amounts
1054 of metadata may continue to have a reduced feature set.
1055 </p>
1056 </manpage>