1 <?xml version="1.0" encoding="utf-8"?>
2 <manpage program="ovn-architecture" section="7" title="OVN Architecture">
3 <h1>Name</h1>
4 <p>ovn-architecture -- Open Virtual Network architecture</p>
5
6 <h1>Description</h1>
7
8 <p>
9 OVN, the Open Virtual Network, is a system to support virtual network
10 abstraction. OVN complements the existing capabilities of OVS to add
11 native support for virtual network abstractions, such as virtual L2 and L3
12 overlays and security groups. Services such as DHCP are also desirable
13 features. Just like OVS, OVN's design goal is to have a production-quality
14 implementation that can operate at significant scale.
15 </p>
16
17 <p>
18 An OVN deployment consists of several components:
19 </p>
20
21 <ul>
22 <li>
23 <p>
24 A <dfn>Cloud Management System</dfn> (<dfn>CMS</dfn>), which is
25 OVN's ultimate client (via its users and administrators). OVN
26 integration requires installing a CMS-specific plugin and
27 related software (see below). OVN initially targets OpenStack
          as its CMS.
29 </p>
30
31 <p>
32 We generally speak of ``the'' CMS, but one can imagine scenarios in
33 which multiple CMSes manage different parts of an OVN deployment.
34 </p>
35 </li>
36
37 <li>
38 An OVN Database physical or virtual node (or, eventually, cluster)
39 installed in a central location.
40 </li>
41
42 <li>
43 One or more (usually many) <dfn>hypervisors</dfn>. Hypervisors must run
44 Open vSwitch and implement the interface described in
45 <code>IntegrationGuide.md</code> in the OVS source tree. Any hypervisor
46 platform supported by Open vSwitch is acceptable.
47 </li>
48
49 <li>
50 <p>
51 Zero or more <dfn>gateways</dfn>. A gateway extends a tunnel-based
52 logical network into a physical network by bidirectionally forwarding
53 packets between tunnels and a physical Ethernet port. This allows
54 non-virtualized machines to participate in logical networks. A gateway
55 may be a physical host, a virtual machine, or an ASIC-based hardware
56 switch that supports the <code>vtep</code>(5) schema. (Support for the
        latter will be added later in the OVN implementation.)
58 </p>
59
60 <p>
        Hypervisors and gateways are together called <dfn>transport nodes</dfn>
62 or <dfn>chassis</dfn>.
63 </p>
64 </li>
65 </ul>
66
67 <p>
68 The diagram below shows how the major components of OVN and related
69 software interact. Starting at the top of the diagram, we have:
70 </p>
71
72 <ul>
73 <li>
74 The Cloud Management System, as defined above.
75 </li>
76
77 <li>
78 <p>
79 The <dfn>OVN/CMS Plugin</dfn> is the component of the CMS that
80 interfaces to OVN. In OpenStack, this is a Neutron plugin.
81 The plugin's main purpose is to translate the CMS's notion of logical
82 network configuration, stored in the CMS's configuration database in a
83 CMS-specific format, into an intermediate representation understood by
84 OVN.
85 </p>
86
87 <p>
88 This component is necessarily CMS-specific, so a new plugin needs to be
89 developed for each CMS that is integrated with OVN. All of the
90 components below this one in the diagram are CMS-independent.
91 </p>
92 </li>
93
94 <li>
95 <p>
96 The <dfn>OVN Northbound Database</dfn> receives the intermediate
97 representation of logical network configuration passed down by the
98 OVN/CMS Plugin. The database schema is meant to be ``impedance
99 matched'' with the concepts used in a CMS, so that it directly supports
100 notions of logical switches, routers, ACLs, and so on. See
101 <code>ovn-nb</code>(5) for details.
102 </p>
103
104 <p>
105 The OVN Northbound Database has only two clients: the OVN/CMS Plugin
106 above it and <code>ovn-northd</code> below it.
107 </p>
108 </li>
109
110 <li>
111 <code>ovn-northd</code>(8) connects to the OVN Northbound Database
      above it and the OVN Southbound Database below it.  It translates the
      logical network configuration, expressed in terms of conventional
      network concepts and taken from the OVN Northbound Database, into
      logical datapath flows in the OVN Southbound Database.
116 </li>
117
118 <li>
119 <p>
120 The <dfn>OVN Southbound Database</dfn> is the center of the system.
121 Its clients are <code>ovn-northd</code>(8) above it and
122 <code>ovn-controller</code>(8) on every transport node below it.
123 </p>
124
125 <p>
126 The OVN Southbound Database contains three kinds of data: <dfn>Physical
127 Network</dfn> (PN) tables that specify how to reach hypervisor and
128 other nodes, <dfn>Logical Network</dfn> (LN) tables that describe the
129 logical network in terms of ``logical datapath flows,'' and
130 <dfn>Binding</dfn> tables that link logical network components'
131 locations to the physical network. The hypervisors populate the PN and
132 Port_Binding tables, whereas <code>ovn-northd</code>(8) populates the
133 LN tables.
134 </p>
135
136 <p>
137 OVN Southbound Database performance must scale with the number of
138 transport nodes. This will likely require some work on
139 <code>ovsdb-server</code>(1) as we encounter bottlenecks.
140 Clustering for availability may be needed.
141 </p>
142 </li>
143 </ul>
144
145 <p>
146 The remaining components are replicated onto each hypervisor:
147 </p>
148
149 <ul>
150 <li>
151 <code>ovn-controller</code>(8) is OVN's agent on each hypervisor and
152 software gateway. Northbound, it connects to the OVN Southbound
153 Database to learn about OVN configuration and status and to
      populate the PN table and the <code>chassis</code> column in the
      <code>Binding</code> table with the hypervisor's status.
156 Southbound, it connects to <code>ovs-vswitchd</code>(8) as an
157 OpenFlow controller, for control over network traffic, and to the
158 local <code>ovsdb-server</code>(1) to allow it to monitor and
159 control Open vSwitch configuration.
160 </li>
161
162 <li>
163 <code>ovs-vswitchd</code>(8) and <code>ovsdb-server</code>(1) are
164 conventional components of Open vSwitch.
165 </li>
166 </ul>
167
168 <pre fixed="yes">
169 CMS
170 |
171 |
172 +-----------|-----------+
173 | | |
174 | OVN/CMS Plugin |
175 | | |
176 | | |
177 | OVN Northbound DB |
178 | | |
179 | | |
180 | ovn-northd |
181 | | |
182 +-----------|-----------+
183 |
184 |
185 +-------------------+
186 | OVN Southbound DB |
187 +-------------------+
188 |
189 |
190 +------------------+------------------+
191 | | |
192 HV 1 | | HV n |
193 +---------------|---------------+ . +---------------|---------------+
194 | | | . | | |
195 | ovn-controller | . | ovn-controller |
196 | | | | . | | | |
197 | | | | | | | |
198 | ovs-vswitchd ovsdb-server | | ovs-vswitchd ovsdb-server |
199 | | | |
200 +-------------------------------+ +-------------------------------+
201 </pre>
202
203 <h2>Information Flow in OVN</h2>
204
205 <p>
206 Configuration data in OVN flows from north to south. The CMS, through its
207 OVN/CMS plugin, passes the logical network configuration to
208 <code>ovn-northd</code> via the northbound database. In turn,
209 <code>ovn-northd</code> compiles the configuration into a lower-level form
210 and passes it to all of the chassis via the southbound database.
211 </p>
212
213 <p>
214 Status information in OVN flows from south to north. OVN currently
215 provides only a few forms of status information. First,
216 <code>ovn-northd</code> populates the <code>up</code> column in the
217 northbound <code>Logical_Switch_Port</code> table: if a logical port's
218 <code>chassis</code> column in the southbound <code>Port_Binding</code>
219 table is nonempty, it sets <code>up</code> to <code>true</code>, otherwise
220 to <code>false</code>. This allows the CMS to detect when a VM's
221 networking has come up.
222 </p>
223
224 <p>
225 Second, OVN provides feedback to the CMS on the realization of its
226 configuration, that is, whether the configuration provided by the CMS has
227 taken effect. This feature requires the CMS to participate in a sequence
228 number protocol, which works the following way:
229 </p>
230
231 <ol>
232 <li>
233 When the CMS updates the configuration in the northbound database, as
234 part of the same transaction, it increments the value of the
235 <code>nb_cfg</code> column in the <code>NB_Global</code> table. (This is
236 only necessary if the CMS wants to know when the configuration has been
237 realized.)
238 </li>
239
240 <li>
241 When <code>ovn-northd</code> updates the southbound database based on a
242 given snapshot of the northbound database, it copies <code>nb_cfg</code>
243 from northbound <code>NB_Global</code> into the southbound database
244 <code>SB_Global</code> table, as part of the same transaction. (Thus, an
245 observer monitoring both databases can determine when the southbound
246 database is caught up with the northbound.)
247 </li>
248
249 <li>
250 After <code>ovn-northd</code> receives confirmation from the southbound
251 database server that its changes have committed, it updates
252 <code>sb_cfg</code> in the northbound <code>NB_Global</code> table to the
253 <code>nb_cfg</code> version that was pushed down. (Thus, the CMS or
254 another observer can determine when the southbound database is caught up
255 without a connection to the southbound database.)
256 </li>
257
258 <li>
259 The <code>ovn-controller</code> process on each chassis receives the
260 updated southbound database, with the updated <code>nb_cfg</code>. This
261 process in turn updates the physical flows installed in the chassis's
262 Open vSwitch instances. When it receives confirmation from Open vSwitch
263 that the physical flows have been updated, it updates <code>nb_cfg</code>
264 in its own <code>Chassis</code> record in the southbound database.
265 </li>
266
267 <li>
268 <code>ovn-northd</code> monitors the <code>nb_cfg</code> column in all of
269 the <code>Chassis</code> records in the southbound database. It keeps
270 track of the minimum value among all the records and copies it into the
271 <code>hv_cfg</code> column in the northbound <code>NB_Global</code>
272 table. (Thus, the CMS or another observer can determine when all of the
273 hypervisors have caught up to the northbound configuration.)
274 </li>
275 </ol>
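
  <p>
    As a rough illustration of the sequence number protocol above, an
    observer with access to both databases can compare these counters with
    the <code>ovn-nbctl</code> and <code>ovn-sbctl</code> utilities.  The
    following is only a sketch; invocations and output format may vary:
  </p>

  <pre fixed="yes">
$ ovn-nbctl --columns=nb_cfg,sb_cfg,hv_cfg list NB_Global
$ ovn-sbctl --columns=nb_cfg list SB_Global
$ ovn-sbctl --columns=name,nb_cfg list Chassis
  </pre>

  <p>
    When <code>hv_cfg</code> reaches the <code>nb_cfg</code> value that the
    CMS last wrote, every hypervisor has realized that configuration.
  </p>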
276
277 <h2>Chassis Setup</h2>
278
279 <p>
280 Each chassis in an OVN deployment must be configured with an Open vSwitch
281 bridge dedicated for OVN's use, called the <dfn>integration bridge</dfn>.
282 System startup scripts may create this bridge prior to starting
283 <code>ovn-controller</code> if desired. If this bridge does not exist when
    <code>ovn-controller</code> starts, it will be created automatically with the default
285 configuration suggested below. The ports on the integration bridge include:
286 </p>
287
288 <ul>
289 <li>
290 On any chassis, tunnel ports that OVN uses to maintain logical network
291 connectivity. <code>ovn-controller</code> adds, updates, and removes
292 these tunnel ports.
293 </li>
294
295 <li>
296 On a hypervisor, any VIFs that are to be attached to logical networks.
297 The hypervisor itself, or the integration between Open vSwitch and the
      hypervisor (described in <code>IntegrationGuide.md</code>), takes care of
299 this. (This is not part of OVN or new to OVN; this is pre-existing
300 integration work that has already been done on hypervisors that support
301 OVS.)
302 </li>
303
304 <li>
305 On a gateway, the physical port used for logical network connectivity.
306 System startup scripts add this port to the bridge prior to starting
307 <code>ovn-controller</code>. This can be a patch port to another bridge,
308 instead of a physical port, in more sophisticated setups.
309 </li>
310 </ul>
311
312 <p>
313 Other ports should not be attached to the integration bridge. In
314 particular, physical ports attached to the underlay network (as opposed to
315 gateway ports, which are physical ports attached to logical networks) must
316 not be attached to the integration bridge. Underlay physical ports should
317 instead be attached to a separate Open vSwitch bridge (they need not be
318 attached to any bridge at all, in fact).
319 </p>
320
321 <p>
322 The integration bridge should be configured as described below.
323 The effect of each of these settings is documented in
324 <code>ovs-vswitchd.conf.db</code>(5):
325 </p>
326
327 <!-- Keep the following in sync with create_br_int() in
328 ovn/controller/ovn-controller.c. -->
329 <dl>
330 <dt><code>fail-mode=secure</code></dt>
331 <dd>
332 Avoids switching packets between isolated logical networks before
333 <code>ovn-controller</code> starts up. See <code>Controller Failure
334 Settings</code> in <code>ovs-vsctl</code>(8) for more information.
335 </dd>
336
337 <dt><code>other-config:disable-in-band=true</code></dt>
338 <dd>
339 Suppresses in-band control flows for the integration bridge. It would be
340 unusual for such flows to show up anyway, because OVN uses a local
341 controller (over a Unix domain socket) instead of a remote controller.
342 It's possible, however, for some other bridge in the same system to have
343 an in-band remote controller, and in that case this suppresses the flows
344 that in-band control would ordinarily set up. See <code>In-Band
345 Control</code> in <code>DESIGN.md</code> for more information.
346 </dd>
347 </dl>
348
349 <p>
350 The customary name for the integration bridge is <code>br-int</code>, but
351 another name may be used.
352 </p>
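
  <p>
    As an illustration, a startup script that creates the integration bridge
    itself, rather than letting <code>ovn-controller</code> create it, might
    run something like the following sketch (the bridge name is only the
    customary default):
  </p>

  <pre fixed="yes">
$ ovs-vsctl -- add-br br-int \
            -- set Bridge br-int fail-mode=secure \
                                 other-config:disable-in-band=true
  </pre>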
353
354 <h2>Logical Networks</h2>
355
356 <p>
357 A <dfn>logical network</dfn> implements the same concepts as physical
    networks, but it is insulated from the physical network with tunnels or
359 other encapsulations. This allows logical networks to have separate IP and
360 other address spaces that overlap, without conflicting, with those used for
361 physical networks. Logical network topologies can be arranged without
362 regard for the topologies of the physical networks on which they run.
363 </p>
364
365 <p>
366 Logical network concepts in OVN include:
367 </p>
368
369 <ul>
370 <li>
371 <dfn>Logical switches</dfn>, the logical version of Ethernet switches.
372 </li>
373
374 <li>
375 <dfn>Logical routers</dfn>, the logical version of IP routers. Logical
376 switches and routers can be connected into sophisticated topologies.
377 </li>
378
379 <li>
380 <dfn>Logical datapaths</dfn> are the logical version of an OpenFlow
381 switch. Logical switches and routers are both implemented as logical
382 datapaths.
383 </li>
384 </ul>
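
  <p>
    The CMS normally creates these objects through the OVN Northbound
    database, but for experimentation they can also be created by hand with
    <code>ovn-nbctl</code>(8).  A minimal sketch that builds a logical switch
    with two logical ports (the names and MAC addresses are arbitrary
    examples):
  </p>

  <pre fixed="yes">
$ ovn-nbctl ls-add sw0
$ ovn-nbctl lsp-add sw0 sw0-port1
$ ovn-nbctl lsp-set-addresses sw0-port1 "00:00:00:00:00:01"
$ ovn-nbctl lsp-add sw0 sw0-port2
$ ovn-nbctl lsp-set-addresses sw0-port2 "00:00:00:00:00:02"
  </pre>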
385
386 <h2>Life Cycle of a VIF</h2>
387
388 <p>
389 Tables and their schemas presented in isolation are difficult to
390 understand. Here's an example.
391 </p>
392
393 <p>
394 A VIF on a hypervisor is a virtual network interface attached either
    to a VM or a container running directly on that hypervisor.  (This is
    different from the interface of a container running inside a VM.)
397 </p>
398
399 <p>
400 The steps in this example refer often to details of the OVN and OVN
401 Northbound database schemas. Please see <code>ovn-sb</code>(5) and
402 <code>ovn-nb</code>(5), respectively, for the full story on these
403 databases.
404 </p>
405
406 <ol>
407 <li>
408 A VIF's life cycle begins when a CMS administrator creates a new VIF
409 using the CMS user interface or API and adds it to a switch (one
410 implemented by OVN as a logical switch). The CMS updates its own
      configuration.  This includes associating a unique, persistent
      identifier <var>vif-id</var> and an Ethernet address <var>mac</var>
      with the VIF.
413 </li>
414
415 <li>
416 The CMS plugin updates the OVN Northbound database to include the new
417 VIF, by adding a row to the <code>Logical_Switch_Port</code>
418 table. In the new row, <code>name</code> is <var>vif-id</var>,
419 <code>mac</code> is <var>mac</var>, <code>switch</code> points to
420 the OVN logical switch's Logical_Switch record, and other columns
421 are initialized appropriately.
422 </li>
423
424 <li>
425 <code>ovn-northd</code> receives the OVN Northbound database update. In
426 turn, it makes the corresponding updates to the OVN Southbound database,
427 by adding rows to the OVN Southbound database <code>Logical_Flow</code>
428 table to reflect the new port, e.g. add a flow to recognize that packets
429 destined to the new port's MAC address should be delivered to it, and
430 update the flow that delivers broadcast and multicast packets to include
431 the new port. It also creates a record in the <code>Binding</code> table
432 and populates all its columns except the column that identifies the
433 <code>chassis</code>.
434 </li>
435
436 <li>
437 On every hypervisor, <code>ovn-controller</code> receives the
438 <code>Logical_Flow</code> table updates that <code>ovn-northd</code> made
439 in the previous step. As long as the VM that owns the VIF is powered
440 off, <code>ovn-controller</code> cannot do much; it cannot, for example,
441 arrange to send packets to or receive packets from the VIF, because the
442 VIF does not actually exist anywhere.
443 </li>
444
445 <li>
446 Eventually, a user powers on the VM that owns the VIF. On the hypervisor
447 where the VM is powered on, the integration between the hypervisor and
448 Open vSwitch (described in <code>IntegrationGuide.md</code>) adds the VIF
449 to the OVN integration bridge and stores <var>vif-id</var> in
450 <code>external_ids</code>:<code>iface-id</code> to indicate that the
451 interface is an instantiation of the new VIF. (None of this code is new
452 in OVN; this is pre-existing integration work that has already been done
453 on hypervisors that support OVS.)
454 </li>
455
456 <li>
457 On the hypervisor where the VM is powered on, <code>ovn-controller</code>
458 notices <code>external_ids</code>:<code>iface-id</code> in the new
459 Interface. In response, in the OVN Southbound DB, it updates the
460 <code>Binding</code> table's <code>chassis</code> column for the
461 row that links the logical port from <code>external_ids</code>:<code>
462 iface-id</code> to the hypervisor. Afterward, <code>ovn-controller</code>
463 updates the local hypervisor's OpenFlow tables so that packets to and from
464 the VIF are properly handled.
465 </li>
466
467 <li>
468 Some CMS systems, including OpenStack, fully start a VM only when its
469 networking is ready. To support this, <code>ovn-northd</code> notices
      the <code>chassis</code> column updated for the row in the
      <code>Binding</code> table and pushes this upward by updating the
472 <ref column="up" table="Logical_Switch_Port" db="OVN_NB"/> column
473 in the OVN Northbound database's <ref table="Logical_Switch_Port"
474 db="OVN_NB"/> table to indicate that the VIF is now up. The CMS,
475 if it uses this feature, can then react by allowing the VM's
476 execution to proceed.
477 </li>
478
479 <li>
480 On every hypervisor but the one where the VIF resides,
481 <code>ovn-controller</code> notices the completely populated row in the
482 <code>Binding</code> table. This provides <code>ovn-controller</code>
483 the physical location of the logical port, so each instance updates the
484 OpenFlow tables of its switch (based on logical datapath flows in the OVN
485 DB <code>Logical_Flow</code> table) so that packets to and from the VIF
486 can be properly handled via tunnels.
487 </li>
488
489 <li>
490 Eventually, a user powers off the VM that owns the VIF. On the
491 hypervisor where the VM was powered off, the VIF is deleted from the OVN
492 integration bridge.
493 </li>
494
495 <li>
496 On the hypervisor where the VM was powered off,
497 <code>ovn-controller</code> notices that the VIF was deleted. In
      response, it removes the content of the <code>chassis</code> column in
      the <code>Binding</code> table row for the logical port.
500 </li>
501
502 <li>
503 On every hypervisor, <code>ovn-controller</code> notices the empty
      <code>chassis</code> column in the <code>Binding</code> table's row
505 for the logical port. This means that <code>ovn-controller</code> no
506 longer knows the physical location of the logical port, so each instance
507 updates its OpenFlow table to reflect that.
508 </li>
509
510 <li>
511 Eventually, when the VIF (or its entire VM) is no longer needed by
512 anyone, an administrator deletes the VIF using the CMS user interface or
513 API. The CMS updates its own configuration.
514 </li>
515
516 <li>
517 The CMS plugin removes the VIF from the OVN Northbound database,
518 by deleting its row in the <code>Logical_Switch_Port</code> table.
519 </li>
520
521 <li>
522 <code>ovn-northd</code> receives the OVN Northbound update and in turn
523 updates the OVN Southbound database accordingly, by removing or updating
524 the rows from the OVN Southbound database <code>Logical_Flow</code> table
525 and <code>Binding</code> table that were related to the now-destroyed
526 VIF.
527 </li>
528
529 <li>
530 On every hypervisor, <code>ovn-controller</code> receives the
531 <code>Logical_Flow</code> table updates that <code>ovn-northd</code> made
532 in the previous step. <code>ovn-controller</code> updates OpenFlow
533 tables to reflect the update, although there may not be much to do, since
534 the VIF had already become unreachable when it was removed from the
535 <code>Binding</code> table in a previous step.
536 </li>
537 </ol>
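
  <p>
    The CMS plugin and the hypervisor integration normally perform the steps
    above automatically, but their flavor can be reproduced by hand.  A rough
    sketch of steps 2 and 5, using made-up names (<code>sw0</code>, a VIF
    identifier of <code>vif-id-1</code>, and an example MAC address):
  </p>

  <pre fixed="yes">
# Step 2: the CMS plugin adds the VIF to the northbound database.
$ ovn-nbctl lsp-add sw0 vif-id-1
$ ovn-nbctl lsp-set-addresses vif-id-1 "00:00:00:00:00:01"

# Step 5: on the hypervisor, the integration adds the VIF to br-int.
$ ovs-vsctl add-port br-int vif1 -- \
      set Interface vif1 external_ids:iface-id=vif-id-1

# Step 6 and later: the binding becomes visible in the southbound database.
$ ovn-sbctl show
  </pre>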
538
539 <h2>Life Cycle of a Container Interface Inside a VM</h2>
540
541 <p>
542 OVN provides virtual network abstractions by converting information
    written in the OVN_NB database into OpenFlow flows on each hypervisor.
    Secure virtual networking for multiple tenants can only be provided if
    OVN controller is the only entity that can modify flows in Open vSwitch.
    When the
546 Open vSwitch integration bridge resides in the hypervisor, it is a
547 fair assumption to make that tenant workloads running inside VMs cannot
548 make any changes to Open vSwitch flows.
549 </p>
550
551 <p>
552 If the infrastructure provider trusts the applications inside the
553 containers not to break out and modify the Open vSwitch flows, then
    containers can be run directly on hypervisors.  This is also the case when
555 containers are run inside the VMs and Open vSwitch integration bridge
556 with flows added by OVN controller resides in the same VM. For both
557 the above cases, the workflow is the same as explained with an example
558 in the previous section ("Life Cycle of a VIF").
559 </p>
560
561 <p>
562 This section talks about the life cycle of a container interface (CIF)
563 when containers are created in the VMs and the Open vSwitch integration
564 bridge resides inside the hypervisor. In this case, even if a container
565 application breaks out, other tenants are not affected because the
566 containers running inside the VMs cannot modify the flows in the
567 Open vSwitch integration bridge.
568 </p>
569
570 <p>
571 When multiple containers are created inside a VM, there are multiple
572 CIFs associated with them. The network traffic associated with these
    CIFs needs to reach the Open vSwitch integration bridge running in the
574 hypervisor for OVN to support virtual network abstractions. OVN should
575 also be able to distinguish network traffic coming from different CIFs.
576 There are two ways to distinguish network traffic of CIFs.
577 </p>
578
579 <p>
580 One way is to provide one VIF for every CIF (1:1 model). This means that
581 there could be a lot of network devices in the hypervisor. This would slow
582 down OVS because of all the additional CPU cycles needed for the management
583 of all the VIFs. It would also mean that the entity creating the
584 containers in a VM should also be able to create the corresponding VIFs in
585 the hypervisor.
586 </p>
587
588 <p>
589 The second way is to provide a single VIF for all the CIFs (1:many model).
590 OVN could then distinguish network traffic coming from different CIFs via
    a tag written in every packet.  OVN uses this model, with VLAN as
    the tagging mechanism.
593 </p>
594
595 <ol>
596 <li>
597 A CIF's life cycle begins when a container is spawned inside a VM by
      either the same CMS that created the VM, a tenant that owns that VM,
      or even a container orchestration system different from the CMS
      that initially created the VM.  Whoever the entity is, it will need to
      know the <var>vif-id</var> that is associated with the network interface
      of the VM through which the container interface's network traffic is
      expected to go.  The entity that creates the container interface
604 will also need to choose an unused VLAN inside that VM.
605 </li>
606
607 <li>
608 The container spawning entity (either directly or through the CMS that
609 manages the underlying infrastructure) updates the OVN Northbound
610 database to include the new CIF, by adding a row to the
611 <code>Logical_Switch_Port</code> table. In the new row,
612 <code>name</code> is any unique identifier,
613 <code>parent_name</code> is the <var>vif-id</var> of the VM
      through which the CIF's network traffic is expected to go,
      and the <code>tag</code> is the VLAN tag that identifies the
616 network traffic of that CIF.
617 </li>
618
619 <li>
620 <code>ovn-northd</code> receives the OVN Northbound database update. In
621 turn, it makes the corresponding updates to the OVN Southbound database,
622 by adding rows to the OVN Southbound database's <code>Logical_Flow</code>
623 table to reflect the new port and also by creating a new row in the
624 <code>Binding</code> table and populating all its columns except the
625 column that identifies the <code>chassis</code>.
626 </li>
627
628 <li>
629 On every hypervisor, <code>ovn-controller</code> subscribes to the
630 changes in the <code>Binding</code> table. When a new row is created
631 by <code>ovn-northd</code> that includes a value in
632 <code>parent_port</code> column of <code>Binding</code> table, the
633 <code>ovn-controller</code> in the hypervisor whose OVN integration bridge
      has an interface with that same <var>vif-id</var> value in
      <code>external_ids</code>:<code>iface-id</code>
636 updates the local hypervisor's OpenFlow tables so that packets to and
637 from the VIF with the particular VLAN <code>tag</code> are properly
638 handled. Afterward it updates the <code>chassis</code> column of
      the <code>Binding</code> row to reflect the physical location.
640 </li>
641
642 <li>
643 One can only start the application inside the container after the
644 underlying network is ready. To support this, <code>ovn-northd</code>
645 notices the updated <code>chassis</code> column in <code>Binding</code>
646 table and updates the <ref column="up" table="Logical_Switch_Port"
647 db="OVN_NB"/> column in the OVN Northbound database's
648 <ref table="Logical_Switch_Port" db="OVN_NB"/> table to indicate that the
      CIF is now up.  The entity responsible for starting the container application
650 queries this value and starts the application.
651 </li>
652
653 <li>
      Eventually the entity that created and started the container stops it.
      The entity, through the CMS (or directly), deletes its row in the
656 <code>Logical_Switch_Port</code> table.
657 </li>
658
659 <li>
660 <code>ovn-northd</code> receives the OVN Northbound update and in turn
661 updates the OVN Southbound database accordingly, by removing or updating
662 the rows from the OVN Southbound database <code>Logical_Flow</code> table
663 that were related to the now-destroyed CIF. It also deletes the row in
664 the <code>Binding</code> table for that CIF.
665 </li>
666
667 <li>
668 On every hypervisor, <code>ovn-controller</code> receives the
669 <code>Logical_Flow</code> table updates that <code>ovn-northd</code> made
670 in the previous step. <code>ovn-controller</code> updates OpenFlow
671 tables to reflect the update.
672 </li>
673 </ol>
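
  <p>
    For example, step 2 above might amount to the container spawning entity
    (or the CMS on its behalf) running something like the following sketch,
    where <code>sw0</code> and <code>vif-id-1</code> are the VM's existing
    logical switch and logical port, 42 is the chosen VLAN tag, and the other
    names are arbitrary:
  </p>

  <pre fixed="yes">
$ ovn-nbctl lsp-add sw0 cif0 vif-id-1 42
$ ovn-nbctl lsp-set-addresses cif0 "00:00:00:00:00:10"
  </pre>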
674
675 <h2>Architectural Physical Life Cycle of a Packet</h2>
676
677 <p>
678 This section describes how a packet travels from one virtual machine or
679 container to another through OVN. This description focuses on the physical
680 treatment of a packet; for a description of the logical life cycle of a
681 packet, please refer to the <code>Logical_Flow</code> table in
682 <code>ovn-sb</code>(5).
683 </p>
684
685 <p>
686 This section mentions several data and metadata fields, for clarity
687 summarized here:
688 </p>
689
690 <dl>
691 <dt>tunnel key</dt>
692 <dd>
693 When OVN encapsulates a packet in Geneve or another tunnel, it attaches
694 extra data to it to allow the receiving OVN instance to process it
695 correctly. This takes different forms depending on the particular
696 encapsulation, but in each case we refer to it here as the ``tunnel
697 key.'' See <code>Tunnel Encapsulations</code>, below, for details.
698 </dd>
699
700 <dt>logical datapath field</dt>
701 <dd>
702 A field that denotes the logical datapath through which a packet is being
703 processed.
704 <!-- Keep the following in sync with MFF_LOG_DATAPATH in
705 ovn/lib/logical-fields.h. -->
706 OVN uses the field that OpenFlow 1.1+ simply (and confusingly) calls
707 ``metadata'' to store the logical datapath. (This field is passed across
708 tunnels as part of the tunnel key.)
709 </dd>
710
711 <dt>logical input port field</dt>
712 <dd>
713 <p>
714 A field that denotes the logical port from which the packet
715 entered the logical datapath.
716 <!-- Keep the following in sync with MFF_LOG_INPORT in
717 ovn/lib/logical-fields.h. -->
718 OVN stores this in Open vSwitch extension register number 14.
719 </p>
720
721 <p>
722 Geneve and STT tunnels pass this field as part of the tunnel key.
723 Although VXLAN tunnels do not explicitly carry a logical input port,
724 OVN only uses VXLAN to communicate with gateways that from OVN's
725 perspective consist of only a single logical port, so that OVN can set
        the logical input port field to that port on ingress to the OVN logical
727 pipeline.
728 </p>
729 </dd>
730
731 <dt>logical output port field</dt>
732 <dd>
733 <p>
734 A field that denotes the logical port from which the packet will
735 leave the logical datapath. This is initialized to 0 at the
736 beginning of the logical ingress pipeline.
737 <!-- Keep the following in sync with MFF_LOG_OUTPORT in
738 ovn/lib/logical-fields.h. -->
739 OVN stores this in Open vSwitch extension register number 15.
740 </p>
741
742 <p>
743 Geneve and STT tunnels pass this field as part of the tunnel key.
        VXLAN tunnels do not carry a logical output port field in the tunnel
        key, so when a packet is received from a VXLAN tunnel by an OVN
        hypervisor, the packet is resubmitted to table 16 to determine the
        output port(s).  When the packet reaches table 32, it is resubmitted
        to table 33 for local delivery by checking the MLF_RCV_FROM_VXLAN
        flag, which is set when the packet arrives from a VXLAN tunnel.
752 </p>
753 </dd>
754
755 <dt>conntrack zone field for logical ports</dt>
756 <dd>
757 A field that denotes the connection tracking zone for logical ports.
758 The value only has local significance and is not meaningful between
759 chassis. This is initialized to 0 at the beginning of the logical
760 <!-- Keep the following in sync with MFF_LOG_CT_ZONE in
761 ovn/lib/logical-fields.h. -->
762 ingress pipeline. OVN stores this in Open vSwitch extension register
763 number 13.
764 </dd>
765
766 <dt>conntrack zone fields for Gateway router</dt>
767 <dd>
768 Fields that denote the connection tracking zones for Gateway routers.
769 These values only have local significance (only on chassis that have
      Gateway routers instantiated) and are not meaningful between
771 chassis. OVN stores the zone information for DNATting in Open vSwitch
772 <!-- Keep the following in sync with MFF_LOG_DNAT_ZONE and
773 MFF_LOG_SNAT_ZONE in ovn/lib/logical-fields.h. -->
774 extension register number 11 and zone information for SNATing in
775 Open vSwitch extension register number 12.
776 </dd>
777
778 <dt>logical flow flags</dt>
779 <dd>
      The logical flags are intended to keep context between
781 tables in order to decide which rules in subsequent tables are
782 matched. These values only have local significance and are not
783 meaningful between chassis. OVN stores the logical flags in
784 <!-- Keep the following in sync with MFF_LOG_FLAGS in
785 ovn/lib/logical-fields.h. -->
786 Open vSwitch extension register number 10.
787 </dd>
788
789 <dt>VLAN ID</dt>
790 <dd>
791 The VLAN ID is used as an interface between OVN and containers nested
792 inside a VM (see <code>Life Cycle of a container interface inside a
      inside a VM (see <code>Life Cycle of a Container Interface Inside a
794 </dd>
795 </dl>
796
797 <p>
798 Initially, a VM or container on the ingress hypervisor sends a packet on a
799 port attached to the OVN integration bridge. Then:
800 </p>
801
802 <ol>
803 <li>
804 <p>
805 OpenFlow table 0 performs physical-to-logical translation. It matches
806 the packet's ingress port. Its actions annotate the packet with
807 logical metadata, by setting the logical datapath field to identify the
808 logical datapath that the packet is traversing and the logical input
809 port field to identify the ingress port. Then it resubmits to table 16
810 to enter the logical ingress pipeline.
811 </p>
812
813 <p>
814 Packets that originate from a container nested within a VM are treated
815 in a slightly different way. The originating container can be
816 distinguished based on the VIF-specific VLAN ID, so the
817 physical-to-logical translation flows additionally match on VLAN ID and
818 the actions strip the VLAN header. Following this step, OVN treats
819 packets from containers just like any other packets.
820 </p>
821
822 <p>
823 Table 0 also processes packets that arrive from other chassis. It
824 distinguishes them from other packets by ingress port, which is a
825 tunnel. As with packets just entering the OVN pipeline, the actions
826 annotate these packets with logical datapath and logical ingress port
827 metadata. In addition, the actions set the logical output port field,
828 which is available because in OVN tunneling occurs after the logical
829 output port is known. These three pieces of information are obtained
830 from the tunnel encapsulation metadata (see <code>Tunnel
831 Encapsulations</code> for encoding details). Then the actions resubmit
832 to table 33 to enter the logical egress pipeline.
833 </p>
834 </li>
835
836 <li>
837 <p>
838 OpenFlow tables 16 through 31 execute the logical ingress pipeline from
839 the <code>Logical_Flow</code> table in the OVN Southbound database.
840 These tables are expressed entirely in terms of logical concepts like
841 logical ports and logical datapaths. A big part of
842 <code>ovn-controller</code>'s job is to translate them into equivalent
843 OpenFlow (in particular it translates the table numbers:
844 <code>Logical_Flow</code> tables 0 through 15 become OpenFlow tables 16
845 through 31).
846 </p>
847
848 <p>
849 Most OVN actions have fairly obvious implementations in OpenFlow (with
850 OVS extensions), e.g. <code>next;</code> is implemented as
851 <code>resubmit</code>, <code><var>field</var> =
852 <var>constant</var>;</code> as <code>set_field</code>. A few are worth
853 describing in more detail:
854 </p>
855
856 <dl>
857 <dt><code>output:</code></dt>
858 <dd>
859 Implemented by resubmitting the packet to table 32. If the pipeline
860 executes more than one <code>output</code> action, then each one is
861 separately resubmitted to table 32. This can be used to send
862 multiple copies of the packet to multiple ports. (If the packet was
863 not modified between the <code>output</code> actions, and some of the
864 copies are destined to the same hypervisor, then using a logical
865 multicast output port would save bandwidth between hypervisors.)
866 </dd>
867
868 <dt><code>get_arp(<var>P</var>, <var>A</var>);</code></dt>
869 <dt><code>get_nd(<var>P</var>, <var>A</var>);</code></dt>
870 <dd>
871 <p>
872 Implemented by storing arguments into OpenFlow fields, then
873 resubmitting to table 66, which <code>ovn-controller</code>
874 populates with flows generated from the <code>MAC_Binding</code>
875 table in the OVN Southbound database. If there is a match in table
876 66, then its actions store the bound MAC in the Ethernet
877 destination address field.
878 </p>
879
880 <p>
881 (The OpenFlow actions save and restore the OpenFlow fields used for
882 the arguments, so that the OVN actions do not have to be aware of
883 this temporary use.)
884 </p>
885 </dd>
886
887 <dt><code>put_arp(<var>P</var>, <var>A</var>, <var>E</var>);</code></dt>
888 <dt><code>put_nd(<var>P</var>, <var>A</var>, <var>E</var>);</code></dt>
889 <dd>
890 <p>
891 Implemented by storing the arguments into OpenFlow fields, then
892 outputting a packet to <code>ovn-controller</code>, which updates
893 the <code>MAC_Binding</code> table.
894 </p>
895
896 <p>
897 (The OpenFlow actions save and restore the OpenFlow fields used for
898 the arguments, so that the OVN actions do not have to be aware of
899 this temporary use.)
900 </p>
901 </dd>
902 </dl>
903 </li>
904
905 <li>
906 <p>
907 OpenFlow tables 32 through 47 implement the <code>output</code> action
908 in the logical ingress pipeline. Specifically, table 32 handles
909 packets to remote hypervisors, table 33 handles packets to the local
910 hypervisor, and table 34 checks whether packets whose logical ingress
911 and egress port are the same should be discarded.
912 </p>
913
914 <p>
915 Logical patch ports are a special case. Logical patch ports do not
916 have a physical location and effectively reside on every hypervisor.
917 Thus, flow table 33, for output to ports on the local hypervisor,
918 naturally implements output to unicast logical patch ports too.
919 However, applying the same logic to a logical patch port that is part
920 of a logical multicast group yields packet duplication, because each
921 hypervisor that contains a logical port in the multicast group will
922 also output the packet to the logical patch port. Thus, multicast
923 groups implement output to logical patch ports in table 32.
924 </p>
925
926 <p>
927 Each flow in table 32 matches on a logical output port for unicast or
928 multicast logical ports that include a logical port on a remote
929 hypervisor. Each flow's actions implement sending a packet to the port
930 it matches. For unicast logical output ports on remote hypervisors,
931 the actions set the tunnel key to the correct value, then send the
932 packet on the tunnel port to the correct hypervisor. (When the remote
933 hypervisor receives the packet, table 0 there will recognize it as a
934 tunneled packet and pass it along to table 33.) For multicast logical
935 output ports, the actions send one copy of the packet to each remote
936 hypervisor, in the same way as for unicast destinations. If a
937 multicast group includes a logical port or ports on the local
938 hypervisor, then its actions also resubmit to table 33. Table 32 also
939 includes a fallback flow that resubmits to table 33 if there is no
        other match.  Table 32 also contains a higher priority rule that
        matches packets received from VXLAN tunnels, based on the
        MLF_RCV_FROM_VXLAN flag, and resubmits these packets to table 33 for
        local delivery.  Packets received from VXLAN tunnels reach here
        because the tunnel key lacks a logical output port field, so these
        packets had to be resubmitted to table 16 to determine the output port.
946 </p>
947
948 <p>
949 Flows in table 33 resemble those in table 32 but for logical ports that
950 reside locally rather than remotely. For unicast logical output ports
951 on the local hypervisor, the actions just resubmit to table 34. For
952 multicast output ports that include one or more logical ports on the
953 local hypervisor, for each such logical port <var>P</var>, the actions
954 change the logical output port to <var>P</var>, then resubmit to table
955 34.
956 </p>
957
958 <p>
959 A special case is that when a localnet port exists on the datapath,
        a remote port is reached by switching through the localnet port.  In this
        case, instead of adding a flow in table 32 to reach the remote port, a
        flow is added in table 33 to switch the logical output port to the localnet
        port and resubmit to table 33, as if the packet were unicast to a logical
964 port on the local hypervisor.
965 </p>
966
967 <p>
968 Table 34 matches and drops packets for which the logical input and
969 output ports are the same and the MLF_ALLOW_LOOPBACK flag is not
970 set. It resubmits other packets to table 48.
971 </p>
972 </li>
973
974 <li>
975 <p>
976 OpenFlow tables 48 through 63 execute the logical egress pipeline from
977 the <code>Logical_Flow</code> table in the OVN Southbound database.
978 The egress pipeline can perform a final stage of validation before
979 packet delivery. Eventually, it may execute an <code>output</code>
980 action, which <code>ovn-controller</code> implements by resubmitting to
981 table 64. A packet for which the pipeline never executes
982 <code>output</code> is effectively dropped (although it may have been
983 transmitted through a tunnel across a physical network).
984 </p>
985
986 <p>
987 The egress pipeline cannot change the logical output port or cause
988 further tunneling.
989 </p>
990 </li>
991
992 <li>
993 <p>
994 Table 64 bypasses OpenFlow loopback when MLF_ALLOW_LOOPBACK is set.
995 Logical loopback was handled in table 34, but OpenFlow by default also
996 prevents loopback to the OpenFlow ingress port. Thus, when
997 MLF_ALLOW_LOOPBACK is set, OpenFlow table 64 saves the OpenFlow ingress
998 port, sets it to zero, resubmits to table 65 for logical-to-physical
999 transformation, and then restores the OpenFlow ingress port,
        effectively disabling OpenFlow loopback prevention.  When
        MLF_ALLOW_LOOPBACK is unset, the table 64 flow simply resubmits to table
1002 65.
1003 </p>
1004 </li>
1005
1006 <li>
1007 <p>
1008 OpenFlow table 65 performs logical-to-physical translation, the
1009 opposite of table 0. It matches the packet's logical egress port. Its
1010 actions output the packet to the port attached to the OVN integration
1011 bridge that represents that logical port. If the logical egress port
        is a container nested within a VM, then before sending the packet the
1013 actions push on a VLAN header with an appropriate VLAN ID.
1014 </p>
1015
1016 <p>
1017 If the logical egress port is a logical patch port, then table 65
1018 outputs to an OVS patch port that represents the logical patch port.
1019 The packet re-enters the OpenFlow flow table from the OVS patch port's
1020 peer in table 0, which identifies the logical datapath and logical
1021 input port based on the OVS patch port's OpenFlow port number.
1022 </p>
1023 </li>
1024 </ol>
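
  <p>
    On a hypervisor, this pipeline can be inspected directly.  A sketch of
    commonly useful commands (the trace flow is an arbitrary example; see
    <code>ovs-appctl</code>(8) and <code>ovn-sbctl</code>(8) for details):
  </p>

  <pre fixed="yes">
$ ovn-sbctl lflow-list                       # logical flows, by pipeline stage
$ ovs-ofctl -O OpenFlow13 dump-flows br-int  # OpenFlow flows actually installed
$ ovs-appctl ofproto/trace br-int \
      in_port=1,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02
  </pre>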
1025
  <h2>Life Cycle of a VTEP Gateway</h2>
1027
1028 <p>
1029 A gateway is a chassis that forwards traffic between the OVN-managed
1030 part of a logical network and a physical VLAN, extending a
1031 tunnel-based logical network into a physical network.
1032 </p>
1033
1034 <p>
1035 The steps below refer often to details of the OVN and VTEP database
1036 schemas. Please see <code>ovn-sb</code>(5), <code>ovn-nb</code>(5)
    and <code>vtep</code>(5) for the full story on these
1038 databases.
1039 </p>
1040
1041 <ol>
1042 <li>
1043 A VTEP gateway's life cycle begins with the administrator registering
1044 the VTEP gateway as a <code>Physical_Switch</code> table entry in the
1045 <code>VTEP</code> database. The <code>ovn-controller-vtep</code>
      connected to this VTEP database will recognize the new VTEP gateway
1047 and create a new <code>Chassis</code> table entry for it in the
1048 <code>OVN_Southbound</code> database.
1049 </li>
1050
1051 <li>
1052 The administrator can then create a new <code>Logical_Switch</code>
      table entry, and bind a particular VLAN on a VTEP gateway's port to
1054 any VTEP logical switch. Once a VTEP logical switch is bound to
1055 a VTEP gateway, the <code>ovn-controller-vtep</code> will detect
1056 it and add its name to the <var>vtep_logical_switches</var>
1057 column of the <code>Chassis</code> table in the <code>
1058 OVN_Southbound</code> database. Note, the <var>tunnel_key</var>
      column of the VTEP logical switch is not filled at creation.  The
      <code>ovn-controller-vtep</code> will set the column when the
      corresponding VTEP logical switch is bound to an OVN logical network.
1062 </li>
1063
1064 <li>
1065 Now, the administrator can use the CMS to add a VTEP logical switch
1066 to the OVN logical network. To do that, the CMS must first create a
1067 new <code>Logical_Switch_Port</code> table entry in the <code>
1068 OVN_Northbound</code> database. Then, the <var>type</var> column
1069 of this entry must be set to "vtep". Next, the <var>
1070 vtep-logical-switch</var> and <var>vtep-physical-switch</var> keys
1071 in the <var>options</var> column must also be specified, since
1072 multiple VTEP gateways can attach to the same VTEP logical switch.
1073 </li>
1074
1075 <li>
1076 The newly created logical port in the <code>OVN_Northbound</code>
1077 database and its configuration will be passed down to the <code>
1078 OVN_Southbound</code> database as a new <code>Port_Binding</code>
1079 table entry. The <code>ovn-controller-vtep</code> will recognize the
1080 change and bind the logical port to the corresponding VTEP gateway
1081 chassis. Configuration of binding the same VTEP logical switch to
      different OVN logical networks is not allowed and a warning will be
1083 generated in the log.
1084 </li>
1085
1086 <li>
      Besides binding to the VTEP gateway chassis, the <code>
1088 ovn-controller-vtep</code> will update the <var>tunnel_key</var>
1089 column of the VTEP logical switch to the corresponding <code>
1090 Datapath_Binding</code> table entry's <var>tunnel_key</var> for the
1091 bound OVN logical network.
1092 </li>
1093
1094 <li>
1095 Next, the <code>ovn-controller-vtep</code> will keep reacting to the
      configuration change in the <code>Port_Binding</code> table in the
      <code>OVN_Southbound</code> database, and updating the
1098 <code>Ucast_Macs_Remote</code> table in the <code>VTEP</code> database.
1099 This allows the VTEP gateway to understand where to forward the unicast
1100 traffic coming from the extended external network.
1101 </li>
1102
1103 <li>
1104 Eventually, the VTEP gateway's life cycle ends when the administrator
1105 unregisters the VTEP gateway from the <code>VTEP</code> database.
1106 The <code>ovn-controller-vtep</code> will recognize the event and
1107 remove all related configurations (<code>Chassis</code> table entry
1108 and port bindings) in the <code>OVN_Southbound</code> database.
1109 </li>
1110
1111 <li>
1112 When the <code>ovn-controller-vtep</code> is terminated, all related
1113 configurations in the <code>OVN_Southbound</code> database and
1114 the <code>VTEP</code> database will be cleaned, including
1115 <code>Chassis</code> table entries for all registered VTEP gateways
1116 and their port bindings, and all <code>Ucast_Macs_Remote</code> table
1117 entries and the <code>Logical_Switch</code> tunnel keys.
1118 </li>
1119 </ol>
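
  <p>
    As a concrete sketch of steps 2 and 3 above (all names are examples, and
    the registration of the physical switch in step 1 depends on the gateway
    itself; see <code>vtep-ctl</code>(8) for details):
  </p>

  <pre fixed="yes">
# Step 2: bind VLAN 100 on the gateway's port1 to a VTEP logical switch.
$ vtep-ctl add-ls ls0
$ vtep-ctl bind-ls ps0 port1 100 ls0

# Step 3: attach the VTEP logical switch to an OVN logical network.
$ ovn-nbctl lsp-add sw0 sw0-vtep
$ ovn-nbctl lsp-set-type sw0-vtep vtep
$ ovn-nbctl lsp-set-options sw0-vtep vtep-physical-switch=ps0 \
                                     vtep-logical-switch=ls0
  </pre>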
1120
1121 <h1>Design Decisions</h1>
1122
1123 <h2>Tunnel Encapsulations</h2>
1124
1125 <p>
1126 OVN annotates logical network packets that it sends from one hypervisor to
1127 another with the following three pieces of metadata, which are encoded in
1128 an encapsulation-specific fashion:
1129 </p>
1130
1131 <ul>
1132 <li>
1133 24-bit logical datapath identifier, from the <code>tunnel_key</code>
1134 column in the OVN Southbound <code>Datapath_Binding</code> table.
1135 </li>
1136
1137 <li>
1138 15-bit logical ingress port identifier. ID 0 is reserved for internal
1139 use within OVN. IDs 1 through 32767, inclusive, may be assigned to
1140 logical ports (see the <code>tunnel_key</code> column in the OVN
1141 Southbound <code>Port_Binding</code> table).
1142 </li>
1143
1144 <li>
1145 16-bit logical egress port identifier. IDs 0 through 32767 have the same
1146 meaning as for logical ingress ports. IDs 32768 through 65535,
1147 inclusive, may be assigned to logical multicast groups (see the
1148 <code>tunnel_key</code> column in the OVN Southbound
1149 <code>Multicast_Group</code> table).
1150 </li>
1151 </ul>
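
  <p>
    These identifiers can be inspected in the southbound database, for
    example with the generic database commands of <code>ovn-sbctl</code>(8)
    (a sketch; output format may vary):
  </p>

  <pre fixed="yes">
$ ovn-sbctl --columns=tunnel_key,external_ids list Datapath_Binding
$ ovn-sbctl --columns=logical_port,tunnel_key list Port_Binding
$ ovn-sbctl --columns=name,tunnel_key list Multicast_Group
  </pre>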
1152
1153 <p>
1154 For hypervisor-to-hypervisor traffic, OVN supports only Geneve and STT
1155 encapsulations, for the following reasons:
1156 </p>
1157
1158 <ul>
1159 <li>
1160 Only STT and Geneve support the large amounts of metadata (over 32 bits
1161 per packet) that OVN uses (as described above).
1162 </li>
1163
1164 <li>
      STT and Geneve use randomized UDP or TCP source ports that allow
1166 efficient distribution among multiple paths in environments that use ECMP
1167 in their underlay.
1168 </li>
1169
1170 <li>
1171 NICs are available to offload STT and Geneve encapsulation and
1172 decapsulation.
1173 </li>
1174 </ul>
1175
1176 <p>
1177 Due to its flexibility, the preferred encapsulation between hypervisors is
1178 Geneve. For Geneve encapsulation, OVN transmits the logical datapath
1179 identifier in the Geneve VNI.
1180
1181 <!-- Keep the following in sync with ovn/controller/physical.h. -->
1182 OVN transmits the logical ingress and logical egress ports in a TLV with
1183 class 0x0102, type 0, and a 32-bit value encoded as follows, from MSB to
1184 LSB:
1185 </p>
1186
1187 <diagram>
1188 <header name="">
1189 <bits name="rsv" above="1" below="0" width=".25"/>
1190 <bits name="ingress port" above="15" width=".75"/>
1191 <bits name="egress port" above="16" width=".75"/>
1192 </header>
1193 </diagram>
1194
1195 <p>
1196 Environments whose NICs lack Geneve offload may prefer STT encapsulation
1197 for performance reasons. For STT encapsulation, OVN encodes all three
1198 pieces of logical metadata in the STT 64-bit tunnel ID as follows, from MSB
1199 to LSB:
1200 </p>
1201
1202 <diagram>
1203 <header name="">
1204 <bits name="reserved" above="9" below="0" width=".5"/>
1205 <bits name="ingress port" above="15" width=".75"/>
1206 <bits name="egress port" above="16" width=".75"/>
1207 <bits name="datapath" above="24" width="1.25"/>
1208 </header>
1209 </diagram>
1210
1211 <p>
1212 For connecting to gateways, in addition to Geneve and STT, OVN supports
1213 VXLAN, because only VXLAN support is common on top-of-rack (ToR) switches.
1214 Currently, gateways have a feature set that matches the capabilities as
1215 defined by the VTEP schema, so fewer bits of metadata are necessary. In
1216 the future, gateways that do not support encapsulations with large amounts
1217 of metadata may continue to have a reduced feature set.
1218 </p>
1219 </manpage>