1 <?xml version="1.0" encoding="utf-8"?>
2 <manpage program="ovn-architecture" section="7" title="OVN Architecture">
3 <h1>Name</h1>
4 <p>ovn-architecture -- Open Virtual Network architecture</p>
5
6 <h1>Description</h1>
7
8 <p>
9 OVN, the Open Virtual Network, is a system to support virtual network
10 abstraction. OVN complements the existing capabilities of OVS to add
11 native support for virtual network abstractions, such as virtual L2 and L3
12 overlays and security groups. Services such as DHCP are also desirable
13 features. Just like OVS, OVN's design goal is to have a production-quality
14 implementation that can operate at significant scale.
15 </p>
16
17 <p>
18 An OVN deployment consists of several components:
19 </p>
20
21 <ul>
22 <li>
23 <p>
24 A <dfn>Cloud Management System</dfn> (<dfn>CMS</dfn>), which is
25 OVN's ultimate client (via its users and administrators). OVN
26 integration requires installing a CMS-specific plugin and
27 related software (see below). OVN initially targets OpenStack
28 as its CMS.
29 </p>
30
31 <p>
32 We generally speak of ``the'' CMS, but one can imagine scenarios in
33 which multiple CMSes manage different parts of an OVN deployment.
34 </p>
35 </li>
36
37 <li>
38 An OVN Database physical or virtual node (or, eventually, cluster)
39 installed in a central location.
40 </li>
41
42 <li>
43 One or more (usually many) <dfn>hypervisors</dfn>. Hypervisors must run
44 Open vSwitch and implement the interface described in
45 <code>IntegrationGuide.md</code> in the OVS source tree. Any hypervisor
46 platform supported by Open vSwitch is acceptable.
47 </li>
48
49 <li>
50 <p>
51 Zero or more <dfn>gateways</dfn>. A gateway extends a tunnel-based
52 logical network into a physical network by bidirectionally forwarding
53 packets between tunnels and a physical Ethernet port. This allows
54 non-virtualized machines to participate in logical networks. A gateway
55 may be a physical host, a virtual machine, or an ASIC-based hardware
56 switch that supports the <code>vtep</code>(5) schema. (Support for the
57 last of these will come later in the OVN implementation.)
58 </p>
59
60 <p>
61 Hypervisors and gateways are together called <dfn>transport nodes</dfn>
62 or <dfn>chassis</dfn>.
63 </p>
64 </li>
65 </ul>
66
67 <p>
68 The diagram below shows how the major components of OVN and related
69 software interact. Starting at the top of the diagram, we have:
70 </p>
71
72 <ul>
73 <li>
74 The Cloud Management System, as defined above.
75 </li>
76
77 <li>
78 <p>
79 The <dfn>OVN/CMS Plugin</dfn> is the component of the CMS that
80 interfaces to OVN. In OpenStack, this is a Neutron plugin.
81 The plugin's main purpose is to translate the CMS's notion of logical
82 network configuration, stored in the CMS's configuration database in a
83 CMS-specific format, into an intermediate representation understood by
84 OVN.
85 </p>
86
87 <p>
88 This component is necessarily CMS-specific, so a new plugin needs to be
89 developed for each CMS that is integrated with OVN. All of the
90 components below this one in the diagram are CMS-independent.
91 </p>
92 </li>
93
94 <li>
95 <p>
96 The <dfn>OVN Northbound Database</dfn> receives the intermediate
97 representation of logical network configuration passed down by the
98 OVN/CMS Plugin. The database schema is meant to be ``impedance
99 matched'' with the concepts used in a CMS, so that it directly supports
100 notions of logical switches, routers, ACLs, and so on. See
101 <code>ovn-nb</code>(5) for details.
102 </p>
103
104 <p>
105 The OVN Northbound Database has only two clients: the OVN/CMS Plugin
106 above it and <code>ovn-northd</code> below it.
107 </p>
108 </li>
109
110 <li>
111 <code>ovn-northd</code>(8) connects to the OVN Northbound Database
112 above it and the OVN Southbound Database below it. It translates the
113 logical network configuration in terms of conventional network
114 concepts, taken from the OVN Northbound Database, into logical
115 datapath flows in the OVN Southbound Database below it.
116 </li>
117
118 <li>
119 <p>
120 The <dfn>OVN Southbound Database</dfn> is the center of the system.
121 Its clients are <code>ovn-northd</code>(8) above it and
122 <code>ovn-controller</code>(8) on every transport node below it.
123 </p>
124
125 <p>
126 The OVN Southbound Database contains three kinds of data: <dfn>Physical
127 Network</dfn> (PN) tables that specify how to reach hypervisor and
128 other nodes, <dfn>Logical Network</dfn> (LN) tables that describe the
129 logical network in terms of ``logical datapath flows,'' and
130 <dfn>Binding</dfn> tables that link logical network components'
131 locations to the physical network. The hypervisors populate the PN and
132 Binding tables, whereas <code>ovn-northd</code>(8) populates the
133 LN tables.
134 </p>
135
136 <p>
137 OVN Southbound Database performance must scale with the number of
138 transport nodes. This will likely require some work on
139 <code>ovsdb-server</code>(1) as we encounter bottlenecks.
140 Clustering for availability may be needed.
141 </p>
142 </li>
143 </ul>
144
145 <p>
146 The remaining components are replicated onto each hypervisor:
147 </p>
148
149 <ul>
150 <li>
151 <code>ovn-controller</code>(8) is OVN's agent on each hypervisor and
152 software gateway. Northbound, it connects to the OVN Southbound
153 Database to learn about OVN configuration and status and to
154 populate the PN table and the <code>chassis</code> column in the
155 <code>Binding</code> table with the hypervisor's status.
156 Southbound, it connects to <code>ovs-vswitchd</code>(8) as an
157 OpenFlow controller, for control over network traffic, and to the
158 local <code>ovsdb-server</code>(1) to allow it to monitor and
159 control Open vSwitch configuration.
160 </li>
161
162 <li>
163 <code>ovs-vswitchd</code>(8) and <code>ovsdb-server</code>(1) are
164 conventional components of Open vSwitch.
165 </li>
166 </ul>
167
168 <pre fixed="yes">
169 CMS
170 |
171 |
172 +-----------|-----------+
173 | | |
174 | OVN/CMS Plugin |
175 | | |
176 | | |
177 | OVN Northbound DB |
178 | | |
179 | | |
180 | ovn-northd |
181 | | |
182 +-----------|-----------+
183 |
184 |
185 +-------------------+
186 | OVN Southbound DB |
187 +-------------------+
188 |
189 |
190 +------------------+------------------+
191 | | |
192 HV 1 | | HV n |
193 +---------------|---------------+ . +---------------|---------------+
194 | | | . | | |
195 | ovn-controller | . | ovn-controller |
196 | | | | . | | | |
197 | | | | | | | |
198 | ovs-vswitchd ovsdb-server | | ovs-vswitchd ovsdb-server |
199 | | | |
200 +-------------------------------+ +-------------------------------+
201 </pre>
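<p>
The connections drawn above can also be written down as a small adjacency
map. The following Python sketch is only an illustration:
<code>LINKS</code> and <code>path_down</code> are hypothetical names that do
not exist in OVN, and the map covers a single hypervisor.
</p>

```python
# Downward connections taken from the diagram, for a single hypervisor.
# LINKS and path_down are illustrative names, not OVN identifiers.
LINKS = {
    "CMS": ["OVN/CMS Plugin"],
    "OVN/CMS Plugin": ["OVN Northbound DB"],
    "OVN Northbound DB": ["ovn-northd"],
    "ovn-northd": ["OVN Southbound DB"],
    "OVN Southbound DB": ["ovn-controller"],
    "ovn-controller": ["ovs-vswitchd", "ovsdb-server"],
}


def path_down(start, goal, links=LINKS):
    """Return the components from start down to goal, or None if unreachable."""
    if start == goal:
        return [start]
    for child in links.get(start, []):
        rest = path_down(child, goal, links)
        if rest is not None:
            return [start] + rest
    return None
```

<p>
Calling <code>path_down("CMS", "ovs-vswitchd")</code> walks the same chain a
configuration change follows on its way from the CMS to a hypervisor's
switch.
</p>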
202
203 <h2>Chassis Setup</h2>
204
205 <p>
206 Each chassis in an OVN deployment must be configured with an Open vSwitch
207 bridge dedicated to OVN's use, called the <dfn>integration bridge</dfn>.
208 System startup scripts create this bridge prior to starting
209 <code>ovn-controller</code>. The ports on the integration bridge include:
210 </p>
211
212 <ul>
213 <li>
214 On any chassis, tunnel ports that OVN uses to maintain logical network
215 connectivity. <code>ovn-controller</code> adds, updates, and removes
216 these tunnel ports.
217 </li>
218
219 <li>
220 On a hypervisor, any VIFs that are to be attached to logical networks.
221 The hypervisor itself, or the integration between Open vSwitch and the
222 hypervisor (described in <code>IntegrationGuide.md</code>), takes care of
223 this. (This is not part of OVN or new to OVN; this is pre-existing
224 integration work that has already been done on hypervisors that support
225 OVS.)
226 </li>
227
228 <li>
229 On a gateway, the physical port used for logical network connectivity.
230 System startup scripts add this port to the bridge prior to starting
231 <code>ovn-controller</code>. This can be a patch port to another bridge,
232 instead of a physical port, in more sophisticated setups.
233 </li>
234 </ul>
235
236 <p>
237 Other ports should not be attached to the integration bridge. In
238 particular, physical ports attached to the underlay network (as opposed to
239 gateway ports, which are physical ports attached to logical networks) must
240 not be attached to the integration bridge. Underlay physical ports should
241 instead be attached to a separate Open vSwitch bridge (they need not be
242 attached to any bridge at all, in fact).
243 </p>
244
245 <p>
246 The integration bridge should be configured as described below.
247 The effect of each of these settings is documented in
248 <code>ovs-vswitchd.conf.db</code>(5):
249 </p>
250
251 <dl>
252 <dt><code>fail-mode=secure</code></dt>
253 <dd>
254 Avoids switching packets between isolated logical networks before
255 <code>ovn-controller</code> starts up. See <code>Controller Failure
256 Settings</code> in <code>ovs-vsctl</code>(8) for more information.
257 </dd>
258
259 <dt><code>other-config:disable-in-band=true</code></dt>
260 <dd>
261 Suppresses in-band control flows for the integration bridge. It would be
262 unusual for such flows to show up anyway, because OVN uses a local
263 controller (over a Unix domain socket) instead of a remote controller.
264 It's possible, however, for some other bridge in the same system to have
265 an in-band remote controller, and in that case this suppresses the flows
266 that in-band control would ordinarily set up. See <code>In-Band
267 Control</code> in <code>DESIGN.md</code> for more information.
268 </dd>
269 </dl>
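<p>
As a hedged illustration, the two settings above could be applied with a
single <code>ovs-vsctl</code> invocation. The Python sketch below only builds
the argument list and does not run it. The helper name
<code>integration_bridge_command</code> is hypothetical, and the exact
command a given system's startup scripts use may differ.
</p>

```python
import shlex


def integration_bridge_command(bridge="br-int"):
    """Build (but do not run) an ovs-vsctl command line for the bridge.

    The function name is illustrative. The command creates the bridge if
    it does not exist yet and applies the two settings described above.
    """
    return [
        "ovs-vsctl",
        "--", "--may-exist", "add-br", bridge,
        "--", "set", "Bridge", bridge,
        "fail-mode=secure",
        "other-config:disable-in-band=true",
    ]


if __name__ == "__main__":
    # Print the command so it can be reviewed before anyone runs it.
    print(shlex.join(integration_bridge_command()))
```

<p>
Returning a list rather than a shell string keeps the sketch safe to pass to
<code>subprocess.run</code> without shell quoting concerns, should a startup
script choose to execute it.
</p>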
270
271 <p>
272 The customary name for the integration bridge is <code>br-int</code>, but
273 another name may be used.
274 </p>
275
276 <h2>Logical Networks</h2>
277
278 <p>
279 A <dfn>logical network</dfn> implements the same concepts as physical
280 networks, but it is insulated from the physical network by tunnels or
281 other encapsulations. This allows logical networks to have separate IP and
282 other address spaces that overlap, without conflicting, with those used for
283 physical networks. Logical network topologies can be arranged without
284 regard for the topologies of the physical networks on which they run.
285 </p>
286
287 <p>
288 Logical network concepts in OVN include:
289 </p>
290
291 <ul>
292 <li>
293 <dfn>Logical switches</dfn>, the logical version of Ethernet switches.
294 </li>
295
296 <li>
297 <dfn>Logical routers</dfn>, the logical version of IP routers. Logical
298 switches and routers can be connected into sophisticated topologies.
299 </li>
300
301 <li>
302 <dfn>Logical datapaths</dfn> are the logical version of an OpenFlow
303 switch. Logical switches and routers are both implemented as logical
304 datapaths.
305 </li>
306 </ul>
307
308 <h2>Life Cycle of a VIF</h2>
309
310 <p>
311 Tables and their schemas presented in isolation are difficult to
312 understand. Here's an example.
313 </p>
314
315 <p>
316 A VIF on a hypervisor is a virtual network interface attached either
317 to a VM or to a container running directly on that hypervisor. (This is
318 different from the interface of a container running inside a VM.)
319 </p>
320
321 <p>
322 The steps in this example refer often to details of the OVN Southbound and
323 OVN Northbound database schemas. Please see <code>ovn-sb</code>(5) and
324 <code>ovn-nb</code>(5), respectively, for the full story on these
325 databases.
326 </p>
327
328 <ol>
329 <li>
330 A VIF's life cycle begins when a CMS administrator creates a new VIF
331 using the CMS user interface or API and adds it to a switch (one
332 implemented by OVN as a logical switch). The CMS updates its own
333 configuration. This includes associating a unique, persistent identifier
334 <var>vif-id</var> and Ethernet address <var>mac</var> with the VIF.
335 </li>
336
337 <li>
338 The CMS plugin updates the OVN Northbound database to include the new
339 VIF, by adding a row to the <code>Logical_Port</code> table. In the new
340 row, <code>name</code> is <var>vif-id</var>, <code>mac</code> is
341 <var>mac</var>, <code>switch</code> points to the OVN logical switch's
342 Logical_Switch record, and other columns are initialized appropriately.
343 </li>
344
345 <li>
346 <code>ovn-northd</code> receives the OVN Northbound database update.
347 In turn, it makes the corresponding updates to the OVN Southbound
348 database, by adding rows to the OVN Southbound database
349 <code>Logical_Flow</code> table to reflect the new port, e.g. add a
350 flow to recognize that packets destined to the new port's MAC
351 address should be delivered to it, and update the flow that
352 delivers broadcast and multicast packets to include the new port.
353 It also creates a record in the <code>Binding</code> table and
354 populates all its columns except the column that identifies the
355 <code>chassis</code>.
356 </li>
357
358 <li>
359 On every hypervisor, <code>ovn-controller</code> receives the
360 <code>Logical_Flow</code> table updates that <code>ovn-northd</code> made
361 in the previous step. As long as the VM that owns the VIF is powered off,
362 <code>ovn-controller</code> cannot do much; it cannot, for example,
363 arrange to send packets to or receive packets from the VIF, because the
364 VIF does not actually exist anywhere.
365 </li>
366
367 <li>
368 Eventually, a user powers on the VM that owns the VIF. On the hypervisor
369 where the VM is powered on, the integration between the hypervisor and
370 Open vSwitch (described in <code>IntegrationGuide.md</code>) adds the VIF
371 to the OVN integration bridge and stores <var>vif-id</var> in
372 <code>external-ids</code>:<code>iface-id</code> to indicate that the
373 interface is an instantiation of the new VIF. (None of this code is new
374 in OVN; this is pre-existing integration work that has already been done
375 on hypervisors that support OVS.)
376 </li>
377
378 <li>
379 On the hypervisor where the VM is powered on, <code>ovn-controller</code>
380 notices <code>external-ids</code>:<code>iface-id</code> in the new
381 Interface. In response, it updates the local hypervisor's OpenFlow
382 tables so that packets to and from the VIF are properly handled.
383 Afterward, in the OVN Southbound DB, it updates the
384 <code>Binding</code> table's <code>chassis</code> column for the
385 row that links the logical port from
386 <code>external-ids</code>:<code>iface-id</code> to the hypervisor.
387 </li>
388
389 <li>
390 Some CMS systems, including OpenStack, fully start a VM only when its
391 networking is ready. To support this, <code>ovn-northd</code> notices
392 that the <code>chassis</code> column has been updated for the row in the
393 <code>Binding</code> table and pushes this upward by updating the
394 <ref column="up" table="Logical_Port" db="OVN_NB"/> column in the OVN
395 Northbound database's <ref table="Logical_Port" db="OVN_NB"/> table to
396 indicate that the VIF is now up. The CMS, if it uses this feature, can
397 then react by allowing the VM's execution to proceed.
399 </li>
400
401 <li>
402 On every hypervisor but the one where the VIF resides,
403 <code>ovn-controller</code> notices the completely populated row in the
404 <code>Binding</code> table. This provides <code>ovn-controller</code>
405 the physical location of the logical port, so each instance updates the
406 OpenFlow tables of its switch (based on logical datapath flows in the OVN
407 Southbound DB <code>Logical_Flow</code> table) so that packets to and from the VIF can
408 be properly handled via tunnels.
409 </li>
410
411 <li>
412 Eventually, a user powers off the VM that owns the VIF. On the
413 hypervisor where the VM was powered off, the VIF is deleted from the OVN
414 integration bridge.
415 </li>
416
417 <li>
418 On the hypervisor where the VM was powered off,
419 <code>ovn-controller</code> notices that the VIF was deleted. In
420 response, it clears the <code>chassis</code> column in the
421 <code>Binding</code> table for the logical port.
422 </li>
423
424 <li>
425 On every hypervisor, <code>ovn-controller</code> notices the empty
426 <code>chassis</code> column in the <code>Binding</code> table's row
427 for the logical port. This means that <code>ovn-controller</code> no
428 longer knows the physical location of the logical port, so each instance
429 updates its OpenFlow table to reflect that.
430 </li>
431
432 <li>
433 Eventually, when the VIF (or its entire VM) is no longer needed by
434 anyone, an administrator deletes the VIF using the CMS user interface or
435 API. The CMS updates its own configuration.
436 </li>
437
438 <li>
439 The CMS plugin removes the VIF from the OVN Northbound database,
440 by deleting its row in the <code>Logical_Port</code> table.
441 </li>
442
443 <li>
444 <code>ovn-northd</code> receives the OVN Northbound update and in turn
445 updates the OVN Southbound database accordingly, by removing or
446 updating the rows from the OVN Southbound database
447 <code>Logical_Flow</code> table and <code>Binding</code> table that
448 were related to the now-destroyed VIF.
449 </li>
450
451 <li>
452 On every hypervisor, <code>ovn-controller</code> receives the
453 <code>Logical_Flow</code> table updates that <code>ovn-northd</code> made
454 in the previous step. <code>ovn-controller</code> updates OpenFlow tables
455 to reflect the update, although there may not be much to do, since the VIF
456 had already become unreachable when it was removed from the
457 <code>Binding</code> table in a previous step.
458 </li>
459 </ol>
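<p>
The hand-offs in the steps above can be sketched as a toy in-memory model.
This is a simplification under stated assumptions, not OVN's implementation:
plain dictionaries stand in for the <code>Logical_Port</code> and
<code>Binding</code> tables, all function names are hypothetical, logical
flows are omitted, and <code>up</code> is assumed to simply mirror whether
<code>chassis</code> is set.
</p>

```python
# In-memory stand-ins for the database tables named in the steps above.
nb_logical_port = {}   # OVN Northbound Logical_Port rows, keyed by name
sb_binding = {}        # OVN Southbound Binding rows, keyed by logical port


def plugin_add_vif(vif_id, mac, switch):
    """Step 2: the CMS plugin adds a Logical_Port row."""
    nb_logical_port[vif_id] = {"mac": mac, "switch": switch, "up": False}


def plugin_delete_vif(vif_id):
    """Step 13: the CMS plugin deletes the Logical_Port row."""
    del nb_logical_port[vif_id]


def northd_sync():
    """Steps 3, 7, and 14: ovn-northd reconciles the two databases."""
    # Create Binding rows for new ports, leaving chassis unset.
    for name in nb_logical_port:
        sb_binding.setdefault(name, {"chassis": None})
    # Drop Binding rows whose Logical_Port no longer exists.
    for name in list(sb_binding):
        if name not in nb_logical_port:
            del sb_binding[name]
    # Assumption: "up" mirrors whether a chassis has claimed the port.
    for name, row in nb_logical_port.items():
        row["up"] = sb_binding[name]["chassis"] is not None


def controller_sync(chassis, local_iface_ids):
    """Steps 6 and 10: ovn-controller claims or releases local ports."""
    for name, row in sb_binding.items():
        if name in local_iface_ids:
            row["chassis"] = chassis
        elif row["chassis"] == chassis:
            row["chassis"] = None
```

<p>
For example, calling <code>plugin_add_vif</code>, then
<code>northd_sync</code>, then <code>controller_sync("hv1", {"vif1"})</code>,
then <code>northd_sync</code> again leaves the port bound to
<code>hv1</code> and marked up in this model.
</p>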
460
461 <h2>Life Cycle of a Container Interface Inside a VM</h2>
462
463 <p>
464 OVN provides virtual network abstractions by converting information
465 written in the OVN_NB database to OpenFlow flows in each hypervisor. Secure
466 virtual networking for multiple tenants can only be provided if the OVN
467 controller is the only entity that can modify flows in Open vSwitch. When the
468 Open vSwitch integration bridge resides in the hypervisor, it is a
469 fair assumption to make that tenant workloads running inside VMs cannot
470 make any changes to Open vSwitch flows.
471 </p>
472
473 <p>
474 If the infrastructure provider trusts the applications inside the
475 containers not to break out and modify the Open vSwitch flows, then
476 containers can be run in hypervisors. This is also the case when
477 containers are run inside the VMs and the Open vSwitch integration bridge
478 with flows added by the OVN controller resides in the same VM. In both
479 of the above cases, the workflow is the same as explained with an example
480 in the previous section ("Life Cycle of a VIF").
481 </p>
482
483 <p>
484 This section talks about the life cycle of a container interface (CIF)
485 when containers are created in the VMs and the Open vSwitch integration
486 bridge resides inside the hypervisor. In this case, even if a container
487 application breaks out, other tenants are not affected because the
488 containers running inside the VMs cannot modify the flows in the
489 Open vSwitch integration bridge.
490 </p>
491
492 <p>
493 When multiple containers are created inside a VM, there are multiple
494 CIFs associated with them. The network traffic associated with these
495 CIFs needs to reach the Open vSwitch integration bridge running in the
496 hypervisor for OVN to support virtual network abstractions. OVN should
497 also be able to distinguish network traffic coming from different CIFs.
498 There are two ways to distinguish network traffic of CIFs.
499 </p>
500
501 <p>
502 One way is to provide one VIF for every CIF (1:1 model). This means that
503 there could be a lot of network devices in the hypervisor. This would slow
504 down OVS because of all the additional CPU cycles needed for the management
505 of all the VIFs. It would also mean that the entity creating the
506 containers in a VM should also be able to create the corresponding VIFs in
507 the hypervisor.
508 </p>
509
510 <p>
511 The second way is to provide a single VIF for all the CIFs (1:many model).
512 OVN could then distinguish network traffic coming from different CIFs via
513 a tag written in every packet. OVN uses this mechanism and uses VLAN as
514 the tagging mechanism.
515 </p>
516
517 <ol>
518 <li>
519 A CIF's life cycle begins when a container is spawned inside a VM by
520 either the same CMS that created the VM, a tenant that owns that VM,
521 or even a container orchestration system that is different from the CMS
522 that initially created the VM. Whoever the entity is, it will need to
523 know the <var>vif-id</var> that is associated with the network interface
524 of the VM through which the container interface's network traffic is
525 expected to go. The entity that creates the container interface
526 will also need to choose an unused VLAN inside that VM.
527 </li>
528
529 <li>
530 The container spawning entity (either directly or through the CMS that
531 manages the underlying infrastructure) updates the OVN Northbound
532 database to include the new CIF, by adding a row to the
533 <code>Logical_Port</code> table. In the new row, <code>name</code> is
534 any unique identifier, <code>parent_name</code> is the <var>vif-id</var>
535 of the VM through which the CIF's network traffic is expected to go,
536 and <code>tag</code> is the VLAN tag that identifies the
537 network traffic of that CIF.
538 </li>
539
540 <li>
541 <code>ovn-northd</code> receives the OVN Northbound database update.
542 In turn, it makes the corresponding updates to the OVN Southbound
543 database, by adding rows to the OVN Southbound database's
544 <code>Logical_Flow</code> table to reflect the new port and also by
545 creating a new row in the <code>Binding</code> table and
546 populating all its columns except the column that identifies the
547 <code>chassis</code>.
548 </li>
549
550 <li>
551 On every hypervisor, <code>ovn-controller</code> subscribes to the
552 changes in the <code>Binding</code> table. When a new row is created
553 by <code>ovn-northd</code> that includes a value in the
554 <code>parent_port</code> column of the <code>Binding</code> table, the
555 <code>ovn-controller</code> in the hypervisor whose OVN integration bridge
556 has that same value in <var>vif-id</var> in
557 <code>external-ids</code>:<code>iface-id</code>
558 updates the local hypervisor's OpenFlow tables so that packets to and
559 from the VIF with the particular VLAN <code>tag</code> are properly
560 handled. Afterward, it updates the <code>chassis</code> column of
561 the <code>Binding</code> table row to reflect the physical location.
562 </li>
563
564 <li>
565 One can only start the application inside the container after the
566 underlying network is ready. To support this, <code>ovn-northd</code>
567 notices the updated <code>chassis</code> column in the <code>Binding</code>
568 table and updates the <ref column="up" table="Logical_Port"
569 db="OVN_NB"/> column in the OVN Northbound database's
570 <ref table="Logical_Port" db="OVN_NB"/> table to indicate that the
571 CIF is now up. The entity responsible for starting the container application
572 queries this value and starts the application.
573 </li>
574
575 <li>
576 Eventually, the entity that created and started the container stops it.
577 The entity, through the CMS (or directly), deletes the container's row in the
578 <code>Logical_Port</code> table.
579 </li>
580
581 <li>
582 <code>ovn-northd</code> receives the OVN Northbound update and in turn
583 updates the OVN Southbound database accordingly, by removing or
584 updating the rows from the OVN Southbound database
585 <code>Logical_Flow</code> table that were related to the now-destroyed
586 CIF. It also deletes the row in the <code>Binding</code> table
587 for that CIF.
588 </li>
589
590 <li>
591 On every hypervisor, <code>ovn-controller</code> receives the
592 <code>Logical_Flow</code> table updates that <code>ovn-northd</code> made
593 in the previous step. <code>ovn-controller</code> updates OpenFlow
594 tables to reflect the update.
595 </li>
596 </ol>
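<p>
The 1:many model used in the steps above can be sketched in a few lines of
Python. This is an illustration under assumptions, not OVN code: the function
names are hypothetical, rows are plain dictionaries using the
<code>name</code>, <code>parent_name</code>, and <code>tag</code> columns
mentioned above, and usable VLAN IDs are assumed to be 1 through 4094.
</p>

```python
def choose_unused_vlan(used_tags):
    """Pick the lowest VLAN ID in 1..4094 not already used inside the VM."""
    for tag in range(1, 4095):
        if tag not in used_tags:
            return tag
    raise RuntimeError("no unused VLAN left in this VM")


def make_cif_port(name, parent_vif_id, used_tags):
    """Build the Logical_Port row described in step 2 as a plain dict."""
    return {
        "name": name,
        "parent_name": parent_vif_id,
        "tag": choose_unused_vlan(used_tags),
    }


def classify(ports, vif_id, vlan_tag):
    """Map traffic arriving on a VIF with a given VLAN tag to a CIF name."""
    for port in ports:
        if port["parent_name"] == vif_id and port["tag"] == vlan_tag:
            return port["name"]
    return None
```

<p>
The pair of parent VIF and VLAN tag acts as the lookup key, which is why the
tag only needs to be unique within one VM rather than across the deployment.
</p>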
597
598 <h1>Design Decisions</h1>
599
600 <h2>Supported Tunnel Encapsulations</h2>
601 <p>
602 For connecting hypervisors to each other, the only supported tunnel
603 encapsulations are Geneve and STT. Hypervisors may use VXLAN to
604 connect to gateways. We limit support to these encapsulations
605 for the following reasons:
606 </p>
607
608 <ul>
609 <li>
610 <p>
611 They support large amounts of metadata. In addition to
612 specifying the logical switch, we will likely want to indicate
613 the logical source port and where we are in the logical
614 pipeline. Geneve supports a 24-bit VNI field and TLV-based
615 extensions. The header of STT includes a 64-bit context id.
616 </p>
617 </li>
618
619 <li>
620 <p>
621 They use randomized UDP or TCP source ports that allow
622 efficient distribution among multiple paths in environments that
623 use ECMP in their underlay.
624 </p>
625 </li>
626
627 <li>
628 <p>
629 NICs are available that accelerate encapsulation and decapsulation.
630 </p>
631 </li>
632 </ul>
633
634 <p>
635 Due to its flexibility, the preferred encapsulation between
636 hypervisors is Geneve. Some environments may want to use STT for
637 performance reasons until the NICs they use support hardware offload
638 of Geneve.
639 </p>
640
641 <p>
642 For connecting to gateways, the only supported tunnel encapsulations
643 are VXLAN, Geneve, and STT. While support for Geneve is becoming
644 available for TOR (top-of-rack) switches, VXLAN is far more common.
645 Currently, gateways have a feature set that matches the capabilities
646 as defined by the VTEP schema, so fewer bits of metadata are
647 necessary. In the future, gateways that do not support
648 encapsulations with large amounts of metadata may continue to have a
649 reduced feature set.
650 </p>
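<p>
The encapsulation rules in this section can be summarized as a small
selection function. The Python sketch below is illustrative only: the names
are hypothetical, and the preference order (Geneve, then STT, then VXLAN) is
an assumption extrapolated from the stated preference for Geneve between
hypervisors.
</p>

```python
# Encapsulations permitted for each kind of peer, per the text above.
HYPERVISOR_ENCAPS = ("geneve", "stt")
GATEWAY_ENCAPS = ("vxlan", "geneve", "stt")


def pick_encap(peer_is_gateway, peer_supported):
    """Return the first allowed encapsulation the peer supports, else None.

    The preference order is an assumption of this sketch: Geneve first,
    then STT, and VXLAN only for gateways that offer nothing richer.
    """
    allowed = GATEWAY_ENCAPS if peer_is_gateway else HYPERVISOR_ENCAPS
    for encap in ("geneve", "stt", "vxlan"):
        if encap in allowed and encap in peer_supported:
            return encap
    return None
```

<p>
Under these assumptions, a hypervisor peer that offers only VXLAN yields
<code>None</code>, reflecting that VXLAN is supported only toward gateways.
</p>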
651 </manpage>