<?xml version="1.0" encoding="utf-8"?>
<manpage program="ovn-architecture" section="7" title="OVN Architecture">
  <h1>Name</h1>
  <p>ovn-architecture -- Open Virtual Network architecture</p>

  <h1>Description</h1>

  <p>
    OVN, the Open Virtual Network, is a system to support virtual network
    abstraction.  OVN complements the existing capabilities of OVS to add
    native support for virtual network abstractions, such as virtual L2 and L3
    overlays and security groups.  Services such as DHCP are also desirable
    features.  Just like OVS, OVN's design goal is to have a
    production-quality implementation that can operate at significant scale.
  </p>

  <p>
    An OVN deployment consists of several components:
  </p>

  <ul>
    <li>
      <p>
        A <dfn>Cloud Management System</dfn> (<dfn>CMS</dfn>), which is
        OVN's ultimate client (via its users and administrators).  OVN
        integration requires installing a CMS-specific plugin and
        related software (see below).  OVN initially targets OpenStack
        as its CMS.
      </p>

      <p>
        We generally speak of ``the'' CMS, but one can imagine scenarios in
        which multiple CMSes manage different parts of an OVN deployment.
      </p>
    </li>

    <li>
      An OVN Database physical or virtual node (or, eventually, cluster)
      installed in a central location.
    </li>

    <li>
      One or more (usually many) <dfn>hypervisors</dfn>.  Hypervisors must run
      Open vSwitch and implement the interface described in
      <code>IntegrationGuide.md</code> in the OVS source tree.  Any hypervisor
      platform supported by Open vSwitch is acceptable.
    </li>

    <li>
      <p>
        Zero or more <dfn>gateways</dfn>.  A gateway extends a tunnel-based
        logical network into a physical network by bidirectionally forwarding
        packets between tunnels and a physical Ethernet port.  This allows
        non-virtualized machines to participate in logical networks.  A
        gateway may be a physical host, a virtual machine, or an ASIC-based
        hardware switch that supports the <code>vtep</code>(5) schema.
        (Support for the latter will come later in the OVN implementation.)
      </p>

      <p>
        Hypervisors and gateways are together called <dfn>transport
        nodes</dfn> or <dfn>chassis</dfn>.
      </p>
    </li>
  </ul>

  <p>
    The diagram below shows how the major components of OVN and related
    software interact.  Starting at the top of the diagram, we have:
  </p>

  <ul>
    <li>
      The Cloud Management System, as defined above.
    </li>

    <li>
      <p>
        The <dfn>OVN/CMS Plugin</dfn> is the component of the CMS that
        interfaces to OVN.  In OpenStack, this is a Neutron plugin.
        The plugin's main purpose is to translate the CMS's notion of logical
        network configuration, stored in the CMS's configuration database in a
        CMS-specific format, into an intermediate representation understood by
        OVN.
      </p>

      <p>
        This component is necessarily CMS-specific, so a new plugin needs to
        be developed for each CMS that is integrated with OVN.  All of the
        components below this one in the diagram are CMS-independent.
      </p>
    </li>

    <li>
      <p>
        The <dfn>OVN Northbound Database</dfn> receives the intermediate
        representation of logical network configuration passed down by the
        OVN/CMS Plugin.  The database schema is meant to be ``impedance
        matched'' with the concepts used in a CMS, so that it directly
        supports notions of logical switches, routers, ACLs, and so on.  See
        <code>ovn-nb</code>(5) for details.
      </p>

      <p>
        The OVN Northbound Database has only two clients: the OVN/CMS Plugin
        above it and <code>ovn-northd</code> below it.
      </p>
    </li>

    <li>
      <code>ovn-northd</code>(8) connects to the OVN Northbound Database
      above it and the OVN Southbound Database below it.  It translates the
      logical network configuration in terms of conventional network
      concepts, taken from the OVN Northbound Database, into logical
      datapath flows in the OVN Southbound Database below it.
    </li>

    <li>
      <p>
        The <dfn>OVN Southbound Database</dfn> is the center of the system.
        Its clients are <code>ovn-northd</code>(8) above it and
        <code>ovn-controller</code>(8) on every transport node below it.
      </p>

      <p>
        The OVN Southbound Database contains three kinds of data:
        <dfn>Physical Network</dfn> (PN) tables that specify how to reach
        hypervisor and other nodes, <dfn>Logical Network</dfn> (LN) tables
        that describe the logical network in terms of ``logical datapath
        flows,'' and <dfn>Binding</dfn> tables that link logical network
        components' locations to the physical network.  The hypervisors
        populate the PN and Binding tables, whereas
        <code>ovn-northd</code>(8) populates the LN tables.
      </p>

      <p>
        The OVN Southbound Database performance must scale with the number of
        transport nodes.  This will likely require some work on
        <code>ovsdb-server</code>(1) as we encounter bottlenecks.
        Clustering for availability may be needed.
      </p>
    </li>
  </ul>

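  <p>
    As a rough illustration of this division of labor (a sketch, assuming an
    OVN installation whose utilities can reach their databases at the default
    locations), one can inspect the two databases side by side:
    <code>ovn-nbctl</code> shows the CMS-level view, while
    <code>ovn-sbctl</code> shows the bindings and logical datapath flows that
    <code>ovn-northd</code> derived from it:
  </p>

  <pre fixed="yes">
# CMS-facing view: logical switches, routers, ACLs.
$ ovn-nbctl show

# Derived view: chassis, port bindings, and logical datapath flows.
$ ovn-sbctl show
$ ovn-sbctl lflow-list
  </pre>
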
  <p>
    The remaining components are replicated onto each hypervisor:
  </p>

  <ul>
    <li>
      <code>ovn-controller</code>(8) is OVN's agent on each hypervisor and
      software gateway.  Northbound, it connects to the OVN Southbound
      Database to learn about OVN configuration and status and to
      populate the PN table and the <code>chassis</code> column in the
      <code>Binding</code> table with the hypervisor's status.
      Southbound, it connects to <code>ovs-vswitchd</code>(8) as an
      OpenFlow controller, for control over network traffic, and to the
      local <code>ovsdb-server</code>(1) to allow it to monitor and
      control Open vSwitch configuration.
    </li>

    <li>
      <code>ovs-vswitchd</code>(8) and <code>ovsdb-server</code>(1) are
      conventional components of Open vSwitch.
    </li>
  </ul>

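  <p>
    Before the per-hypervisor components can do anything useful,
    <code>ovn-controller</code> must be told how to reach the OVN Southbound
    Database and how to encapsulate traffic.  The following sketch (with a
    placeholder chassis name, database address, and local IP) records this in
    the local Open vSwitch database, where <code>ovn-controller</code> reads
    it:
  </p>

  <pre fixed="yes">
$ ovs-vsctl set Open_vSwitch . \
    external-ids:system-id=hv1 \
    external-ids:ovn-remote=tcp:192.0.2.1:6642 \
    external-ids:ovn-encap-type=geneve \
    external-ids:ovn-encap-ip=192.0.2.11
  </pre>
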
  <pre fixed="yes">
                                  CMS
                                   |
                                   |
                       +-----------|-----------+
                       |           |           |
                       |     OVN/CMS Plugin    |
                       |           |           |
                       |           |           |
                       |   OVN Northbound DB   |
                       |           |           |
                       |           |           |
                       |       ovn-northd      |
                       |           |           |
                       +-----------|-----------+
                                   |
                                   |
                  +------------------+------------------+
                  |                  |                  |
        HV 1      |                  |        HV n      |
+---------------|---------------+    .    +---------------|---------------+
|               |               |    .    |               |               |
|        ovn-controller         |    .    |        ovn-controller         |
|         |            |        |    .    |         |            |        |
|         |            |        |         |         |            |        |
|  ovs-vswitchd   ovsdb-server  |         |  ovs-vswitchd   ovsdb-server  |
|                               |         |                               |
+-------------------------------+         +-------------------------------+
  </pre>

  <h2>Chassis Setup</h2>

  <p>
    Each chassis in an OVN deployment must be configured with an Open vSwitch
    bridge dedicated for OVN's use, called the <dfn>integration bridge</dfn>.
    System startup scripts may create this bridge prior to starting
    <code>ovn-controller</code> if desired.  If this bridge does not exist
    when <code>ovn-controller</code> starts, it will be created automatically
    with the default configuration suggested below.  The ports on the
    integration bridge include:
  </p>

  <ul>
    <li>
      On any chassis, tunnel ports that OVN uses to maintain logical network
      connectivity.  <code>ovn-controller</code> adds, updates, and removes
      these tunnel ports.
    </li>

    <li>
      On a hypervisor, any VIFs that are to be attached to logical networks.
      The hypervisor itself, or the integration between Open vSwitch and the
      hypervisor (described in <code>IntegrationGuide.md</code>), takes care
      of this.  (This is not part of OVN or new to OVN; this is pre-existing
      integration work that has already been done on hypervisors that support
      OVS.)
    </li>

    <li>
      On a gateway, the physical port used for logical network connectivity.
      System startup scripts add this port to the bridge prior to starting
      <code>ovn-controller</code>, as sketched below.  This can be a patch
      port to another bridge, instead of a physical port, in more
      sophisticated setups.
    </li>
  </ul>

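  <p>
    A minimal version of the gateway case (a sketch, assuming the integration
    bridge is named <code>br-int</code> and the physical port is
    <code>eth1</code>) might look like this in a startup script:
  </p>

  <pre fixed="yes">
$ ovs-vsctl --may-exist add-port br-int eth1
  </pre>
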
  <p>
    Other ports should not be attached to the integration bridge.  In
    particular, physical ports attached to the underlay network (as opposed
    to gateway ports, which are physical ports attached to logical networks)
    must not be attached to the integration bridge.  Underlay physical ports
    should instead be attached to a separate Open vSwitch bridge (in fact,
    they need not be attached to any bridge at all).
  </p>

  <p>
    The integration bridge should be configured as described below.
    The effect of each of these settings is documented in
    <code>ovs-vswitchd.conf.db</code>(5):
  </p>

  <!-- Keep the following in sync with create_br_int() in
       ovn/controller/ovn-controller.c. -->
  <dl>
    <dt><code>fail-mode=secure</code></dt>
    <dd>
      Avoids switching packets between isolated logical networks before
      <code>ovn-controller</code> starts up.  See <code>Controller Failure
      Settings</code> in <code>ovs-vsctl</code>(8) for more information.
    </dd>

    <dt><code>other-config:disable-in-band=true</code></dt>
    <dd>
      Suppresses in-band control flows for the integration bridge.  It would
      be unusual for such flows to show up anyway, because OVN uses a local
      controller (over a Unix domain socket) instead of a remote controller.
      It's possible, however, for some other bridge in the same system to
      have an in-band remote controller, and in that case this suppresses the
      flows that in-band control would ordinarily set up.  See <code>In-Band
      Control</code> in <code>DESIGN.md</code> for more information.
    </dd>
  </dl>

  <p>
    The customary name for the integration bridge is <code>br-int</code>, but
    another name may be used.
  </p>

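  <p>
    Putting the above together, a startup script that pre-creates the
    integration bridge with the suggested configuration (a sketch equivalent
    to what <code>ovn-controller</code> would otherwise create automatically)
    could run:
  </p>

  <pre fixed="yes">
$ ovs-vsctl --may-exist add-br br-int \
    -- set Bridge br-int fail-mode=secure \
         other-config:disable-in-band=true
  </pre>
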
  <h2>Logical Networks</h2>

  <p>
    A <dfn>logical network</dfn> implements the same concepts as a physical
    network, but it is insulated from the physical network with tunnels or
    other encapsulations.  This allows logical networks to have separate IP
    and other address spaces that overlap, without conflicting, with those
    used for physical networks.  Logical network topologies can be arranged
    without regard for the topologies of the physical networks on which they
    run.
  </p>

  <p>
    Logical network concepts in OVN include:
  </p>

  <ul>
    <li>
      <dfn>Logical switches</dfn>, the logical version of Ethernet switches.
    </li>

    <li>
      <dfn>Logical routers</dfn>, the logical version of IP routers.  Logical
      switches and routers can be connected into sophisticated topologies.
    </li>

    <li>
      <dfn>Logical datapaths</dfn>, the logical version of OpenFlow switches.
      Logical switches and routers are both implemented as logical datapaths.
    </li>
  </ul>

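  <p>
    As a concrete sketch of these concepts (using placeholder names and a
    reasonably recent <code>ovn-nbctl</code>), the following builds two
    logical switches joined by a logical router:
  </p>

  <pre fixed="yes">
$ ovn-nbctl lr-add lr0
$ ovn-nbctl lrp-add lr0 lrp1 00:00:00:00:ff:01 10.0.1.1/24
$ ovn-nbctl lrp-add lr0 lrp2 00:00:00:00:ff:02 10.0.2.1/24
$ ovn-nbctl ls-add sw1
$ ovn-nbctl lsp-add sw1 sw1-lr0 \
    -- lsp-set-type sw1-lr0 router \
    -- lsp-set-addresses sw1-lr0 00:00:00:00:ff:01 \
    -- lsp-set-options sw1-lr0 router-port=lrp1
$ ovn-nbctl ls-add sw2
$ ovn-nbctl lsp-add sw2 sw2-lr0 \
    -- lsp-set-type sw2-lr0 router \
    -- lsp-set-addresses sw2-lr0 00:00:00:00:ff:02 \
    -- lsp-set-options sw2-lr0 router-port=lrp2
  </pre>
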
  <h2>Life Cycle of a VIF</h2>

  <p>
    Tables and their schemas presented in isolation are difficult to
    understand.  Here's an example.
  </p>

  <p>
    A VIF on a hypervisor is a virtual network interface attached either
    to a VM or to a container running directly on that hypervisor.  (This is
    different from the interface of a container running inside a VM.)
  </p>

  <p>
    The steps in this example refer often to details of the OVN Southbound
    and OVN Northbound database schemas.  Please see <code>ovn-sb</code>(5)
    and <code>ovn-nb</code>(5), respectively, for the full story on these
    databases.  (A condensed command-line version of the example appears
    after the steps.)
  </p>

  <ol>
    <li>
      A VIF's life cycle begins when a CMS administrator creates a new VIF
      using the CMS user interface or API and adds it to a switch (one
      implemented by OVN as a logical switch).  The CMS updates its own
      configuration.  This includes associating a unique, persistent
      identifier <var>vif-id</var> and Ethernet address <var>mac</var> with
      the VIF.
    </li>

    <li>
      The CMS plugin updates the OVN Northbound database to include the new
      VIF, by adding a row to the <code>Logical_Switch_Port</code>
      table.  In the new row, <code>name</code> is <var>vif-id</var>,
      <code>mac</code> is <var>mac</var>, <code>switch</code> points to
      the OVN logical switch's Logical_Switch record, and other columns
      are initialized appropriately.
    </li>

    <li>
      <code>ovn-northd</code> receives the OVN Northbound database update.
      In turn, it makes the corresponding updates to the OVN Southbound
      database, by adding rows to the OVN Southbound database
      <code>Logical_Flow</code> table to reflect the new port: for example,
      it adds a flow to recognize that packets destined to the new port's
      MAC address should be delivered to it, and updates the flow that
      delivers broadcast and multicast packets to include the new port.  It
      also creates a record in the <code>Binding</code> table and populates
      all its columns except the column that identifies the
      <code>chassis</code>.
    </li>

    <li>
      On every hypervisor, <code>ovn-controller</code> receives the
      <code>Logical_Flow</code> table updates that <code>ovn-northd</code>
      made in the previous step.  As long as the VM that owns the VIF is
      powered off, <code>ovn-controller</code> cannot do much; it cannot,
      for example, arrange to send packets to or receive packets from the
      VIF, because the VIF does not actually exist anywhere.
    </li>

    <li>
      Eventually, a user powers on the VM that owns the VIF.  On the
      hypervisor where the VM is powered on, the integration between the
      hypervisor and Open vSwitch (described in
      <code>IntegrationGuide.md</code>) adds the VIF to the OVN integration
      bridge and stores <var>vif-id</var> in
      <code>external-ids</code>:<code>iface-id</code> to indicate that the
      interface is an instantiation of the new VIF.  (None of this code is
      new in OVN; this is pre-existing integration work that has already
      been done on hypervisors that support OVS.)
    </li>

    <li>
      On the hypervisor where the VM is powered on,
      <code>ovn-controller</code> notices
      <code>external-ids</code>:<code>iface-id</code> in the new
      Interface.  In response, it updates the local hypervisor's OpenFlow
      tables so that packets to and from the VIF are properly handled.
      Afterward, in the OVN Southbound DB, it updates the
      <code>Binding</code> table's <code>chassis</code> column for the
      row that links the logical port from
      <code>external-ids</code>:<code>iface-id</code> to the hypervisor.
    </li>

    <li>
      Some CMS systems, including OpenStack, fully start a VM only when its
      networking is ready.  To support this, <code>ovn-northd</code> notices
      the <code>chassis</code> column updated for the row in the
      <code>Binding</code> table and pushes this upward by updating the
      <ref column="up" table="Logical_Switch_Port" db="OVN_NB"/> column
      in the OVN Northbound database's <ref table="Logical_Switch_Port"
      db="OVN_NB"/> table to indicate that the VIF is now up.  The CMS,
      if it uses this feature, can then react by allowing the VM's
      execution to proceed.
    </li>

    <li>
      On every hypervisor but the one where the VIF resides,
      <code>ovn-controller</code> notices the completely populated row in
      the <code>Binding</code> table.  This provides
      <code>ovn-controller</code> the physical location of the logical port,
      so each instance updates the OpenFlow tables of its switch (based on
      logical datapath flows in the OVN DB <code>Logical_Flow</code> table)
      so that packets to and from the VIF can be properly handled via
      tunnels.
    </li>

    <li>
      Eventually, a user powers off the VM that owns the VIF.  On the
      hypervisor where the VM was powered off, the VIF is deleted from the
      OVN integration bridge.
    </li>

    <li>
      On the hypervisor where the VM was powered off,
      <code>ovn-controller</code> notices that the VIF was deleted.  In
      response, it removes the <code>chassis</code> column content in the
      <code>Binding</code> table for the logical port.
    </li>

    <li>
      On every hypervisor, <code>ovn-controller</code> notices the empty
      <code>chassis</code> column in the <code>Binding</code> table's row
      for the logical port.  This means that <code>ovn-controller</code> no
      longer knows the physical location of the logical port, so each
      instance updates its OpenFlow table to reflect that.
    </li>

    <li>
      Eventually, when the VIF (or its entire VM) is no longer needed by
      anyone, an administrator deletes the VIF using the CMS user interface
      or API.  The CMS updates its own configuration.
    </li>

    <li>
      The CMS plugin removes the VIF from the OVN Northbound database,
      by deleting its row in the <code>Logical_Switch_Port</code> table.
    </li>

    <li>
      <code>ovn-northd</code> receives the OVN Northbound update and in turn
      updates the OVN Southbound database accordingly, by removing or
      updating the rows from the OVN Southbound database
      <code>Logical_Flow</code> table and <code>Binding</code> table that
      were related to the now-destroyed VIF.
    </li>

    <li>
      On every hypervisor, <code>ovn-controller</code> receives the
      <code>Logical_Flow</code> table updates that <code>ovn-northd</code>
      made in the previous step.  <code>ovn-controller</code> updates
      OpenFlow tables to reflect the update, although there may not be much
      to do, since the VIF had already become unreachable when it was
      removed from the <code>Binding</code> table in a previous step.
    </li>
  </ol>

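  <p>
    Condensed into commands (a sketch with hypothetical switch, port, and
    interface names; in practice the CMS plugin and the hypervisor
    integration issue the equivalent database operations), steps 2, 5, and 7
    look roughly like this:
  </p>

  <pre fixed="yes">
# Step 2: the CMS plugin adds the VIF to a logical switch.
$ ovn-nbctl lsp-add sw1 vif-id-1
$ ovn-nbctl lsp-set-addresses vif-id-1 "00:00:00:00:00:01"

# Step 5: the hypervisor integration instantiates the VIF.
$ ovs-vsctl add-port br-int vif1 \
    -- set Interface vif1 external-ids:iface-id=vif-id-1

# Step 7: the CMS can poll for the port coming up.
$ ovn-nbctl lsp-get-up vif-id-1
  </pre>
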
  <h2>Life Cycle of a Container Interface Inside a VM</h2>

  <p>
    OVN provides virtual network abstractions by converting information
    written in the OVN_NB database to OpenFlow flows in each hypervisor.
    Secure virtual networking for multiple tenants can only be provided if
    <code>ovn-controller</code> is the only entity that can modify flows in
    Open vSwitch.  When the Open vSwitch integration bridge resides in the
    hypervisor, it is a fair assumption to make that tenant workloads
    running inside VMs cannot make any changes to Open vSwitch flows.
  </p>

  <p>
    If the infrastructure provider trusts the applications inside the
    containers not to break out and modify the Open vSwitch flows, then
    containers can be run directly on hypervisors.  This is also the case
    when containers are run inside the VMs and the Open vSwitch integration
    bridge with flows added by <code>ovn-controller</code> resides in the
    same VM.  For both of the above cases, the workflow is the same as
    explained with an example in the previous section ("Life Cycle of a
    VIF").
  </p>

  <p>
    This section talks about the life cycle of a container interface (CIF)
    when containers are created in the VMs and the Open vSwitch integration
    bridge resides inside the hypervisor.  In this case, even if a container
    application breaks out, other tenants are not affected because the
    containers running inside the VMs cannot modify the flows in the
    Open vSwitch integration bridge.
  </p>

  <p>
    When multiple containers are created inside a VM, there are multiple
    CIFs associated with them.  The network traffic associated with these
    CIFs needs to reach the Open vSwitch integration bridge running in the
    hypervisor for OVN to support virtual network abstractions.  OVN should
    also be able to distinguish network traffic coming from different CIFs.
    There are two ways to distinguish network traffic of CIFs.
  </p>

  <p>
    One way is to provide one VIF for every CIF (1:1 model).  This means
    that there could be a lot of network devices in the hypervisor.  This
    would slow down OVS because of all the additional CPU cycles needed for
    the management of all the VIFs.  It would also mean that the entity
    creating the containers in a VM should also be able to create the
    corresponding VIFs in the hypervisor.
  </p>

  <p>
    The second way is to provide a single VIF for all the CIFs (1:many
    model).  OVN could then distinguish network traffic coming from
    different CIFs via a tag written in every packet.  OVN uses this model,
    with VLAN as the tagging mechanism.
  </p>

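  <p>
    In Northbound database terms (a sketch with hypothetical names), the
    1:many model means creating the CIF's logical port as a child of the
    VM's VIF port, with a VLAN tag that is unused inside that VM:
  </p>

  <pre fixed="yes">
# lsp-add SWITCH PORT PARENT TAG: cif1 rides on VIF vif-id-1, VLAN 42.
$ ovn-nbctl lsp-add sw1 cif1 vif-id-1 42
$ ovn-nbctl lsp-set-addresses cif1 "00:00:00:00:00:02"
  </pre>
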
  <ol>
    <li>
      A CIF's life cycle begins when a container is spawned inside a VM by
      either the same CMS that created the VM, a tenant that owns that VM,
      or even a container orchestration system that is different from the
      CMS that initially created the VM.  Whoever the entity is, it will
      need to know the <var>vif-id</var> that is associated with the network
      interface of the VM through which the container interface's network
      traffic is expected to go.  The entity that creates the container
      interface will also need to choose an unused VLAN inside that VM.
    </li>

    <li>
      The container spawning entity (either directly or through the CMS that
      manages the underlying infrastructure) updates the OVN Northbound
      database to include the new CIF, by adding a row to the
      <code>Logical_Switch_Port</code> table.  In the new row,
      <code>name</code> is any unique identifier,
      <code>parent_name</code> is the <var>vif-id</var> of the VM
      through which the CIF's network traffic is expected to go,
      and the <code>tag</code> is the VLAN tag that identifies the
      network traffic of that CIF.
    </li>

    <li>
      <code>ovn-northd</code> receives the OVN Northbound database update.
      In turn, it makes the corresponding updates to the OVN Southbound
      database, by adding rows to the OVN Southbound database's
      <code>Logical_Flow</code> table to reflect the new port and also by
      creating a new row in the <code>Binding</code> table and populating
      all its columns except the column that identifies the
      <code>chassis</code>.
    </li>

    <li>
      On every hypervisor, <code>ovn-controller</code> subscribes to the
      changes in the <code>Binding</code> table.  When a new row created by
      <code>ovn-northd</code> includes a value in the
      <code>parent_port</code> column of the <code>Binding</code> table, the
      <code>ovn-controller</code> on the hypervisor whose OVN integration
      bridge has an interface with that same <var>vif-id</var> value in
      <code>external-ids</code>:<code>iface-id</code>
      updates the local hypervisor's OpenFlow tables so that packets to and
      from the VIF with the particular VLAN <code>tag</code> are properly
      handled.  Afterward it updates the <code>chassis</code> column of
      the <code>Binding</code> row to reflect the physical location.
    </li>

    <li>
      One can only start the application inside the container after the
      underlying network is ready.  To support this,
      <code>ovn-northd</code> notices the updated <code>chassis</code>
      column in the <code>Binding</code> table and updates the
      <ref column="up" table="Logical_Switch_Port" db="OVN_NB"/> column in
      the OVN Northbound database's <ref table="Logical_Switch_Port"
      db="OVN_NB"/> table to indicate that the CIF is now up.  The entity
      responsible for starting the container application queries this value
      and starts the application.
    </li>

    <li>
      Eventually, the entity that created and started the container stops
      it.  The entity, through the CMS or directly, deletes its row in the
      <code>Logical_Switch_Port</code> table.
    </li>

    <li>
      <code>ovn-northd</code> receives the OVN Northbound update and in turn
      updates the OVN Southbound database accordingly, by removing or
      updating the rows from the OVN Southbound database
      <code>Logical_Flow</code> table that were related to the
      now-destroyed CIF.  It also deletes the row in the
      <code>Binding</code> table for that CIF.
    </li>

    <li>
      On every hypervisor, <code>ovn-controller</code> receives the
      <code>Logical_Flow</code> table updates that <code>ovn-northd</code>
      made in the previous step.  <code>ovn-controller</code> updates
      OpenFlow tables to reflect the update.
    </li>
  </ol>

  <h2>Architectural Physical Life Cycle of a Packet</h2>

  <p>
    This section describes how a packet travels from one virtual machine or
    container to another through OVN.  This description focuses on the
    physical treatment of a packet; for a description of the logical life
    cycle of a packet, please refer to the <code>Logical_Flow</code> table in
    <code>ovn-sb</code>(5).
  </p>

  <p>
    This section mentions several data and metadata fields, for clarity
    summarized here:
  </p>

  <dl>
    <dt>tunnel key</dt>
    <dd>
      When OVN encapsulates a packet in Geneve or another tunnel, it attaches
      extra data to it to allow the receiving OVN instance to process it
      correctly.  This takes different forms depending on the particular
      encapsulation, but in each case we refer to it here as the ``tunnel
      key.''  See <code>Tunnel Encapsulations</code>, below, for details.
    </dd>

    <dt>logical datapath field</dt>
    <dd>
      A field that denotes the logical datapath through which a packet is
      being processed.
      <!-- Keep the following in sync with MFF_LOG_DATAPATH in
           ovn/lib/logical-fields.h. -->
      OVN uses the field that OpenFlow 1.1+ simply (and confusingly) calls
      ``metadata'' to store the logical datapath.  (This field is passed
      across tunnels as part of the tunnel key.)
    </dd>

    <dt>logical input port field</dt>
    <dd>
      <p>
        A field that denotes the logical port from which the packet
        entered the logical datapath.
        <!-- Keep the following in sync with MFF_LOG_INPORT in
             ovn/lib/logical-fields.h. -->
        OVN stores this in Nicira extension register number 6.
      </p>

      <p>
        Geneve and STT tunnels pass this field as part of the tunnel key.
        Although VXLAN tunnels do not explicitly carry a logical input port,
        OVN only uses VXLAN to communicate with gateways that from OVN's
        perspective consist of only a single logical port, so that OVN can
        set the logical input port field to this one on ingress to the OVN
        logical pipeline.
      </p>
    </dd>

    <dt>logical output port field</dt>
    <dd>
      <p>
        A field that denotes the logical port from which the packet will
        leave the logical datapath.  This is initialized to 0 at the
        beginning of the logical ingress pipeline.
        <!-- Keep the following in sync with MFF_LOG_OUTPORT in
             ovn/lib/logical-fields.h. -->
        OVN stores this in Nicira extension register number 7.
      </p>

      <p>
        Geneve and STT tunnels pass this field as part of the tunnel key.
        VXLAN tunnels do not transmit the logical output port field.
      </p>
    </dd>

    <dt>conntrack zone field for logical ports</dt>
    <dd>
      A field that denotes the connection tracking zone for logical ports.
      The value only has local significance and is not meaningful between
      chassis.  This is initialized to 0 at the beginning of the logical
      ingress pipeline.  OVN stores this in Nicira extension register
      number 5.
    </dd>

    <dt>conntrack zone fields for Gateway router</dt>
    <dd>
      Fields that denote the connection tracking zones for Gateway routers.
      These values only have local significance (only on chassis that have
      Gateway routers instantiated) and are not meaningful between chassis.
      OVN stores the zone information for DNATting in Nicira extension
      register number 3 and the zone information for SNATing in Nicira
      extension register number 4.
    </dd>

    <dt>VLAN ID</dt>
    <dd>
      The VLAN ID is used as an interface between OVN and containers nested
      inside a VM (see <code>Life Cycle of a Container Interface Inside a
      VM</code>, above, for more information).
    </dd>
  </dl>

  <p>
    Initially, a VM or container on the ingress hypervisor sends a packet on
    a port attached to the OVN integration bridge.  Then:
  </p>

  <ol>
    <li>
      <p>
        OpenFlow table 0 performs physical-to-logical translation.  It
        matches the packet's ingress port.  Its actions annotate the packet
        with logical metadata, by setting the logical datapath field to
        identify the logical datapath that the packet is traversing and the
        logical input port field to identify the ingress port.  Then it
        resubmits to table 16 to enter the logical ingress pipeline.
      </p>

      <p>
        Packets that originate from a container nested within a VM are
        treated in a slightly different way.  The originating container can
        be distinguished based on the VIF-specific VLAN ID, so the
        physical-to-logical translation flows additionally match on VLAN ID
        and the actions strip the VLAN header.  Following this step, OVN
        treats packets from containers just like any other packets.
      </p>

      <p>
        Table 0 also processes packets that arrive from other chassis.  It
        distinguishes them from other packets by ingress port, which is a
        tunnel.  As with packets just entering the OVN pipeline, the actions
        annotate these packets with logical datapath and logical ingress
        port metadata.  In addition, the actions set the logical output port
        field, which is available because in OVN tunneling occurs after the
        logical output port is known.  These three pieces of information are
        obtained from the tunnel encapsulation metadata (see <code>Tunnel
        Encapsulations</code> for encoding details).  Then the actions
        resubmit to table 33 to enter the logical egress pipeline.
      </p>
    </li>

    <li>
      <p>
        OpenFlow tables 16 through 31 execute the logical ingress pipeline
        from the <code>Logical_Flow</code> table in the OVN Southbound
        database.  These tables are expressed entirely in terms of logical
        concepts like logical ports and logical datapaths.  A big part of
        <code>ovn-controller</code>'s job is to translate them into
        equivalent OpenFlow (in particular it translates the table numbers:
        <code>Logical_Flow</code> tables 0 through 15 become OpenFlow tables
        16 through 31).
      </p>

      <p>
        Most OVN actions have fairly obvious implementations in OpenFlow
        (with OVS extensions), e.g. <code>next;</code> is implemented as
        <code>resubmit</code>, <code><var>field</var> =
        <var>constant</var>;</code> as <code>set_field</code>.  A few are
        worth describing in more detail:
      </p>

      <dl>
        <dt><code>output:</code></dt>
        <dd>
          Implemented by resubmitting the packet to table 32.  If the
          pipeline executes more than one <code>output</code> action, then
          each one is separately resubmitted to table 32.  This can be used
          to send multiple copies of the packet to multiple ports.  (If the
          packet was not modified between the <code>output</code> actions,
          and some of the copies are destined to the same hypervisor, then
          using a logical multicast output port would save bandwidth between
          hypervisors.)
        </dd>

        <dt><code>get_arp(<var>P</var>, <var>A</var>);</code></dt>
        <dd>
          <p>
            Implemented by storing arguments into OpenFlow fields, then
            resubmitting to table 65, which <code>ovn-controller</code>
            populates with flows generated from the <code>MAC_Binding</code>
            table in the OVN Southbound database.  If there is a match in
            table 65, then its actions store the bound MAC in the Ethernet
            destination address field.
          </p>

          <p>
            (The OpenFlow actions save and restore the OpenFlow fields used
            for the arguments, so that the OVN actions do not have to be
            aware of this temporary use.)
          </p>
        </dd>

        <dt><code>put_arp(<var>P</var>, <var>A</var>,
        <var>E</var>);</code></dt>
        <dd>
          <p>
            Implemented by storing the arguments into OpenFlow fields, then
            outputting a packet to <code>ovn-controller</code>, which
            updates the <code>MAC_Binding</code> table.
          </p>

          <p>
            (The OpenFlow actions save and restore the OpenFlow fields used
            for the arguments, so that the OVN actions do not have to be
            aware of this temporary use.)
          </p>
        </dd>
      </dl>
    </li>

    <li>
      <p>
        OpenFlow tables 32 through 47 implement the <code>output</code>
        action in the logical ingress pipeline.  Specifically, table 32
        handles packets to remote hypervisors, table 33 handles packets to
        the local hypervisor, and table 34 discards packets whose logical
        ingress and egress port are the same.
      </p>

      <p>
        Logical patch ports are a special case.  Logical patch ports do not
        have a physical location and effectively reside on every hypervisor.
        Thus, flow table 33, for output to ports on the local hypervisor,
        naturally implements output to unicast logical patch ports too.
        However, applying the same logic to a logical patch port that is
        part of a logical multicast group yields packet duplication, because
        each hypervisor that contains a logical port in the multicast group
        will also output the packet to the logical patch port.  Thus,
        multicast groups implement output to logical patch ports in table
        32.
      </p>

      <p>
        Each flow in table 32 matches on a logical output port for unicast
        or multicast logical ports that include a logical port on a remote
        hypervisor.  Each flow's actions implement sending a packet to the
        port it matches.  For unicast logical output ports on remote
        hypervisors, the actions set the tunnel key to the correct value,
        then send the packet on the tunnel port to the correct hypervisor.
        (When the remote hypervisor receives the packet, table 0 there will
        recognize it as a tunneled packet and pass it along to table 33.)
        For multicast logical output ports, the actions send one copy of the
        packet to each remote hypervisor, in the same way as for unicast
        destinations.  If a multicast group includes a logical port or ports
        on the local hypervisor, then its actions also resubmit to table 33.
        Table 32 also includes a fallback flow that resubmits to table 33 if
        there is no other match.
      </p>

      <p>
        Flows in table 33 resemble those in table 32 but for logical ports
        that reside locally rather than remotely.  For unicast logical
        output ports on the local hypervisor, the actions just resubmit to
        table 34.  For multicast output ports that include one or more
        logical ports on the local hypervisor, for each such logical port
        <var>P</var>, the actions change the logical output port to
        <var>P</var>, then resubmit to table 34.
      </p>

      <p>
        A special case is when a localnet port exists on the datapath.  In
        that case, a remote port is reached by switching through the
        localnet port.  Instead of adding a flow in table 32 to reach the
        remote port, a flow is added in table 33 to switch the logical
        output port to the localnet port and resubmit to table 33, as if the
        packet were unicast to a logical port on the local hypervisor.
      </p>

      <p>
        Table 34 matches and drops packets for which the logical input and
        output ports are the same.  It resubmits other packets to table 48.
      </p>
    </li>

    <li>
      <p>
        OpenFlow tables 48 through 63 execute the logical egress pipeline
        from the <code>Logical_Flow</code> table in the OVN Southbound
        database.  The egress pipeline can perform a final stage of
        validation before packet delivery.  Eventually, it may execute an
        <code>output</code> action, which <code>ovn-controller</code>
        implements by resubmitting to table 64.  A packet for which the
        pipeline never executes <code>output</code> is effectively dropped
        (although it may have been transmitted through a tunnel across a
        physical network).
      </p>

      <p>
        The egress pipeline cannot change the logical output port or cause
        further tunneling.
      </p>
    </li>

    <li>
      <p>
        OpenFlow table 64 performs logical-to-physical translation, the
        opposite of table 0.  It matches the packet's logical egress port.
        Its actions output the packet to the port attached to the OVN
        integration bridge that represents that logical port.  If the
        logical egress port is a container nested within a VM, then before
        sending the packet the actions push on a VLAN header with an
        appropriate VLAN ID.
      </p>

      <p>
        If the logical egress port is a logical patch port, then table 64
        outputs to an OVS patch port that represents the logical patch port.
        The packet re-enters the OpenFlow flow table from the OVS patch
        port's peer in table 0, which identifies the logical datapath and
        logical input port based on the OVS patch port's OpenFlow port
        number.
      </p>
    </li>
  </ol>

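  <p>
    This pipeline can be observed directly on a hypervisor.  As a sketch
    (the bridge name, port name, and addresses are placeholders), the
    following dumps the translated OpenFlow tables and traces a packet
    through them:
  </p>

  <pre fixed="yes">
# Logical flows, as ovn-northd wrote them to the Southbound database.
$ ovn-sbctl lflow-list

# Their OpenFlow translation on this chassis.
$ ovs-ofctl -O OpenFlow13 dump-flows br-int

# Trace one packet through tables 0, 16-31, 32-34, 48-63, and 64.
$ ovs-appctl ofproto/trace br-int \
    in_port=vif1,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02
  </pre>
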
  <h2>Life Cycle of a VTEP gateway</h2>

  <p>
    A gateway is a chassis that forwards traffic between the OVN-managed
    part of a logical network and a physical VLAN, extending a
    tunnel-based logical network into a physical network.
  </p>

  <p>
    The steps below refer often to details of the OVN Southbound, OVN
    Northbound, and VTEP database schemas.  Please see
    <code>ovn-sb</code>(5), <code>ovn-nb</code>(5) and <code>vtep</code>(5),
    respectively, for the full story on these databases.  (A condensed
    command-line version of the first few steps appears after the list.)
  </p>

  <ol>
    <li>
      A VTEP gateway's life cycle begins with the administrator registering
      the VTEP gateway as a <code>Physical_Switch</code> table entry in the
      <code>VTEP</code> database.  The <code>ovn-controller-vtep</code>
      connected to this VTEP database will recognize the new VTEP gateway
      and create a new <code>Chassis</code> table entry for it in the
      <code>OVN_Southbound</code> database.
    </li>

    <li>
      The administrator can then create a new <code>Logical_Switch</code>
      table entry, and bind a particular VLAN on a VTEP gateway's port to
      any VTEP logical switch.  Once a VTEP logical switch is bound to
      a VTEP gateway, the <code>ovn-controller-vtep</code> will detect
      it and add its name to the <var>vtep_logical_switches</var>
      column of the <code>Chassis</code> table in the
      <code>OVN_Southbound</code> database.  Note that the
      <var>tunnel_key</var> column of the VTEP logical switch is not filled
      at creation.  The <code>ovn-controller-vtep</code> will set the column
      when the corresponding VTEP logical switch is bound to an OVN logical
      network.
    </li>

    <li>
      Now, the administrator can use the CMS to add a VTEP logical switch
      to the OVN logical network.  To do that, the CMS must first create a
      new <code>Logical_Switch_Port</code> table entry in the
      <code>OVN_Northbound</code> database.  Then, the <var>type</var>
      column of this entry must be set to "vtep".  Next, the
      <var>vtep-logical-switch</var> and <var>vtep-physical-switch</var>
      keys in the <var>options</var> column must also be specified, since
      multiple VTEP gateways can attach to the same VTEP logical switch.
    </li>

    <li>
      The newly created logical port in the <code>OVN_Northbound</code>
      database and its configuration will be passed down to the
      <code>OVN_Southbound</code> database as a new
      <code>Port_Binding</code> table entry.  The
      <code>ovn-controller-vtep</code> will recognize the change and bind
      the logical port to the corresponding VTEP gateway chassis.  Binding
      the same VTEP logical switch to different OVN logical networks is not
      allowed and a warning will be generated in the log.
    </li>

    <li>
      Besides binding to the VTEP gateway chassis, the
      <code>ovn-controller-vtep</code> will update the
      <var>tunnel_key</var> column of the VTEP logical switch to the
      corresponding <code>Datapath_Binding</code> table entry's
      <var>tunnel_key</var> for the bound OVN logical network.
    </li>

    <li>
      Next, the <code>ovn-controller-vtep</code> will keep reacting to the
      configuration change in the <code>Port_Binding</code> table in the
      <code>OVN_Southbound</code> database, and updating the
      <code>Ucast_Macs_Remote</code> table in the <code>VTEP</code>
      database.  This allows the VTEP gateway to understand where to forward
      the unicast traffic coming from the extended external network.
    </li>

    <li>
      Eventually, the VTEP gateway's life cycle ends when the administrator
      unregisters the VTEP gateway from the <code>VTEP</code> database.
      The <code>ovn-controller-vtep</code> will recognize the event and
      remove all related configurations (<code>Chassis</code> table entry
      and port bindings) in the <code>OVN_Southbound</code> database.
    </li>

    <li>
      When the <code>ovn-controller-vtep</code> is terminated, all related
      configurations in the <code>OVN_Southbound</code> database and
      the <code>VTEP</code> database will be cleaned, including
      <code>Chassis</code> table entries for all registered VTEP gateways
      and their port bindings, and all <code>Ucast_Macs_Remote</code> table
      entries and the <code>Logical_Switch</code> tunnel keys.
    </li>
  </ol>

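  <p>
    Condensed into commands (a sketch with hypothetical switch, port, and
    VLAN names), steps 1 through 3 look roughly like this:
  </p>

  <pre fixed="yes">
# Step 1: register the gateway in the VTEP database.
$ vtep-ctl add-ps tor1

# Step 2: bind VLAN 101 on physical port p0 to a VTEP logical switch.
$ vtep-ctl add-port tor1 p0
$ vtep-ctl add-ls vtep-ls1
$ vtep-ctl bind-ls tor1 p0 101 vtep-ls1

# Step 3: attach the VTEP logical switch to an OVN logical network.
$ ovn-nbctl lsp-add sw1 sw1-vtep \
    -- lsp-set-type sw1-vtep vtep \
    -- lsp-set-options sw1-vtep vtep-physical-switch=tor1 \
         vtep-logical-switch=vtep-ls1
  </pre>
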
  <h1>Design Decisions</h1>

  <h2>Tunnel Encapsulations</h2>

  <p>
    OVN annotates logical network packets that it sends from one hypervisor
    to another with the following three pieces of metadata, which are
    encoded in an encapsulation-specific fashion:
  </p>

  <ul>
    <li>
      24-bit logical datapath identifier, from the <code>tunnel_key</code>
      column in the OVN Southbound <code>Datapath_Binding</code> table.
    </li>

    <li>
      15-bit logical ingress port identifier.  ID 0 is reserved for internal
      use within OVN.  IDs 1 through 32767, inclusive, may be assigned to
      logical ports (see the <code>tunnel_key</code> column in the OVN
      Southbound <code>Port_Binding</code> table).
    </li>

    <li>
      16-bit logical egress port identifier.  IDs 0 through 32767 have the
      same meaning as for logical ingress ports.  IDs 32768 through 65535,
      inclusive, may be assigned to logical multicast groups (see the
      <code>tunnel_key</code> column in the OVN Southbound
      <code>Multicast_Group</code> table).
    </li>
  </ul>

  <p>
    For hypervisor-to-hypervisor traffic, OVN supports only Geneve and STT
    encapsulations, for the following reasons:
  </p>

  <ul>
    <li>
      Only STT and Geneve support the large amounts of metadata (over 32
      bits per packet) that OVN uses (as described above).
    </li>

    <li>
      STT and Geneve use randomized UDP or TCP source ports that allow
      efficient distribution among multiple paths in environments that use
      ECMP in their underlay.
    </li>

    <li>
      NICs are available to offload STT and Geneve encapsulation and
      decapsulation.
    </li>
  </ul>

  <p>
    Due to its flexibility, the preferred encapsulation between hypervisors
    is Geneve.  For Geneve encapsulation, OVN transmits the logical datapath
    identifier in the Geneve VNI.

    <!-- Keep the following in sync with ovn/controller/physical.h. -->
    OVN transmits the logical ingress and logical egress ports in a TLV with
    class 0x0102, type 0, and a 32-bit value encoded as follows, from MSB to
    LSB:
  </p>

  <diagram>
    <header name="">
      <bits name="rsv" above="1" below="0" width=".25"/>
      <bits name="ingress port" above="15" width=".75"/>
      <bits name="egress port" above="16" width=".75"/>
    </header>
  </diagram>

  <p>
    Environments whose NICs lack Geneve offload may prefer STT encapsulation
    for performance reasons.  For STT encapsulation, OVN encodes all three
    pieces of logical metadata in the STT 64-bit tunnel ID as follows, from
    MSB to LSB:
  </p>

  <diagram>
    <header name="">
      <bits name="reserved" above="9" below="0" width=".5"/>
      <bits name="ingress port" above="15" width=".75"/>
      <bits name="egress port" above="16" width=".75"/>
      <bits name="datapath" above="24" width="1.25"/>
    </header>
  </diagram>

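  <p>
    To make the two layouts concrete, here is a sketch (shell arithmetic
    with example identifiers: datapath 10, ingress port 5, egress port 6) of
    how the values above pack into the Geneve option value and the STT
    tunnel ID:
  </p>

  <pre fixed="yes">
$ dp=10; in=5; out=6
# Geneve 32-bit option value: rsv(1) | ingress(15) | egress(16).
$ printf "geneve option: 0x%08x\n" $(( (in &lt;&lt; 16) | out ))
# STT 64-bit key: reserved(9) | ingress(15) | egress(16) | datapath(24).
$ printf "stt key:       0x%016x\n" $(( (in &lt;&lt; 40) | (out &lt;&lt; 24) | dp ))
  </pre>
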
  <p>
    For connecting to gateways, in addition to Geneve and STT, OVN supports
    VXLAN, because only VXLAN support is common on top-of-rack (ToR)
    switches.  Currently, gateways have a feature set that matches the
    capabilities as defined by the VTEP schema, so fewer bits of metadata
    are necessary.  In the future, gateways that do not support
    encapsulations with large amounts of metadata may continue to have a
    reduced feature set.
  </p>
</manpage>