1<?xml version="1.0" encoding="utf-8"?>
2<manpage program="ovn-architecture" section="7" title="OVN Architecture">
3 <h1>Name</h1>
4 <p>ovn-architecture -- Open Virtual Network architecture</p>
5
6 <h1>Description</h1>
7
8 <p>
9 OVN, the Open Virtual Network, is a system to support virtual network
10 abstraction. OVN complements the existing capabilities of OVS to add
11 native support for virtual network abstractions, such as virtual L2 and L3
12 overlays and security groups. Services such as DHCP are also desirable
13 features. Just like OVS, OVN's design goal is to have a production-quality
14 implementation that can operate at significant scale.
15 </p>
16
17 <p>
18 An OVN deployment consists of several components:
19 </p>
20
21 <ul>
22 <li>
23 <p>
24 A <dfn>Cloud Management System</dfn> (<dfn>CMS</dfn>), which is
25 OVN's ultimate client (via its users and administrators). OVN
26 integration requires installing a CMS-specific plugin and
 27 related software (see below). OVN initially targets OpenStack
 28 as its CMS.
29 </p>
30
31 <p>
32 We generally speak of ``the'' CMS, but one can imagine scenarios in
33 which multiple CMSes manage different parts of an OVN deployment.
34 </p>
35 </li>
36
37 <li>
38 An OVN Database physical or virtual node (or, eventually, cluster)
39 installed in a central location.
40 </li>
41
42 <li>
43 One or more (usually many) <dfn>hypervisors</dfn>. Hypervisors must run
44 Open vSwitch and implement the interface described in
45 <code>IntegrationGuide.md</code> in the OVS source tree. Any hypervisor
46 platform supported by Open vSwitch is acceptable.
47 </li>
48
49 <li>
50 <p>
51 Zero or more <dfn>gateways</dfn>. A gateway extends a tunnel-based
52 logical network into a physical network by bidirectionally forwarding
53 packets between tunnels and a physical Ethernet port. This allows
54 non-virtualized machines to participate in logical networks. A gateway
55 may be a physical host, a virtual machine, or an ASIC-based hardware
56 switch that supports the <code>vtep</code>(5) schema. (Support for the
 57 latter will come later in the OVN implementation.)
58 </p>
59
60 <p>
 61 Hypervisors and gateways are together called <dfn>transport
 62 nodes</dfn> or <dfn>chassis</dfn>.
63 </p>
64 </li>
65 </ul>
66
67 <p>
68 The diagram below shows how the major components of OVN and related
69 software interact. Starting at the top of the diagram, we have:
70 </p>
71
72 <ul>
73 <li>
74 The Cloud Management System, as defined above.
75 </li>
76
77 <li>
78 <p>
79 The <dfn>OVN/CMS Plugin</dfn> is the component of the CMS that
80 interfaces to OVN. In OpenStack, this is a Neutron plugin.
81 The plugin's main purpose is to translate the CMS's notion of logical
82 network configuration, stored in the CMS's configuration database in a
83 CMS-specific format, into an intermediate representation understood by
84 OVN.
85 </p>
86
87 <p>
88 This component is necessarily CMS-specific, so a new plugin needs to be
89 developed for each CMS that is integrated with OVN. All of the
90 components below this one in the diagram are CMS-independent.
91 </p>
92 </li>
93
94 <li>
95 <p>
96 The <dfn>OVN Northbound Database</dfn> receives the intermediate
97 representation of logical network configuration passed down by the
98 OVN/CMS Plugin. The database schema is meant to be ``impedance
99 matched'' with the concepts used in a CMS, so that it directly supports
100 notions of logical switches, routers, ACLs, and so on. See
 101 <code>ovn-nb</code>(5) for details.
102 </p>
103
104 <p>
105 The OVN Northbound Database has only two clients: the OVN/CMS Plugin
106 above it and <code>ovn-northd</code> below it.
107 </p>
108 </li>
109
110 <li>
111 <code>ovn-northd</code>(8) connects to the OVN Northbound Database
112 above it and the OVN Southbound Database below it. It translates the
 113 logical network configuration, expressed in terms of conventional
 114 network concepts and taken from the OVN Northbound Database, into
 115 logical datapath flows in the OVN Southbound Database.
116 </li>
117
118 <li>
119 <p>
 120 The <dfn>OVN Southbound Database</dfn> is the center of the system.
 121 Its clients are <code>ovn-northd</code>(8) above it and
 122 <code>ovn-controller</code>(8) on every transport node below it.
123 </p>
124
125 <p>
126 The OVN Southbound Database contains three kinds of data: <dfn>Physical
127 Network</dfn> (PN) tables that specify how to reach hypervisor and
128 other nodes, <dfn>Logical Network</dfn> (LN) tables that describe the
129 logical network in terms of ``logical datapath flows,'' and
130 <dfn>Binding</dfn> tables that link logical network components'
131 locations to the physical network. The hypervisors populate the PN and
132 Port_Binding tables, whereas <code>ovn-northd</code>(8) populates the
133 LN tables.
134 </p>
135
136 <p>
137 OVN Southbound Database performance must scale with the number of
138 transport nodes. This will likely require some work on
139 <code>ovsdb-server</code>(1) as we encounter bottlenecks.
140 Clustering for availability may be needed.
141 </p>
142 </li>
143 </ul>
144
145 <p>
146 The remaining components are replicated onto each hypervisor:
147 </p>
148
149 <ul>
150 <li>
151 <code>ovn-controller</code>(8) is OVN's agent on each hypervisor and
152 software gateway. Northbound, it connects to the OVN Southbound
153 Database to learn about OVN configuration and status and to
 154 populate the PN table and the <code>Chassis</code> column in the
 155 <code>Binding</code> table with the hypervisor's status.
156 Southbound, it connects to <code>ovs-vswitchd</code>(8) as an
157 OpenFlow controller, for control over network traffic, and to the
158 local <code>ovsdb-server</code>(1) to allow it to monitor and
159 control Open vSwitch configuration.
160 </li>
161
162 <li>
163 <code>ovs-vswitchd</code>(8) and <code>ovsdb-server</code>(1) are
164 conventional components of Open vSwitch.
165 </li>
166 </ul>
167
168 <pre fixed="yes">
169 CMS
170 |
171 |
172 +-----------|-----------+
173 | | |
174 | OVN/CMS Plugin |
175 | | |
176 | | |
177 | OVN Northbound DB |
178 | | |
179 | | |
 180 |       ovn-northd      |
181 | | |
182 +-----------|-----------+
183 |
184 |
185 +-------------------+
186 | OVN Southbound DB |
187 +-------------------+
188 |
189 |
190 +------------------+------------------+
191 | | |
 192      HV 1          |                  | HV n
193+---------------|---------------+ . +---------------|---------------+
194| | | . | | |
195| ovn-controller | . | ovn-controller |
196| | | | . | | | |
197| | | | | | | |
198| ovs-vswitchd ovsdb-server | | ovs-vswitchd ovsdb-server |
199| | | |
200+-------------------------------+ +-------------------------------+
201 </pre>
202
203 <h2>Chassis Setup</h2>
204
205 <p>
206 Each chassis in an OVN deployment must be configured with an Open vSwitch
207 bridge dedicated for OVN's use, called the <dfn>integration bridge</dfn>.
208 System startup scripts create this bridge prior to starting
209 <code>ovn-controller</code>. The ports on the integration bridge include:
210 </p>
211
212 <ul>
213 <li>
214 On any chassis, tunnel ports that OVN uses to maintain logical network
215 connectivity. <code>ovn-controller</code> adds, updates, and removes
216 these tunnel ports.
217 </li>
218
219 <li>
220 On a hypervisor, any VIFs that are to be attached to logical networks.
221 The hypervisor itself, or the integration between Open vSwitch and the
222 hypervisor (described in <code>IntegrationGuide.md</code>) takes care of
223 this. (This is not part of OVN or new to OVN; this is pre-existing
224 integration work that has already been done on hypervisors that support
225 OVS.)
226 </li>
227
228 <li>
229 On a gateway, the physical port used for logical network connectivity.
230 System startup scripts add this port to the bridge prior to starting
231 <code>ovn-controller</code>. This can be a patch port to another bridge,
232 instead of a physical port, in more sophisticated setups.
233 </li>
234 </ul>
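For the gateway case above, the patch-port arrangement can be sketched with <code>ovs-vsctl</code>. This is a hedged illustration, not part of OVN itself: the bridge names (<code>br-int</code>, <code>br-phys</code>) and patch port names are hypothetical, and a startup script would run it before <code>ovn-controller</code> starts.

```shell
# Hypothetical sketch: connect the integration bridge to a separate
# bridge holding the gateway's physical port via an OVS patch-port
# pair.  All names here are illustrative.
ovs-vsctl add-port br-int patch-int-to-phys -- \
    set Interface patch-int-to-phys type=patch \
        options:peer=patch-phys-to-int
ovs-vsctl add-port br-phys patch-phys-to-int -- \
    set Interface patch-phys-to-int type=patch \
        options:peer=patch-int-to-phys
```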
235
236 <p>
237 Other ports should not be attached to the integration bridge. In
238 particular, physical ports attached to the underlay network (as opposed to
239 gateway ports, which are physical ports attached to logical networks) must
240 not be attached to the integration bridge. Underlay physical ports should
241 instead be attached to a separate Open vSwitch bridge (they need not be
242 attached to any bridge at all, in fact).
243 </p>
244
245 <p>
246 The integration bridge should be configured as described below.
247 The effect of each of these settings is documented in
248 <code>ovs-vswitchd.conf.db</code>(5):
249 </p>
250
251 <dl>
252 <dt><code>fail-mode=secure</code></dt>
253 <dd>
254 Avoids switching packets between isolated logical networks before
255 <code>ovn-controller</code> starts up. See <code>Controller Failure
256 Settings</code> in <code>ovs-vsctl</code>(8) for more information.
257 </dd>
258
259 <dt><code>other-config:disable-in-band=true</code></dt>
260 <dd>
261 Suppresses in-band control flows for the integration bridge. It would be
262 unusual for such flows to show up anyway, because OVN uses a local
263 controller (over a Unix domain socket) instead of a remote controller.
264 It's possible, however, for some other bridge in the same system to have
265 an in-band remote controller, and in that case this suppresses the flows
266 that in-band control would ordinarily set up. See <code>In-Band
267 Control</code> in <code>DESIGN.md</code> for more information.
268 </dd>
269 </dl>
270
271 <p>
272 The customary name for the integration bridge is <code>br-int</code>, but
273 another name may be used.
274 </p>
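The settings above can be applied when the bridge is created. A minimal sketch of what a system startup script might run, assuming the customary bridge name <code>br-int</code>:

```shell
# Create the integration bridge (if absent) with the settings
# recommended above; see ovs-vswitchd.conf.db(5) for their meaning.
ovs-vsctl --may-exist add-br br-int -- \
    set Bridge br-int fail-mode=secure \
        other-config:disable-in-band=true
```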
275
276 <h2>Logical Networks</h2>
277
278 <p>
 279 A <dfn>logical network</dfn> implements the same concepts as a physical
 280 network, but it is insulated from the physical network by tunnels or
 281 other encapsulations. This allows logical networks to have separate IP and
 282 other address spaces that overlap, without conflicting, with those used for
283 physical networks. Logical network topologies can be arranged without
284 regard for the topologies of the physical networks on which they run.
285 </p>
286
287 <p>
288 Logical network concepts in OVN include:
289 </p>
290
291 <ul>
292 <li>
293 <dfn>Logical switches</dfn>, the logical version of Ethernet switches.
294 </li>
295
296 <li>
297 <dfn>Logical routers</dfn>, the logical version of IP routers. Logical
298 switches and routers can be connected into sophisticated topologies.
299 </li>
300
301 <li>
302 <dfn>Logical datapaths</dfn> are the logical version of an OpenFlow
303 switch. Logical switches and routers are both implemented as logical
304 datapaths.
305 </li>
306 </ul>
307
 308 <h2>Life Cycle of a VIF</h2>
309
310 <p>
311 Tables and their schemas presented in isolation are difficult to
312 understand. Here's an example.
313 </p>
314
315 <p>
 316 A VIF on a hypervisor is a virtual network interface attached either
 317 to a VM or to a container running directly on that hypervisor. (This
 318 is different from the interface of a container running inside a VM.)
319 </p>
320
321 <p>
 322 The steps in this example often refer to details of the OVN Southbound
 323 and OVN Northbound database schemas. Please see <code>ovn-sb</code>(5) and
324 <code>ovn-nb</code>(5), respectively, for the full story on these
325 databases.
326 </p>
327
328 <ol>
329 <li>
330 A VIF's life cycle begins when a CMS administrator creates a new VIF
331 using the CMS user interface or API and adds it to a switch (one
332 implemented by OVN as a logical switch). The CMS updates its own
 333 configuration. This includes associating a unique, persistent identifier
 334 <var>vif-id</var> and an Ethernet address <var>mac</var> with the VIF.
335 </li>
336
337 <li>
338 The CMS plugin updates the OVN Northbound database to include the new
339 VIF, by adding a row to the <code>Logical_Port</code> table. In the new
340 row, <code>name</code> is <var>vif-id</var>, <code>mac</code> is
341 <var>mac</var>, <code>switch</code> points to the OVN logical switch's
342 Logical_Switch record, and other columns are initialized appropriately.
343 </li>
344
345 <li>
346 <code>ovn-northd</code> receives the OVN Northbound database update. In
347 turn, it makes the corresponding updates to the OVN Southbound database,
348 by adding rows to the OVN Southbound database <code>Logical_Flow</code>
 349 table to reflect the new port, e.g. adding a flow to recognize that
 350 packets destined to the new port's MAC address should be delivered to it,
 351 and updating the flow that delivers broadcast and multicast packets to
 352 include the new port. It also creates a record in the <code>Binding</code>
 353 table and populates all its columns except the column that identifies the
 354 <code>chassis</code>.
355 </li>
356
357 <li>
358 On every hypervisor, <code>ovn-controller</code> receives the
 359 <code>Logical_Flow</code> table updates that <code>ovn-northd</code> made
360 in the previous step. As long as the VM that owns the VIF is powered
361 off, <code>ovn-controller</code> cannot do much; it cannot, for example,
362 arrange to send packets to or receive packets from the VIF, because the
363 VIF does not actually exist anywhere.
364 </li>
365
366 <li>
367 Eventually, a user powers on the VM that owns the VIF. On the hypervisor
368 where the VM is powered on, the integration between the hypervisor and
369 Open vSwitch (described in <code>IntegrationGuide.md</code>) adds the VIF
370 to the OVN integration bridge and stores <var>vif-id</var> in
371 <code>external-ids</code>:<code>iface-id</code> to indicate that the
372 interface is an instantiation of the new VIF. (None of this code is new
373 in OVN; this is pre-existing integration work that has already been done
374 on hypervisors that support OVS.)
375 </li>
376
377 <li>
378 On the hypervisor where the VM is powered on, <code>ovn-controller</code>
379 notices <code>external-ids</code>:<code>iface-id</code> in the new
380 Interface. In response, it updates the local hypervisor's OpenFlow
381 tables so that packets to and from the VIF are properly handled.
 382 Afterward, in the OVN Southbound DB, it updates the
 383 <code>Binding</code> table's <code>chassis</code> column for the
384 row that links the logical port from
385 <code>external-ids</code>:<code>iface-id</code> to the hypervisor.
386 </li>
387
388 <li>
389 Some CMS systems, including OpenStack, fully start a VM only when its
390 networking is ready. To support this, <code>ovn-northd</code> notices
 391 the updated <code>chassis</code> column for the row in the
 392 <code>Binding</code> table and pushes this upward by updating the
 393 <ref column="up" table="Logical_Port" db="OVN_NB"/> column in the OVN
 394 Northbound database's <ref table="Logical_Port" db="OVN_NB"/> table to
 395 indicate that the VIF is now up. The CMS, if it uses this feature, can
 396 then react by allowing the VM's execution to proceed.
398 </li>
399
400 <li>
401 On every hypervisor but the one where the VIF resides,
 402 <code>ovn-controller</code> notices the completely populated row in the
 403 <code>Binding</code> table. This provides <code>ovn-controller</code>
404 the physical location of the logical port, so each instance updates the
405 OpenFlow tables of its switch (based on logical datapath flows in the OVN
406 DB <code>Logical_Flow</code> table) so that packets to and from the VIF
407 can be properly handled via tunnels.
408 </li>
409
410 <li>
411 Eventually, a user powers off the VM that owns the VIF. On the
 412 hypervisor where the VM was powered off, the VIF is deleted from the OVN
413 integration bridge.
414 </li>
415
416 <li>
 417 On the hypervisor where the VM was powered off,
 418 <code>ovn-controller</code> notices that the VIF was deleted. In
 419 response, it removes the <code>Chassis</code> column content in the
 420 <code>Binding</code> table for the logical port.
421 </li>
422
423 <li>
 424 On every hypervisor, <code>ovn-controller</code> notices the empty
 425 <code>Chassis</code> column in the <code>Binding</code> table's row
426 for the logical port. This means that <code>ovn-controller</code> no
427 longer knows the physical location of the logical port, so each instance
428 updates its OpenFlow table to reflect that.
429 </li>
430
431 <li>
432 Eventually, when the VIF (or its entire VM) is no longer needed by
433 anyone, an administrator deletes the VIF using the CMS user interface or
434 API. The CMS updates its own configuration.
435 </li>
436
437 <li>
438 The CMS plugin removes the VIF from the OVN Northbound database,
439 by deleting its row in the <code>Logical_Port</code> table.
440 </li>
441
442 <li>
 443 <code>ovn-northd</code> receives the OVN Northbound update and in turn
444 updates the OVN Southbound database accordingly, by removing or updating
445 the rows from the OVN Southbound database <code>Logical_Flow</code> table
446 and <code>Binding</code> table that were related to the now-destroyed
447 VIF.
448 </li>
449
450 <li>
451 On every hypervisor, <code>ovn-controller</code> receives the
 452 <code>Logical_Flow</code> table updates that <code>ovn-northd</code> made
453 in the previous step. <code>ovn-controller</code> updates OpenFlow
454 tables to reflect the update, although there may not be much to do, since
455 the VIF had already become unreachable when it was removed from the
 456 <code>Binding</code> table in a previous step.
457 </li>
458 </ol>
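The power-on step in the life cycle above can be sketched as a single <code>ovs-vsctl</code> invocation. This is a hedged sketch: in practice the hypervisor integration scripts, not an administrator, issue the equivalent, and the interface name (<code>tap0</code>) and the <var>vif-id</var> value are hypothetical placeholders.

```shell
# Hypothetical sketch of VIF attachment: add the VM's interface to the
# integration bridge and record the CMS-assigned vif-id so that
# ovn-controller can bind the logical port to this chassis.
VIF_ID=example-vif-id          # placeholder for the CMS's vif-id
ovs-vsctl add-port br-int tap0 -- \
    set Interface tap0 external-ids:iface-id="$VIF_ID"
```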
459
460 <h2>Life Cycle of a container interface inside a VM</h2>
461
462 <p>
 463 OVN provides virtual network abstractions by converting information
 464 written in the OVN_NB database into OpenFlow flows in each hypervisor.
 465 Secure virtual networking for multiple tenants can only be provided if OVN
 466 controller is the only entity that can modify flows in Open vSwitch. When the
467 Open vSwitch integration bridge resides in the hypervisor, it is a
468 fair assumption to make that tenant workloads running inside VMs cannot
469 make any changes to Open vSwitch flows.
470 </p>
471
472 <p>
473 If the infrastructure provider trusts the applications inside the
474 containers not to break out and modify the Open vSwitch flows, then
 475 containers can be run in hypervisors. This is also the case when
 476 containers are run inside the VMs and the Open vSwitch integration
 477 bridge, with flows added by OVN controller, resides in the same VM. For both
478 the above cases, the workflow is the same as explained with an example
479 in the previous section ("Life Cycle of a VIF").
480 </p>
481
482 <p>
483 This section talks about the life cycle of a container interface (CIF)
484 when containers are created in the VMs and the Open vSwitch integration
485 bridge resides inside the hypervisor. In this case, even if a container
486 application breaks out, other tenants are not affected because the
487 containers running inside the VMs cannot modify the flows in the
488 Open vSwitch integration bridge.
489 </p>
490
491 <p>
492 When multiple containers are created inside a VM, there are multiple
 493 CIFs associated with them. The network traffic associated with these
 494 CIFs needs to reach the Open vSwitch integration bridge running in the
495 hypervisor for OVN to support virtual network abstractions. OVN should
496 also be able to distinguish network traffic coming from different CIFs.
497 There are two ways to distinguish network traffic of CIFs.
498 </p>
499
500 <p>
501 One way is to provide one VIF for every CIF (1:1 model). This means that
502 there could be a lot of network devices in the hypervisor. This would slow
503 down OVS because of all the additional CPU cycles needed for the management
504 of all the VIFs. It would also mean that the entity creating the
505 containers in a VM should also be able to create the corresponding VIFs in
506 the hypervisor.
507 </p>
508
509 <p>
510 The second way is to provide a single VIF for all the CIFs (1:many model).
511 OVN could then distinguish network traffic coming from different CIFs via
512 a tag written in every packet. OVN uses this mechanism and uses VLAN as
513 the tagging mechanism.
514 </p>
515
516 <ol>
517 <li>
 518 A CIF's life cycle begins when a container is spawned inside a VM by
 519 either the same CMS that created the VM, a tenant that owns that VM,
 520 or even a container orchestration system different from the CMS that
 521 initially created the VM. Whoever the entity is, it will need to know
 522 the <var>vif-id</var> associated with the network interface of the VM
 523 through which the container interface's network traffic is expected
 524 to go. The entity that creates the container interface will also need
 525 to choose an unused VLAN inside that VM.
526 </li>
527
528 <li>
529 The container spawning entity (either directly or through the CMS that
530 manages the underlying infrastructure) updates the OVN Northbound
531 database to include the new CIF, by adding a row to the
 532 <code>Logical_Port</code> table. In the new row, <code>name</code> is
 533 any unique identifier, <code>parent_name</code> is the <var>vif-id</var>
 534 of the VM through which the CIF's network traffic is expected to go,
 535 and <code>tag</code> is the VLAN tag that identifies the network
 536 traffic of that CIF.
537 </li>
538
539 <li>
540 <code>ovn-northd</code> receives the OVN Northbound database update. In
541 turn, it makes the corresponding updates to the OVN Southbound database,
542 by adding rows to the OVN Southbound database's <code>Logical_Flow</code>
543 table to reflect the new port and also by creating a new row in the
544 <code>Binding</code> table and populating all its columns except the
545 column that identifies the <code>chassis</code>.
546 </li>
547
548 <li>
549 On every hypervisor, <code>ovn-controller</code> subscribes to the
 550 changes in the <code>Binding</code> table. When a new row is created
 551 by <code>ovn-northd</code> that includes a value in the
 552 <code>parent_port</code> column of the <code>Binding</code> table, the
 553 <code>ovn-controller</code> in the hypervisor whose OVN integration
 554 bridge has an interface with that same <var>vif-id</var> value in
 555 <code>external-ids</code>:<code>iface-id</code>
 556 updates the local hypervisor's OpenFlow tables so that packets to and
 557 from the VIF with the particular VLAN <code>tag</code> are properly
 558 handled. Afterward it updates the <code>chassis</code> column of
 559 the <code>Binding</code> row to reflect the physical location.
560 </li>
561
562 <li>
563 One can only start the application inside the container after the
 564 underlying network is ready. To support this, <code>ovn-northd</code>
 565 notices the updated <code>chassis</code> column in the <code>Binding</code>
566 table and updates the <ref column="up" table="Logical_Port"
567 db="OVN_NB"/> column in the OVN Northbound database's
568 <ref table="Logical_Port" db="OVN_NB"/> table to indicate that the
 569 CIF is now up. The entity responsible for starting the container
 570 application queries this value and starts the application.
571 </li>
572
573 <li>
 574 Eventually, the entity that created and started the container stops it.
 575 The entity, through the CMS (or directly), deletes its row in the
576 <code>Logical_Port</code> table.
577 </li>
578
579 <li>
 580 <code>ovn-northd</code> receives the OVN Northbound update and in turn
581 updates the OVN Southbound database accordingly, by removing or updating
582 the rows from the OVN Southbound database <code>Logical_Flow</code> table
583 that were related to the now-destroyed CIF. It also deletes the row in
584 the <code>Binding</code> table for that CIF.
585 </li>
586
587 <li>
588 On every hypervisor, <code>ovn-controller</code> receives the
589 <code>Logical_Flow</code> table updates that <code>ovn-northd</code> made
590 in the previous step. <code>ovn-controller</code> updates OpenFlow
591 tables to reflect the update.
592 </li>
593 </ol>
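The northbound step of the container work-flow above can be sketched with <code>ovn-nbctl</code>, assuming the <code>lport-add</code> command of OVN at this time (renamed <code>lsp-add</code> in later releases). The switch, port, and parent names and the VLAN tag are hypothetical; the parent argument must equal the <var>vif-id</var> of the VM's own VIF, and the tag must be unused inside that VM.

```shell
# Hypothetical sketch: create a CIF's Logical_Port with a parent port
# and a VLAN tag, so the CIF's traffic rides the VM's VIF tagged as 42.
ovn-nbctl lport-add sw0 cif0 vm1-vif 42
```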
 594
 595 <h2>Life Cycle of a Packet</h2>
 596
 597 <p>
598 This section describes how a packet travels from one virtual machine or
599 container to another through OVN. This description focuses on the physical
600 treatment of a packet; for a description of the logical life cycle of a
601 packet, please refer to the <code>Logical_Flow</code> table in
602 <code>ovn-sb</code>(5).
603 </p>
604
605 <p>
606 This section mentions several data and metadata fields, for clarity
607 summarized here:
608 </p>
609
610 <dl>
611 <dt>tunnel key</dt>
612 <dd>
613 When OVN encapsulates a packet in Geneve or another tunnel, it attaches
614 extra data to it to allow the receiving OVN instance to process it
615 correctly. This takes different forms depending on the particular
616 encapsulation, but in each case we refer to it here as the ``tunnel
617 key.'' See <code>Tunnel Encapsulations</code>, below, for details.
618 </dd>
619
620 <dt>logical datapath field</dt>
621 <dd>
622 A field that denotes the logical datapath through which a packet is being
623 processed. OVN uses the field that OpenFlow 1.1+ simply (and
624 confusingly) calls ``metadata'' to store the logical datapath. (This
625 field is passed across tunnels as part of the tunnel key.)
626 </dd>
627
628 <dt>logical input port field</dt>
629 <dd>
630 A field that denotes the logical port from which the packet
631 entered the logical datapath. OVN stores this in Nicira extension
632 register number 6. (This field is passed across tunnels as part
633 of the tunnel key.)
634 </dd>
635
636 <dt>logical output port field</dt>
637 <dd>
638 A field that denotes the logical port from which the packet will
639 leave the logical datapath. This is initialized to 0 at the
640 beginning of the logical ingress pipeline. OVN stores this in
641 Nicira extension register number 7. (This field is passed across
642 tunnels as part of the tunnel key.)
643 </dd>
644
645 <dt>VLAN ID</dt>
646 <dd>
647 The VLAN ID is used as an interface between OVN and containers nested
648 inside a VM (see <code>Life Cycle of a container interface inside a
649 VM</code>, above, for more information).
650 </dd>
651 </dl>
652
653 <p>
654 Initially, a VM or container on the ingress hypervisor sends a packet on a
655 port attached to the OVN integration bridge. Then:
656 </p>
657
658 <ol>
659 <li>
660 <p>
661 OpenFlow table 0 performs physical-to-logical translation. It matches
662 the packet's ingress port. Its actions annotate the packet with
663 logical metadata, by setting the logical datapath field to identify the
664 logical datapath that the packet is traversing and the logical input
665 port field to identify the ingress port. Then it resubmits to table 16
666 to enter the logical ingress pipeline.
667 </p>
668
669 <p>
670 Packets that originate from a container nested within a VM are treated
671 in a slightly different way. The originating container can be
672 distinguished based on the VIF-specific VLAN ID, so the
673 physical-to-logical translation flows additionally match on VLAN ID and
674 the actions strip the VLAN header. Following this step, OVN treats
675 packets from containers just like any other packets.
676 </p>
677
678 <p>
679 Table 0 also processes packets that arrive from other chassis. It
680 distinguishes them from other packets by ingress port, which is a
681 tunnel. As with packets just entering the OVN pipeline, the actions
682 annotate these packets with logical datapath and logical ingress port
683 metadata. In addition, the actions set the logical output port field,
684 which is available because in OVN tunneling occurs after the logical
685 output port is known. These three pieces of information are obtained
686 from the tunnel encapsulation metadata (see <code>Tunnel
687 Encapsulations</code> for encoding details). Then the actions resubmit
688 to table 33 to enter the logical egress pipeline.
689 </p>
690 </li>
691
692 <li>
693 <p>
694 OpenFlow tables 16 through 31 execute the logical ingress pipeline from
695 the <code>Logical_Flow</code> table in the OVN Southbound database.
696 These tables are expressed entirely in terms of logical concepts like
697 logical ports and logical datapaths. A big part of
698 <code>ovn-controller</code>'s job is to translate them into equivalent
699 OpenFlow (in particular it translates the table numbers:
700 <code>Logical_Flow</code> tables 0 through 15 become OpenFlow tables 16
701 through 31). For a given packet, the logical ingress pipeline
702 eventually executes zero or more <code>output</code> actions:
 703 </p>
704
705 <ul>
706 <li>
707 If the pipeline executes no <code>output</code> actions at all, the
708 packet is effectively dropped.
709 </li>
710
711 <li>
712 Most commonly, the pipeline executes one <code>output</code> action,
713 which <code>ovn-controller</code> implements by resubmitting the
714 packet to table 32.
715 </li>
716
717 <li>
718 If the pipeline can execute more than one <code>output</code> action,
719 then each one is separately resubmitted to table 32. This can be
720 used to send multiple copies of the packet to multiple ports. (If
721 the packet was not modified between the <code>output</code> actions,
722 and some of the copies are destined to the same hypervisor, then
723 using a logical multicast output port would save bandwidth between
724 hypervisors.)
725 </li>
726 </ul>
727 </li>
728
729 <li>
730 <p>
        OpenFlow tables 32 through 47 implement the <code>output</code> action
        in the logical ingress pipeline. Specifically, table 32 handles
        packets to remote hypervisors, table 33 handles packets to the local
        hypervisor, and table 34 discards packets whose logical ingress and
        egress ports are the same.
      </p>

      <p>
        Each flow in table 32 matches on a logical output port for unicast or
        multicast logical ports that include a logical port on a remote
        hypervisor. Each flow's actions implement sending a packet to the
        port it matches. For unicast logical output ports on remote
        hypervisors, the actions set the tunnel key to the correct value,
        then send the packet on the tunnel port to the correct hypervisor.
        (When the remote hypervisor receives the packet, table 0 there will
        recognize it as a tunneled packet and pass it along to table 33.)
        For multicast logical output ports, the actions send one copy of the
        packet to each remote hypervisor, in the same way as for unicast
        destinations. If a multicast group includes a logical port or ports
        on the local hypervisor, then its actions also resubmit to table 33.
        Table 32 also includes a fallback flow that resubmits to table 33 if
        there is no other match.
      </p>

      <p>
        Flows in table 33 resemble those in table 32 but for logical ports
        that reside locally rather than remotely. For unicast logical output
        ports on the local hypervisor, the actions just resubmit to table 34.
        For multicast output ports that include one or more logical ports on
        the local hypervisor, for each such logical port <var>P</var>, the
        actions change the logical output port to <var>P</var>, then resubmit
        to table 34.
      </p>

      <p>
        Table 34 matches and drops packets for which the logical input and
        output ports are the same. It resubmits other packets to table 48.
      </p>
    </li>

    <li>
      <p>
        OpenFlow tables 48 through 63 execute the logical egress pipeline from
        the <code>Logical_Flow</code> table in the OVN Southbound database.
        The egress pipeline can perform a final stage of validation before
        packet delivery. Eventually, it may execute an <code>output</code>
        action, which <code>ovn-controller</code> implements by resubmitting
        to table 64. A packet for which the pipeline never executes
        <code>output</code> is effectively dropped (although it may have been
        transmitted through a tunnel across a physical network).
      </p>

      <p>
        The egress pipeline cannot change the logical output port or cause
        further tunneling.
      </p>
    </li>

    <li>
      <p>
        OpenFlow table 64 performs logical-to-physical translation, the
        opposite of table 0. It matches the packet's logical egress port.
        Its actions output the packet to the port attached to the OVN
        integration bridge that represents that logical port. If the logical
        egress port is a container nested within a VM, then before sending
        the packet the actions push on a VLAN header with an appropriate
        VLAN ID.
      </p>
    </li>
  </ol>
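  The division of labor among tables 32 through 34 can be sketched as a small
  dispatcher. This is an illustrative Python model only (the function and
  callback names are hypothetical); the real logic is expressed as OpenFlow
  flows installed by <code>ovn-controller</code>:

```python
def output_action(ingress_port, egress_port, is_local, send_tunnel, deliver):
    """Model of how tables 32-34 implement one logical 'output' action.

    is_local(port)    -> True if the port binding is on this hypervisor
    send_tunnel(port) -> forward on a tunnel to the port's hypervisor
    deliver(port)     -> hand off to the egress pipeline (tables 48-63)
    All three callbacks are hypothetical stand-ins for OpenFlow actions.
    """
    # Table 32: remote destinations are tunneled to the owning hypervisor.
    if not is_local(egress_port):
        send_tunnel(egress_port)
        return
    # Table 33: local destinations fall through (resubmit to table 34).
    # Table 34: drop packets whose logical ingress and egress ports match.
    if ingress_port == egress_port:
        return
    deliver(egress_port)
```

  A multicast output would simply invoke this dispatch once per member port.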

  <h1>Design Decisions</h1>

  <h2>Tunnel Encapsulations</h2>

  <p>
    OVN annotates logical network packets that it sends from one hypervisor to
    another with the following three pieces of metadata, which are encoded in
    an encapsulation-specific fashion:
  </p>

  <ul>
    <li>
      24-bit logical datapath identifier, from the <code>tunnel_key</code>
      column in the OVN Southbound <code>Datapath_Binding</code> table.
    </li>

    <li>
      15-bit logical ingress port identifier. ID 0 is reserved for internal
      use within OVN. IDs 1 through 32767, inclusive, may be assigned to
      logical ports (see the <code>tunnel_key</code> column in the OVN
      Southbound <code>Port_Binding</code> table).
    </li>

    <li>
      16-bit logical egress port identifier. IDs 0 through 32767 have the
      same meaning as for logical ingress ports. IDs 32768 through 65535,
      inclusive, may be assigned to logical multicast groups (see the
      <code>tunnel_key</code> column in the OVN Southbound
      <code>Multicast_Group</code> table).
    </li>
  </ul>
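  The identifier ranges above can be summarized in a short sketch (a
  hypothetical helper for illustration, not part of OVN):

```python
def classify_egress_key(key: int) -> str:
    """Interpret a 16-bit logical egress port identifier.

    Per the ranges above: 0 is reserved for internal use, 1 through 32767
    are logical ports, and 32768 through 65535 are logical multicast groups.
    """
    if not 0 <= key <= 0xFFFF:
        raise ValueError("egress port identifiers are 16 bits")
    if key == 0:
        return "reserved"
    if key <= 32767:
        return "logical port"
    return "multicast group"
```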

  <p>
    For hypervisor-to-hypervisor traffic, OVN supports only Geneve and STT
    encapsulations, for the following reasons:
  </p>

  <ul>
    <li>
      Only STT and Geneve support the large amounts of metadata (over 32 bits
      per packet) that OVN uses (as described above).
    </li>

    <li>
      STT and Geneve use randomized UDP or TCP source ports, allowing
      efficient distribution among multiple paths in environments that use
      ECMP in their underlay.
    </li>

    <li>
      NICs are available to offload STT and Geneve encapsulation and
      decapsulation.
    </li>
  </ul>

  <p>
    Due to its flexibility, the preferred encapsulation between hypervisors is
    Geneve. For Geneve encapsulation, OVN transmits the logical datapath
    identifier in the Geneve VNI.

    <!-- Keep the following in sync with ovn/controller/physical.h. -->
    OVN transmits the logical ingress and logical egress ports in a TLV with
    class 0xffff, type 0, and a 32-bit value encoded as follows, from MSB to
    LSB:
  </p>

  <diagram>
    <header name="">
      <bits name="rsv" above="1" below="0" width=".25"/>
      <bits name="ingress port" above="15" width=".75"/>
      <bits name="egress port" above="16" width=".75"/>
    </header>
  </diagram>
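  The bit layout above can be expressed as a quick sketch (hypothetical
  helper functions, not OVN code):

```python
def encode_geneve_ports(ingress_port: int, egress_port: int) -> int:
    """Pack the logical ingress and egress ports into the 32-bit value of
    the Geneve TLV (class 0xffff, type 0): 1 reserved bit (zero), then
    15 bits of ingress port and 16 bits of egress port, MSB to LSB."""
    assert 0 <= ingress_port <= 0x7FFF  # 15-bit field
    assert 0 <= egress_port <= 0xFFFF   # 16-bit field
    return (ingress_port << 16) | egress_port

def decode_geneve_ports(value: int) -> tuple[int, int]:
    """Inverse of encode_geneve_ports."""
    return (value >> 16) & 0x7FFF, value & 0xFFFF
```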

  <p>
    Environments whose NICs lack Geneve offload may prefer STT encapsulation
    for performance reasons. For STT encapsulation, OVN encodes all three
    pieces of logical metadata in the STT 64-bit tunnel ID as follows, from
    MSB to LSB:
  </p>

  <diagram>
    <header name="">
      <bits name="reserved" above="9" below="0" width=".5"/>
      <bits name="ingress port" above="15" width=".75"/>
      <bits name="egress port" above="16" width=".75"/>
      <bits name="datapath" above="24" width="1.25"/>
    </header>
  </diagram>
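  The STT layout can be sketched in the same style (again a hypothetical
  helper, assuming the field widths shown in the diagram):

```python
def encode_stt_tunnel_id(datapath: int, ingress_port: int,
                         egress_port: int) -> int:
    """Pack the logical metadata into the 64-bit STT tunnel ID: 9 reserved
    bits (zero), then ingress port (15 bits), egress port (16 bits), and
    datapath (24 bits), MSB to LSB."""
    assert 0 <= datapath <= 0xFFFFFF    # 24-bit field
    assert 0 <= ingress_port <= 0x7FFF  # 15-bit field
    assert 0 <= egress_port <= 0xFFFF   # 16-bit field
    return (ingress_port << 40) | (egress_port << 24) | datapath
```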

  <p>
    For connecting to gateways, in addition to Geneve and STT, OVN supports
    VXLAN, because only VXLAN support is common on top-of-rack (ToR) switches.
    Currently, gateways have a feature set that matches the capabilities
    defined by the VTEP schema, so fewer bits of metadata are necessary. In
    the future, gateways that do not support encapsulations with large
    amounts of metadata may continue to have a reduced feature set.
  </p>
</manpage>