Commit | Line | Data |
---|---|---|
fe36184b BP |
1 | <?xml version="1.0" encoding="utf-8"?> |
2 | <manpage program="ovn-architecture" section="7" title="OVN Architecture"> | |
3 | <h1>Name</h1> | |
4 | <p>ovn-architecture -- Open Virtual Network architecture</p> | |
5 | ||
6 | <h1>Description</h1> | |
7 | ||
8 | <p> | |
9 | OVN, the Open Virtual Network, is a system to support virtual network | |
10 | abstraction. OVN complements the existing capabilities of OVS to add | |
11 | native support for virtual network abstractions, such as virtual L2 and L3 | |
12 | overlays and security groups. Services such as DHCP are also desirable | |
13 | features. Just like OVS, OVN's design goal is to have a production-quality | |
14 | implementation that can operate at significant scale. | |
15 | </p> | |
16 | ||
17 | <p> | |
18 | An OVN deployment consists of several components: | |
19 | </p> | |
20 | ||
21 | <ul> | |
22 | <li> | |
23 | <p> | |
24 | A <dfn>Cloud Management System</dfn> (<dfn>CMS</dfn>), which is | |
25 | OVN's ultimate client (via its users and administrators). OVN | |
26 | integration requires installing a CMS-specific plugin and | |
27 | related software (see below). OVN initially targets OpenStack | |
28 | as CMS. | |
29 | </p> | |
30 | ||
31 | <p> | |
32 | We generally speak of ``the'' CMS, but one can imagine scenarios in | |
33 | which multiple CMSes manage different parts of an OVN deployment. | |
34 | </p> | |
35 | </li> | |
36 | ||
37 | <li> | |
38 | An OVN Database physical or virtual node (or, eventually, cluster) | |
39 | installed in a central location. | |
40 | </li> | |
41 | ||
42 | <li> | |
43 | One or more (usually many) <dfn>hypervisors</dfn>. Hypervisors must run | |
44 | Open vSwitch and implement the interface described in | |
45 | <code>IntegrationGuide.md</code> in the OVS source tree. Any hypervisor | |
46 | platform supported by Open vSwitch is acceptable. | |
47 | </li> | |
48 | ||
49 | <li> | |
50 | <p> | |
fa6aeaeb RB |
51 | Zero or more <dfn>gateways</dfn>. A gateway extends a tunnel-based |
52 | logical network into a physical network by bidirectionally forwarding | |
53 | packets between tunnels and a physical Ethernet port. This allows | |
54 | non-virtualized machines to participate in logical networks. A gateway | |
55 | may be a physical host, a virtual machine, or an ASIC-based hardware | |
56 | switch that supports the <code>vtep</code>(5) schema. (Support for the | |
57 | latter will come later in OVN implementation.) | |
fe36184b BP |
58 | </p> |
59 | ||
60 | <p> | |
fa6aeaeb RB |
61 | Hypervisors and gateways are together called <dfn>transport nodes</dfn> |
62 | or <dfn>chassis</dfn>. | |
fe36184b BP |
63 | </p> |
64 | </li> | |
65 | </ul> | |
66 | ||
67 | <p> | |
68 | The diagram below shows how the major components of OVN and related | |
69 | software interact. Starting at the top of the diagram, we have: | |
70 | </p> | |
71 | ||
72 | <ul> | |
73 | <li> | |
74 | The Cloud Management System, as defined above. | |
75 | </li> | |
76 | ||
77 | <li> | |
78 | <p> | |
fa6aeaeb RB |
79 | The <dfn>OVN/CMS Plugin</dfn> is the component of the CMS that |
80 | interfaces to OVN. In OpenStack, this is a Neutron plugin. | |
81 | The plugin's main purpose is to translate the CMS's notion of logical | |
82 | network configuration, stored in the CMS's configuration database in a | |
83 | CMS-specific format, into an intermediate representation understood by | |
84 | OVN. | |
fe36184b BP |
85 | </p> |
86 | ||
87 | <p> | |
fa6aeaeb RB |
88 | This component is necessarily CMS-specific, so a new plugin needs to be |
89 | developed for each CMS that is integrated with OVN. All of the | |
90 | components below this one in the diagram are CMS-independent. | |
fe36184b BP |
91 | </p> |
92 | </li> | |
93 | ||
94 | <li> | |
95 | <p> | |
fa6aeaeb RB |
96 | The <dfn>OVN Northbound Database</dfn> receives the intermediate |
97 | representation of logical network configuration passed down by the | |
98 | OVN/CMS Plugin. The database schema is meant to be ``impedance | |
99 | matched'' with the concepts used in a CMS, so that it directly supports | |
100 | notions of logical switches, routers, ACLs, and so on. See | |
101 | <code>ovn-nb</code>(5) for details. | |
fe36184b BP |
102 | </p> |
103 | ||
104 | <p> | |
fa6aeaeb RB |
105 | The OVN Northbound Database has only two clients: the OVN/CMS Plugin |
106 | above it and <code>ovn-northd</code> below it. | |
fe36184b BP |
107 | </p> |
108 | </li> | |
109 | ||
110 | <li> | |
91ae2065 RB |
111 | <code>ovn-northd</code>(8) connects to the OVN Northbound Database |
112 | above it and the OVN Southbound Database below it. It translates the | |
ec78987f JP |
113 | logical network configuration in terms of conventional network |
114 | concepts, taken from the OVN Northbound Database, into logical | |
115 | datapath flows in the OVN Southbound Database below it. | |
fe36184b BP |
116 | </li> |
117 | ||
118 | <li> | |
119 | <p> | |
ec78987f | 120 | The <dfn>OVN Southbound Database</dfn> is the center of the system. |
91ae2065 | 121 | Its clients are <code>ovn-northd</code>(8) above it and |
ec78987f | 122 | <code>ovn-controller</code>(8) on every transport node below it. |
fe36184b BP |
123 | </p> |
124 | ||
125 | <p> | |
fa6aeaeb RB |
126 | The OVN Southbound Database contains three kinds of data: <dfn>Physical |
127 | Network</dfn> (PN) tables that specify how to reach hypervisors and | |
128 | other nodes, <dfn>Logical Network</dfn> (LN) tables that describe the | |
129 | logical network in terms of ``logical datapath flows,'' and | |
130 | <dfn>Binding</dfn> tables that link logical network components' | |
131 | locations to the physical network. The hypervisors populate the PN and | |
132 | Binding tables, whereas <code>ovn-northd</code>(8) populates the LN | |
133 | tables. | |
fe36184b BP |
134 | </p> |
135 | ||
136 | <p> | |
ec78987f JP |
137 | OVN Southbound Database performance must scale with the number of |
138 | transport nodes. This will likely require some work on | |
139 | <code>ovsdb-server</code>(1) as we encounter bottlenecks. | |
140 | Clustering for availability may be needed. | |
fe36184b BP |
141 | </p> |
142 | </li> | |
143 | </ul> | |
144 | ||
145 | <p> | |
146 | The remaining components are replicated onto each hypervisor: | |
147 | </p> | |
148 | ||
149 | <ul> | |
150 | <li> | |
151 | <code>ovn-controller</code>(8) is OVN's agent on each hypervisor and | |
ec78987f JP |
152 | software gateway. Northbound, it connects to the OVN Southbound |
153 | Database to learn about OVN configuration and status and to | |
154 | populate the PN table and the <code>Chassis</code> column in the | |
e387e3e8 | 155 | <code>Binding</code> table with the hypervisor's status. |
ec78987f JP |
156 | Southbound, it connects to <code>ovs-vswitchd</code>(8) as an |
157 | OpenFlow controller, for control over network traffic, and to the | |
158 | local <code>ovsdb-server</code>(1) to allow it to monitor and | |
159 | control Open vSwitch configuration. | |
fe36184b BP |
160 | </li> |
161 | ||
162 | <li> | |
163 | <code>ovs-vswitchd</code>(8) and <code>ovsdb-server</code>(1) are | |
164 | conventional components of Open vSwitch. | |
165 | </li> | |
166 | </ul> | |
167 | ||
168 | <pre fixed="yes"> | |
169 | CMS | |
170 | | | |
171 | | | |
172 | +-----------|-----------+ | |
173 | | | | | |
174 | | OVN/CMS Plugin | | |
175 | | | | | |
176 | | | | | |
177 | | OVN Northbound DB | | |
178 | | | | | |
179 | | | | | |
91ae2065 | 180 | | ovn-northd | |
fe36184b BP |
181 | | | | |
182 | +-----------|-----------+ | |
183 | | | |
184 | | | |
ec78987f JP |
185 | +-------------------+ |
186 | | OVN Southbound DB | | |
187 | +-------------------+ | |
fe36184b BP |
188 | | |
189 | | | |
190 | +------------------+------------------+ | |
191 | | | | | |
ec78987f | 192 | HV 1 | | HV n | |
fe36184b BP |
193 | +---------------|---------------+ . +---------------|---------------+ |
194 | | | | . | | | | |
195 | | ovn-controller | . | ovn-controller | | |
196 | | | | | . | | | | | |
197 | | | | | | | | | | |
198 | | ovs-vswitchd ovsdb-server | | ovs-vswitchd ovsdb-server | | |
199 | | | | | | |
200 | +-------------------------------+ +-------------------------------+ | |
201 | </pre> | |
202 | ||
ca1564ec BP |
203 | <h2>Chassis Setup</h2> |
204 | ||
205 | <p> | |
206 | Each chassis in an OVN deployment must be configured with an Open vSwitch | |
207 | bridge dedicated for OVN's use, called the <dfn>integration bridge</dfn>. | |
208 | System startup scripts create this bridge prior to starting | |
209 | <code>ovn-controller</code>. The ports on the integration bridge include: | |
210 | </p> | |
211 | ||
212 | <ul> | |
213 | <li> | |
214 | On any chassis, tunnel ports that OVN uses to maintain logical network | |
215 | connectivity. <code>ovn-controller</code> adds, updates, and removes | |
216 | these tunnel ports. | |
217 | </li> | |
218 | ||
219 | <li> | |
220 | On a hypervisor, any VIFs that are to be attached to logical networks. | |
221 | The hypervisor itself, or the integration between Open vSwitch and the | |
222 | hypervisor (described in <code>IntegrationGuide.md</code>) takes care of | |
223 | this. (This is not part of OVN or new to OVN; this is pre-existing | |
224 | integration work that has already been done on hypervisors that support | |
225 | OVS.) | |
226 | </li> | |
227 | ||
228 | <li> | |
229 | On a gateway, the physical port used for logical network connectivity. | |
230 | System startup scripts add this port to the bridge prior to starting | |
231 | <code>ovn-controller</code>. This can be a patch port to another bridge, | |
232 | instead of a physical port, in more sophisticated setups. | |
233 | </li> | |
234 | </ul> | |
235 | ||
236 | <p> | |
237 | Other ports should not be attached to the integration bridge. In | |
238 | particular, physical ports attached to the underlay network (as opposed to | |
239 | gateway ports, which are physical ports attached to logical networks) must | |
240 | not be attached to the integration bridge. Underlay physical ports should | |
241 | instead be attached to a separate Open vSwitch bridge (they need not be | |
242 | attached to any bridge at all, in fact). | |
243 | </p> | |
244 | ||
245 | <p> | |
a42226f0 BP |
246 | The integration bridge should be configured as described below. |
247 | The effect of each of these settings is documented in | |
248 | <code>ovs-vswitchd.conf.db</code>(5): | |
ca1564ec BP |
249 | </p> |
250 | ||
a42226f0 BP |
251 | <dl> |
252 | <dt><code>fail-mode=secure</code></dt> | |
253 | <dd> | |
254 | Avoids switching packets between isolated logical networks before | |
255 | <code>ovn-controller</code> starts up. See <code>Controller Failure | |
256 | Settings</code> in <code>ovs-vsctl</code>(8) for more information. | |
257 | </dd> | |
258 | ||
259 | <dt><code>other-config:disable-in-band=true</code></dt> | |
260 | <dd> | |
261 | Suppresses in-band control flows for the integration bridge. It would be | |
262 | unusual for such flows to show up anyway, because OVN uses a local | |
263 | controller (over a Unix domain socket) instead of a remote controller. | |
264 | It's possible, however, for some other bridge in the same system to have | |
265 | an in-band remote controller, and in that case this suppresses the flows | |
266 | that in-band control would ordinarily set up. See <code>In-Band | |
267 | Control</code> in <code>DESIGN.md</code> for more information. | |
268 | </dd> | |
269 | </dl> | |
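As a minimal sketch of the settings above, the following Python helper assembles (but does not run) an `ovs-vsctl` command line that creates the integration bridge and applies both settings. The helper name `integration_bridge_command` is hypothetical, and the exact invocation is an assumption about how a startup script might apply these settings, not a statement of what OVN's own scripts do.

```python
def integration_bridge_command(bridge="br-int"):
    """Return an ovs-vsctl argv list that creates and configures the bridge.

    Assumption: a startup script applies both settings in one ovs-vsctl
    transaction. The bridge name defaults to the customary "br-int".
    """
    return [
        "ovs-vsctl",
        "--may-exist", "add-br", bridge,
        "--", "set", "Bridge", bridge,
        # Do not switch packets before ovn-controller installs flows.
        "fail-mode=secure",
        # Suppress in-band control flows on this bridge.
        "other-config:disable-in-band=true",
    ]


if __name__ == "__main__":
    # Print the command rather than executing it, so this is safe on any host.
    print(" ".join(integration_bridge_command()))
```

Returning an argument list rather than a shell string keeps the sketch free of quoting issues if it is later passed to something like `subprocess.run`.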
270 | ||
ca1564ec BP |
271 | <p> |
272 | The customary name for the integration bridge is <code>br-int</code>, but | |
273 | another name may be used. | |
274 | </p> | |
275 | ||
747b2a45 BP |
276 | <h2>Logical Networks</h2> |
277 | ||
278 | <p> | |
279 | A <dfn>logical network</dfn> implements the same concepts as physical | |
280 | networks, but it is insulated from the physical network with tunnels or | |
281 | other encapsulations. This allows logical networks to have separate IP and | |
282 | other address spaces that overlap, without conflicting, with those used for | |
283 | physical networks. Logical network topologies can be arranged without | |
284 | regard for the topologies of the physical networks on which they run. | |
285 | </p> | |
286 | ||
287 | <p> | |
288 | Logical network concepts in OVN include: | |
289 | </p> | |
290 | ||
291 | <ul> | |
292 | <li> | |
293 | <dfn>Logical switches</dfn>, the logical version of Ethernet switches. | |
294 | </li> | |
295 | ||
296 | <li> | |
297 | <dfn>Logical routers</dfn>, the logical version of IP routers. Logical | |
298 | switches and routers can be connected into sophisticated topologies. | |
299 | </li> | |
300 | ||
301 | <li> | |
302 | <dfn>Logical datapaths</dfn> are the logical version of an OpenFlow | |
303 | switch. Logical switches and routers are both implemented as logical | |
304 | datapaths. | |
305 | </li> | |
306 | </ul> | |
307 | ||
ca1564ec | 308 | <h2>Life Cycle of a VIF</h2> |
fe36184b BP |
309 | |
310 | <p> | |
311 | Tables and their schemas presented in isolation are difficult to | |
312 | understand. Here's an example. | |
313 | </p> | |
314 | ||
9fb4636f GS |
315 | <p> |
316 | A VIF on a hypervisor is a virtual network interface attached either | |
317 | to a VM or a container running directly on that hypervisor. (This is | |
318 | different from the interface of a container running inside a VM.) | |
319 | </p> | |
320 | ||
fe36184b BP |
321 | <p> |
322 | The steps in this example refer often to details of the OVN Southbound and OVN | |
ec78987f | 323 | Northbound database schemas. Please see <code>ovn-sb</code>(5) and |
fe36184b BP |
324 | <code>ovn-nb</code>(5), respectively, for the full story on these |
325 | databases. | |
326 | </p> | |
327 | ||
328 | <ol> | |
329 | <li> | |
330 | A VIF's life cycle begins when a CMS administrator creates a new VIF | |
331 | using the CMS user interface or API and adds it to a switch (one | |
332 | implemented by OVN as a logical switch). The CMS updates its own | |
333 | configuration. This includes associating a unique, persistent identifier | |
334 | <var>vif-id</var> and Ethernet address <var>mac</var> with the VIF. | |
335 | </li> | |
336 | ||
337 | <li> | |
338 | The CMS plugin updates the OVN Northbound database to include the new | |
339 | VIF, by adding a row to the <code>Logical_Port</code> table. In the new | |
340 | row, <code>name</code> is <var>vif-id</var>, <code>mac</code> is | |
341 | <var>mac</var>, <code>switch</code> points to the OVN logical switch's | |
342 | Logical_Switch record, and other columns are initialized appropriately. | |
343 | </li> | |
344 | ||
345 | <li> | |
91ae2065 | 346 | <code>ovn-northd</code> receives the OVN Northbound database update. |
ec78987f JP |
347 | In turn, it makes the corresponding updates to the OVN Southbound |
348 | database, by adding rows to the OVN Southbound database | |
349 | <code>Pipeline</code> table to reflect the new port, e.g. add a | |
350 | flow to recognize that packets destined to the new port's MAC | |
351 | address should be delivered to it, and update the flow that | |
352 | delivers broadcast and multicast packets to include the new port. | |
e387e3e8 | 353 | It also creates a record in the <code>Binding</code> table and |
ec78987f | 354 | populates all its columns except the column that identifies the |
9fb4636f | 355 | <code>chassis</code>. |
fe36184b BP |
356 | </li> |
357 | ||
358 | <li> | |
359 | On every hypervisor, <code>ovn-controller</code> receives the | |
91ae2065 RB |
360 | <code>Pipeline</code> table updates that <code>ovn-northd</code> made |
361 | in the previous step. As long as the VM that owns the VIF is powered off, | |
fe36184b BP |
362 | <code>ovn-controller</code> cannot do much; it cannot, for example, |
363 | arrange to send packets to or receive packets from the VIF, because the | |
364 | VIF does not actually exist anywhere. | |
365 | </li> | |
366 | ||
367 | <li> | |
368 | Eventually, a user powers on the VM that owns the VIF. On the hypervisor | |
369 | where the VM is powered on, the integration between the hypervisor and | |
370 | Open vSwitch (described in <code>IntegrationGuide.md</code>) adds the VIF | |
371 | to the OVN integration bridge and stores <var>vif-id</var> in | |
372 | <code>external-ids</code>:<code>iface-id</code> to indicate that the | |
373 | interface is an instantiation of the new VIF. (None of this code is new | |
374 | in OVN; this is pre-existing integration work that has already been done | |
375 | on hypervisors that support OVS.) | |
376 | </li> | |
377 | ||
378 | <li> | |
379 | On the hypervisor where the VM is powered on, <code>ovn-controller</code> | |
380 | notices <code>external-ids</code>:<code>iface-id</code> in the new | |
381 | Interface. In response, it updates the local hypervisor's OpenFlow | |
382 | tables so that packets to and from the VIF are properly handled. | |
a0149f47 | 383 | Afterward, in the OVN Southbound DB, it updates the |
e387e3e8 | 384 | <code>Binding</code> table's <code>chassis</code> column for the |
a0149f47 JP |
385 | row that links the logical port from |
386 | <code>external-ids</code>:<code>iface-id</code> to the hypervisor. | |
fe36184b BP |
387 | </li> |
388 | ||
389 | <li> | |
390 | Some CMS systems, including OpenStack, fully start a VM only when its | |
91ae2065 RB |
391 | networking is ready. To support this, <code>ovn-northd</code> notices |
392 | the <code>chassis</code> column updated for the row in the | |
e387e3e8 | 393 | <code>Binding</code> table and pushes this upward by updating the |
91ae2065 RB |
394 | <ref column="up" table="Logical_Port" db="OVN_NB"/> column in the OVN |
395 | Northbound database's <ref table="Logical_Port" db="OVN_NB"/> table to | |
396 | indicate that the VIF is now up. The CMS, if it uses this feature, can | |
397 | then | |
9fb4636f | 398 | react by allowing the VM's execution to proceed. |
fe36184b BP |
399 | </li> |
400 | ||
401 | <li> | |
402 | On every hypervisor but the one where the VIF resides, | |
9fb4636f | 403 | <code>ovn-controller</code> notices the completely populated row in the |
e387e3e8 | 404 | <code>Binding</code> table. This provides <code>ovn-controller</code> |
fe36184b BP |
405 | the physical location of the logical port, so each instance updates the |
406 | OpenFlow tables of its switch (based on logical datapath flows in the OVN | |
407 | DB <code>Pipeline</code> table) so that packets to and from the VIF can | |
408 | be properly handled via tunnels. | |
409 | </li> | |
410 | ||
411 | <li> | |
412 | Eventually, a user powers off the VM that owns the VIF. On the | |
6eceebf5 | 413 | hypervisor where the VM was powered off, the VIF is deleted from the OVN |
fe36184b BP |
414 | integration bridge. |
415 | </li> | |
416 | ||
417 | <li> | |
6eceebf5 | 418 | On the hypervisor where the VM was powered off, |
fe36184b | 419 | <code>ovn-controller</code> notices that the VIF was deleted. In |
9fb4636f | 420 | response, it removes the <code>Chassis</code> column content in the |
e387e3e8 | 421 | <code>Binding</code> table for the logical port. |
fe36184b BP |
422 | </li> |
423 | ||
424 | <li> | |
9fb4636f | 425 | On every hypervisor, <code>ovn-controller</code> notices the empty |
e387e3e8 | 426 | <code>Chassis</code> column in the <code>Binding</code> table's row |
9fb4636f GS |
427 | for the logical port. This means that <code>ovn-controller</code> no |
428 | longer knows the physical location of the logical port, so each instance | |
429 | updates its OpenFlow table to reflect that. | |
fe36184b BP |
430 | </li> |
431 | ||
432 | <li> | |
433 | Eventually, when the VIF (or its entire VM) is no longer needed by | |
434 | anyone, an administrator deletes the VIF using the CMS user interface or | |
435 | API. The CMS updates its own configuration. | |
436 | </li> | |
437 | ||
438 | <li> | |
439 | The CMS plugin removes the VIF from the OVN Northbound database, | |
440 | by deleting its row in the <code>Logical_Port</code> table. | |
441 | </li> | |
442 | ||
443 | <li> | |
91ae2065 | 444 | <code>ovn-northd</code> receives the OVN Northbound update and in turn |
ec78987f JP |
445 | updates the OVN Southbound database accordingly, by removing or |
446 | updating the rows from the OVN Southbound database | |
e387e3e8 | 447 | <code>Pipeline</code> table and <code>Binding</code> table that |
ec78987f | 448 | were related to the now-destroyed VIF. |
fe36184b BP |
449 | </li> |
450 | ||
451 | <li> | |
452 | On every hypervisor, <code>ovn-controller</code> receives the | |
91ae2065 RB |
453 | <code>Pipeline</code> table updates that <code>ovn-northd</code> made |
454 | in the previous step. <code>ovn-controller</code> updates OpenFlow tables | |
455 | to reflect the update, although there may not be much to do, since the VIF | |
fe36184b | 456 | had already become unreachable when it was removed from the |
e387e3e8 | 457 | <code>Binding</code> table in a previous step. |
fe36184b BP |
458 | </li> |
459 | </ol> | |
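The steps above can be condensed into a toy model of how the `chassis` column of the `Binding` table changes over a VIF's life. This is only an illustrative sketch: the class `SouthboundModel` and its method names are hypothetical, and a plain dictionary stands in for the OVN Southbound database.

```python
class SouthboundModel:
    """Toy stand-in for the Binding table: logical port -> chassis or None."""

    def __init__(self):
        self.binding = {}

    def northd_add_port(self, vif_id):
        # ovn-northd creates the row but leaves the chassis column empty.
        self.binding[vif_id] = None

    def controller_claim(self, vif_id, chassis):
        # ovn-controller on the hosting hypervisor fills in the chassis.
        if vif_id not in self.binding:
            raise KeyError(vif_id)
        self.binding[vif_id] = chassis

    def controller_release(self, vif_id):
        # The VIF left the integration bridge, so its location is unknown.
        self.binding[vif_id] = None

    def northd_delete_port(self, vif_id):
        # The CMS deleted the VIF; ovn-northd removes the row.
        self.binding.pop(vif_id, None)

    def is_up(self, vif_id):
        # Mirrors what ovn-northd reports in the Northbound "up" column.
        return self.binding.get(vif_id) is not None


if __name__ == "__main__":
    sb = SouthboundModel()
    sb.northd_add_port("vif1")           # VIF created in the CMS
    sb.controller_claim("vif1", "hv1")   # VM powered on at hv1
    print(sb.is_up("vif1"))
    sb.controller_release("vif1")        # VM powered off
    sb.northd_delete_port("vif1")        # VIF deleted in the CMS
```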
460 | ||
9fb4636f GS |
461 | <h2>Life Cycle of a container interface inside a VM</h2> |
462 | ||
463 | <p> | |
464 | OVN provides virtual network abstractions by converting information | |
465 | written in the OVN_NB database to OpenFlow flows in each hypervisor. Secure | |
466 | virtual networking for multiple tenants can only be provided if the OVN controller | |
467 | is the only entity that can modify flows in Open vSwitch. When the | |
468 | Open vSwitch integration bridge resides in the hypervisor, it is a | |
469 | fair assumption to make that tenant workloads running inside VMs cannot | |
470 | make any changes to Open vSwitch flows. | |
471 | </p> | |
472 | ||
473 | <p> | |
474 | If the infrastructure provider trusts the applications inside the | |
475 | containers not to break out and modify the Open vSwitch flows, then | |
476 | containers can be run in hypervisors. This is also the case when | |
477 | containers are run inside the VMs and the Open vSwitch integration bridge | |
478 | with flows added by the OVN controller resides in the same VM. For both | |
479 | the above cases, the workflow is the same as explained with an example | |
480 | in the previous section ("Life Cycle of a VIF"). | |
481 | </p> | |
482 | ||
483 | <p> | |
484 | This section talks about the life cycle of a container interface (CIF) | |
485 | when containers are created in the VMs and the Open vSwitch integration | |
486 | bridge resides inside the hypervisor. In this case, even if a container | |
487 | application breaks out, other tenants are not affected because the | |
488 | containers running inside the VMs cannot modify the flows in the | |
489 | Open vSwitch integration bridge. | |
490 | </p> | |
491 | ||
492 | <p> | |
493 | When multiple containers are created inside a VM, there are multiple | |
494 | CIFs associated with them. The network traffic associated with these | |
495 | CIFs needs to reach the Open vSwitch integration bridge running in the | |
496 | hypervisor for OVN to support virtual network abstractions. OVN should | |
497 | also be able to distinguish network traffic coming from different CIFs. | |
498 | There are two ways to distinguish network traffic of CIFs. | |
499 | </p> | |
500 | ||
501 | <p> | |
502 | One way is to provide one VIF for every CIF (1:1 model). This means that | |
503 | there could be a lot of network devices in the hypervisor. This would slow | |
504 | down OVS because of all the additional CPU cycles needed for the management | |
505 | of all the VIFs. It would also mean that the entity creating the | |
506 | containers in a VM should also be able to create the corresponding VIFs in | |
507 | the hypervisor. | |
508 | </p> | |
509 | ||
510 | <p> | |
511 | The second way is to provide a single VIF for all the CIFs (1:many model). | |
512 | OVN could then distinguish network traffic coming from different CIFs via | |
513 | a tag written in every packet. OVN uses this mechanism and uses VLAN as | |
514 | the tagging mechanism. | |
515 | </p> | |
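To illustrate the 1:many model, the sketch below maps a (VIF, VLAN tag) pair to a logical port. It is a simplified assumption about how such a lookup could be organized, not OVN's implementation: the function names are hypothetical, and the dictionary keys `name`, `parent_name`, and `tag` are borrowed from the `Logical_Port` columns used for CIFs.

```python
def build_port_map(logical_ports):
    """Map (vif-id, VLAN tag) to a logical port name.

    A plain VIF is keyed by (its own name, None). A CIF is keyed by
    (parent_name, tag), where parent_name is the vif-id of the VM.
    """
    table = {}
    for port in logical_ports:
        key = (port.get("parent_name") or port["name"], port.get("tag"))
        if key in table:
            raise ValueError("duplicate VIF/tag pair: %r" % (key,))
        table[key] = port["name"]
    return table


def classify(table, vif_id, vlan_tag=None):
    """Return the logical port for traffic seen on a VIF, or None."""
    return table.get((vif_id, vlan_tag))


if __name__ == "__main__":
    ports = [
        {"name": "vm1"},
        {"name": "c1", "parent_name": "vm1", "tag": 10},
        {"name": "c2", "parent_name": "vm1", "tag": 20},
    ]
    table = build_port_map(ports)
    print(classify(table, "vm1", 10))
```

Rejecting duplicate keys reflects the requirement that each CIF use a VLAN that is unused inside its VM.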
516 | ||
517 | <ol> | |
518 | <li> | |
519 | A CIF's life cycle begins when a container is spawned inside a VM by | |
520 | either the same CMS that created the VM, a tenant that owns that VM, | |
521 | or even a container orchestration system that is different from the CMS | |
522 | that initially created the VM. Whoever the entity is, it will need to | |
523 | know the <var>vif-id</var> that is associated with the network interface | |
524 | of the VM through which the container interface's network traffic is | |
525 | expected to go. The entity that creates the container interface | |
526 | will also need to choose an unused VLAN inside that VM. | |
527 | </li> | |
528 | ||
529 | <li> | |
530 | The container spawning entity (either directly or through the CMS that | |
531 | manages the underlying infrastructure) updates the OVN Northbound | |
532 | database to include the new CIF, by adding a row to the | |
533 | <code>Logical_Port</code> table. In the new row, <code>name</code> is | |
534 | any unique identifier, <code>parent_name</code> is the <var>vif-id</var> | |
535 | of the VM through which the CIF's network traffic is expected to go, | |
536 | and <code>tag</code> is the VLAN tag that identifies the | |
537 | network traffic of that CIF. | |
538 | </li> | |
539 | ||
540 | <li> | |
91ae2065 RB |
541 | <code>ovn-northd</code> receives the OVN Northbound database update. |
542 | In turn, it makes the corresponding updates to the OVN Southbound | |
ec78987f JP |
543 | database, by adding rows to the OVN Southbound database's |
544 | <code>Pipeline</code> table to reflect the new port and also by | |
e387e3e8 | 545 | creating a new row in the <code>Binding</code> table and |
ec78987f | 546 | populating all its columns except the column that identifies the |
9fb4636f GS |
547 | <code>chassis</code>. |
548 | </li> | |
549 | ||
550 | <li> | |
551 | On every hypervisor, <code>ovn-controller</code> subscribes to the | |
e387e3e8 | 552 | changes in the <code>Binding</code> table. When a new row is created |
91ae2065 | 553 | by <code>ovn-northd</code> that includes a value in the |
e387e3e8 | 554 | <code>parent_port</code> column of the <code>Binding</code> table, the |
91ae2065 RB |
555 | <code>ovn-controller</code> in the hypervisor whose OVN integration bridge |
556 | has that same value in <var>vif-id</var> in | |
557 | <code>external-ids</code>:<code>iface-id</code> | |
9fb4636f GS |
558 | updates the local hypervisor's OpenFlow tables so that packets to and |
559 | from the VIF with the particular VLAN <code>tag</code> are properly | |
560 | handled. Afterward it updates the <code>chassis</code> column of | |
e387e3e8 | 561 | the <code>Binding</code> table to reflect the physical location. |
9fb4636f GS |
562 | </li> |
563 | ||
564 | <li> | |
565 | One can only start the application inside the container after the | |
91ae2065 | 566 | underlying network is ready. To support this, <code>ovn-northd</code> |
e387e3e8 | 567 | notices the updated <code>chassis</code> column in the <code>Binding</code> |
9fb4636f GS |
568 | table and updates the <ref column="up" table="Logical_Port" |
569 | db="OVN_NB"/> column in the OVN Northbound database's | |
570 | <ref table="Logical_Port" db="OVN_NB"/> table to indicate that the | |
571 | CIF is now up. The entity responsible for starting the container application | |
572 | queries this value and starts the application. | |
573 | </li> | |
574 | ||
575 | <li> | |
576 | Eventually, the entity that created and started the container stops it. | |
577 | The entity, through the CMS (or directly), deletes its row in the | |
578 | <code>Logical_Port</code> table. | |
579 | </li> | |
580 | ||
581 | <li> | |
91ae2065 | 582 | <code>ovn-northd</code> receives the OVN Northbound update and in turn |
ec78987f JP |
583 | updates the OVN Southbound database accordingly, by removing or |
584 | updating the rows from the OVN Southbound database | |
585 | <code>Pipeline</code> table that were related to the now-destroyed | |
e387e3e8 | 586 | CIF. It also deletes the row in the <code>Binding</code> table |
ec78987f | 587 | for that CIF. |
9fb4636f GS |
588 | </li> |
589 | ||
590 | <li> | |
591 | On every hypervisor, <code>ovn-controller</code> receives the | |
91ae2065 RB |
592 | <code>Pipeline</code> table updates that <code>ovn-northd</code> made |
593 | in the previous step. <code>ovn-controller</code> updates OpenFlow tables | |
594 | to reflect the update. | |
9fb4636f GS |
595 | </li> |
596 | </ol> | |
b705f9ea JP |
597 | |
598 | <h1>Design Decisions</h1> | |
599 | ||
600 | <h2>Supported Tunnel Encapsulations</h2> | |
601 | <p> | |
602 | For connecting hypervisors to each other, the only supported tunnel | |
603 | encapsulations are Geneve and STT. Hypervisors may use VXLAN to | |
604 | connect to gateways. We have limited support to these encapsulations | |
605 | for the following reasons: | |
606 | </p> | |
607 | ||
608 | <ul> | |
609 | <li> | |
610 | <p> | |
611 | They support large amounts of metadata. In addition to | |
612 | specifying the logical switch, we will likely want to indicate | |
613 | the logical source port and where we are in the logical | |
614 | pipeline. Geneve supports a 24-bit VNI field and TLV-based | |
615 | extensions. The header of STT includes a 64-bit context id. | |
616 | </p> | |
617 | </li> | |
618 | ||
619 | <li> | |
620 | <p> | |
621 | They use randomized UDP or TCP source ports that allow | |
622 | efficient distribution among multiple paths in environments that | |
623 | use ECMP in their underlay. | |
624 | </p> | |
625 | </li> | |
626 | ||
627 | <li> | |
628 | <p> | |
629 | NICs are available that accelerate encapsulation and decapsulation. | |
630 | </p> | |
631 | </li> | |
632 | </ul> | |
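To make the metadata argument concrete, the sketch below packs a logical datapath identifier, a logical source port, and a pipeline stage into one integer. The field widths (24, 16, and 8 bits) and the function names are purely hypothetical and are not OVN's wire format. Under these assumed widths the packed value needs 48 bits, which fits in STT's 64-bit context id but not in a 24-bit VNI alone, so Geneve would have to carry the remainder in TLV-based extensions.

```python
# Hypothetical field widths, chosen only for illustration.
DATAPATH_BITS = 24
PORT_BITS = 16
STAGE_BITS = 8
TOTAL_BITS = DATAPATH_BITS + PORT_BITS + STAGE_BITS  # 48


def pack_metadata(datapath, src_port, stage):
    """Pack the three fields into one integer, datapath in the high bits."""
    fields = (
        (datapath, DATAPATH_BITS),
        (src_port, PORT_BITS),
        (stage, STAGE_BITS),
    )
    for value, bits in fields:
        if not 0 <= value < (1 << bits):
            raise ValueError("value %d does not fit in %d bits" % (value, bits))
    return (datapath << (PORT_BITS + STAGE_BITS)) | (src_port << STAGE_BITS) | stage


def unpack_metadata(word):
    """Inverse of pack_metadata: return (datapath, src_port, stage)."""
    stage = word & ((1 << STAGE_BITS) - 1)
    src_port = (word >> STAGE_BITS) & ((1 << PORT_BITS) - 1)
    datapath = word >> (PORT_BITS + STAGE_BITS)
    return datapath, src_port, stage


if __name__ == "__main__":
    word = pack_metadata(1, 2, 3)
    print(hex(word), unpack_metadata(word))
```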
633 | ||
634 | <p> | |
635 | Due to its flexibility, the preferred encapsulation between | |
636 | hypervisors is Geneve. Some environments may want to use STT for | |
637 | performance reasons until the NICs they use support hardware offload | |
638 | of Geneve. | |
639 | </p> | |
640 | ||
641 | <p> | |
642 | For connecting to gateways, the only supported tunnel encapsulations | |
643 | are VXLAN, Geneve, and STT. While support for Geneve is becoming | |
644 | available for TOR (top-of-rack) switches, VXLAN is far more common. | |
645 | Currently, gateways have a feature set that matches the capabilities | |
646 | as defined by the VTEP schema, so fewer bits of metadata are | |
647 | necessary. In the future, gateways that do not support | |
648 | encapsulations with large amounts of metadata may continue to have a | |
649 | reduced feature set. | |
650 | </p> | |
fe36184b | 651 | </manpage> |