Commit | Line | Data |
---|---|---|
fe36184b BP |
1 | <?xml version="1.0" encoding="utf-8"?> |
2 | <manpage program="ovn-architecture" section="7" title="OVN Architecture"> | |
3 | <h1>Name</h1> | |
4 | <p>ovn-architecture -- Open Virtual Network architecture</p> | |
5 | ||
6 | <h1>Description</h1> | |
7 | ||
8 | <p> | |
9 | OVN, the Open Virtual Network, is a system to support virtual network | |
10 | abstraction. OVN complements the existing capabilities of OVS to add | |
11 | native support for virtual network abstractions, such as virtual L2 and L3 | |
12 | overlays and security groups. Services such as DHCP are also desirable | |
13 | features. Just like OVS, OVN's design goal is to have a production-quality | |
14 | implementation that can operate at significant scale. | |
15 | </p> | |
16 | ||
17 | <p> | |
18 | An OVN deployment consists of several components: | |
19 | </p> | |
20 | ||
21 | <ul> | |
22 | <li> | |
23 | <p> | |
24 | A <dfn>Cloud Management System</dfn> (<dfn>CMS</dfn>), which is | |
25 | OVN's ultimate client (via its users and administrators). OVN | |
26 | integration requires installing a CMS-specific plugin and | |
27 | related software (see below). OVN initially targets OpenStack | |
28 | as CMS. | |
29 | </p> | |
30 | ||
31 | <p> | |
32 | We generally speak of ``the'' CMS, but one can imagine scenarios in | |
33 | which multiple CMSes manage different parts of an OVN deployment. | |
34 | </p> | |
35 | </li> | |
36 | ||
37 | <li> | |
38 | An OVN Database physical or virtual node (or, eventually, cluster) | |
39 | installed in a central location. | |
40 | </li> | |
41 | ||
42 | <li> | |
43 | One or more (usually many) <dfn>hypervisors</dfn>. Hypervisors must run | |
44 | Open vSwitch and implement the interface described in | |
45 | <code>IntegrationGuide.md</code> in the OVS source tree. Any hypervisor | |
46 | platform supported by Open vSwitch is acceptable. | |
47 | </li> | |
48 | ||
49 | <li> | |
50 | <p> | |
fa6aeaeb RB |
51 | Zero or more <dfn>gateways</dfn>. A gateway extends a tunnel-based |
52 | logical network into a physical network by bidirectionally forwarding | |
53 | packets between tunnels and a physical Ethernet port. This allows | |
54 | non-virtualized machines to participate in logical networks. A gateway | |
55 | may be a physical host, a virtual machine, or an ASIC-based hardware | |
56 | switch that supports the <code>vtep</code>(5) schema. (Support for the | |
57 | latter will come later in OVN implementation.) | |
fe36184b BP |
58 | </p> |
59 | ||
60 | <p> | |
fa6aeaeb RB |
61 | Hypervisors and gateways are together called <dfn>transport nodes</dfn> |
62 | or <dfn>chassis</dfn>. | |
fe36184b BP |
63 | </p> |
64 | </li> | |
65 | </ul> | |
66 | ||
67 | <p> | |
68 | The diagram below shows how the major components of OVN and related | |
69 | software interact. Starting at the top of the diagram, we have: | |
70 | </p> | |
71 | ||
72 | <ul> | |
73 | <li> | |
74 | The Cloud Management System, as defined above. | |
75 | </li> | |
76 | ||
77 | <li> | |
78 | <p> | |
fa6aeaeb RB |
79 | The <dfn>OVN/CMS Plugin</dfn> is the component of the CMS that |
80 | interfaces to OVN. In OpenStack, this is a Neutron plugin. | |
81 | The plugin's main purpose is to translate the CMS's notion of logical | |
82 | network configuration, stored in the CMS's configuration database in a | |
83 | CMS-specific format, into an intermediate representation understood by | |
84 | OVN. | |
fe36184b BP |
85 | </p> |
86 | ||
87 | <p> | |
fa6aeaeb RB |
88 | This component is necessarily CMS-specific, so a new plugin needs to be |
89 | developed for each CMS that is integrated with OVN. All of the | |
90 | components below this one in the diagram are CMS-independent. | |
fe36184b BP |
91 | </p> |
92 | </li> | |
93 | ||
94 | <li> | |
95 | <p> | |
fa6aeaeb RB |
96 | The <dfn>OVN Northbound Database</dfn> receives the intermediate |
97 | representation of logical network configuration passed down by the | |
98 | OVN/CMS Plugin. The database schema is meant to be ``impedance | |
99 | matched'' with the concepts used in a CMS, so that it directly supports | |
100 | notions of logical switches, routers, ACLs, and so on. See | |
5868eb24 | 101 | <code>ovn-nb</code>(5) for details. |
fe36184b BP |
102 | </p> |
103 | ||
104 | <p> | |
fa6aeaeb RB |
105 | The OVN Northbound Database has only two clients: the OVN/CMS Plugin |
106 | above it and <code>ovn-northd</code> below it. | |
fe36184b BP |
107 | </p> |
108 | </li> | |
109 | ||
110 | <li> | |
91ae2065 RB |
111 | <code>ovn-northd</code>(8) connects to the OVN Northbound Database |
112 | above it and the OVN Southbound Database below it. It translates the | |
ec78987f JP |
113 | logical network configuration in terms of conventional network |
114 | concepts, taken from the OVN Northbound Database, into logical | |
115 | datapath flows in the OVN Southbound Database below it. | |
fe36184b BP |
116 | </li> |
117 | ||
118 | <li> | |
119 | <p> | |
ec78987f | 120 | The <dfn>OVN Southbound Database</dfn> is the center of the system. |
91ae2065 | 121 | Its clients are <code>ovn-northd</code>(8) above it and |
ec78987f | 122 | <code>ovn-controller</code>(8) on every transport node below it. |
fe36184b BP |
123 | </p> |
124 | ||
125 | <p> | |
fa6aeaeb RB |
126 | The OVN Southbound Database contains three kinds of data: <dfn>Physical |
127 | Network</dfn> (PN) tables that specify how to reach hypervisor and | |
128 | other nodes, <dfn>Logical Network</dfn> (LN) tables that describe the | |
129 | logical network in terms of ``logical datapath flows,'' and | |
130 | <dfn>Binding</dfn> tables that link logical network components' | |
131 | locations to the physical network. The hypervisors populate the PN and | |
dcda6e0d BP |
132 | Port_Binding tables, whereas <code>ovn-northd</code>(8) populates the |
133 | LN tables. | |
fe36184b BP |
134 | </p> |
135 | ||
136 | <p> | |
ec78987f JP |
137 | OVN Southbound Database performance must scale with the number of |
138 | transport nodes. This will likely require some work on | |
139 | <code>ovsdb-server</code>(1) as we encounter bottlenecks. | |
140 | Clustering for availability may be needed. | |
fe36184b BP |
141 | </p> |
142 | </li> | |
143 | </ul> | |
144 | ||
145 | <p> | |
146 | The remaining components are replicated onto each hypervisor: | |
147 | </p> | |
148 | ||
149 | <ul> | |
150 | <li> | |
151 | <code>ovn-controller</code>(8) is OVN's agent on each hypervisor and | |
ec78987f JP |
152 | software gateway. Northbound, it connects to the OVN Southbound |
153 | Database to learn about OVN configuration and status and to | |
154 | populate the PN table and the <code>Chassis</code> column in the | |
e387e3e8 | 155 | <code>Binding</code> table with the hypervisor's status. |
ec78987f JP |
156 | Southbound, it connects to <code>ovs-vswitchd</code>(8) as an |
157 | OpenFlow controller, for control over network traffic, and to the | |
158 | local <code>ovsdb-server</code>(1) to allow it to monitor and | |
159 | control Open vSwitch configuration. | |
fe36184b BP |
160 | </li> |
161 | ||
162 | <li> | |
163 | <code>ovs-vswitchd</code>(8) and <code>ovsdb-server</code>(1) are | |
164 | conventional components of Open vSwitch. | |
165 | </li> | |
166 | </ul> | |
167 | ||
168 | <pre fixed="yes"> | |
169 | CMS | |
170 | | | |
171 | | | |
172 | +-----------|-----------+ | |
173 | | | | | |
174 | | OVN/CMS Plugin | | |
175 | | | | | |
176 | | | | | |
177 | | OVN Northbound DB | | |
178 | | | | | |
179 | | | | | |
91ae2065 | 180 | | ovn-northd | |
fe36184b BP |
181 | | | | |
182 | +-----------|-----------+ | |
183 | | | |
184 | | | |
ec78987f JP |
185 | +-------------------+ |
186 | | OVN Southbound DB | | |
187 | +-------------------+ | |
fe36184b BP |
188 | | |
189 | | | |
190 | +------------------+------------------+ | |
191 | | | | | |
ec78987f | 192 | HV 1 | | HV n | |
fe36184b BP |
193 | +---------------|---------------+ . +---------------|---------------+ |
194 | | | | . | | | | |
195 | | ovn-controller | . | ovn-controller | | |
196 | | | | | . | | | | | |
197 | | | | | | | | | | |
198 | | ovs-vswitchd ovsdb-server | | ovs-vswitchd ovsdb-server | | |
199 | | | | | | |
200 | +-------------------------------+ +-------------------------------+ | |
201 | </pre> | |
202 | ||
ca1564ec BP |
203 | <h2>Chassis Setup</h2> |
204 | ||
205 | <p> | |
206 | Each chassis in an OVN deployment must be configured with an Open vSwitch | |
207 | bridge dedicated for OVN's use, called the <dfn>integration bridge</dfn>. | |
208 | System startup scripts create this bridge prior to starting | |
209 | <code>ovn-controller</code>. The ports on the integration bridge include: | |
210 | </p> | |
211 | ||
212 | <ul> | |
213 | <li> | |
214 | On any chassis, tunnel ports that OVN uses to maintain logical network | |
215 | connectivity. <code>ovn-controller</code> adds, updates, and removes | |
216 | these tunnel ports. | |
217 | </li> | |
218 | ||
219 | <li> | |
220 | On a hypervisor, any VIFs that are to be attached to logical networks. | |
221 | The hypervisor itself, or the integration between Open vSwitch and the | |
222 | hypervisor (described in <code>IntegrationGuide.md</code>), takes care of | |
223 | this. (This is not part of OVN or new to OVN; this is pre-existing | |
224 | integration work that has already been done on hypervisors that support | |
225 | OVS.) | |
226 | </li> | |
227 | ||
228 | <li> | |
229 | On a gateway, the physical port used for logical network connectivity. | |
230 | System startup scripts add this port to the bridge prior to starting | |
231 | <code>ovn-controller</code>. This can be a patch port to another bridge, | |
232 | instead of a physical port, in more sophisticated setups. | |
233 | </li> | |
234 | </ul> | |
235 | ||
236 | <p> | |
237 | Other ports should not be attached to the integration bridge. In | |
238 | particular, physical ports attached to the underlay network (as opposed to | |
239 | gateway ports, which are physical ports attached to logical networks) must | |
240 | not be attached to the integration bridge. Underlay physical ports should | |
241 | instead be attached to a separate Open vSwitch bridge (they need not be | |
242 | attached to any bridge at all, in fact). | |
243 | </p> | |
244 | ||
245 | <p> | |
a42226f0 BP |
246 | The integration bridge should be configured as described below. |
247 | The effect of each of these settings is documented in | |
248 | <code>ovs-vswitchd.conf.db</code>(5): | |
ca1564ec BP |
249 | </p> |
250 | ||
a42226f0 BP |
251 | <dl> |
252 | <dt><code>fail-mode=secure</code></dt> | |
253 | <dd> | |
254 | Avoids switching packets between isolated logical networks before | |
255 | <code>ovn-controller</code> starts up. See <code>Controller Failure | |
256 | Settings</code> in <code>ovs-vsctl</code>(8) for more information. | |
257 | </dd> | |
258 | ||
259 | <dt><code>other-config:disable-in-band=true</code></dt> | |
260 | <dd> | |
261 | Suppresses in-band control flows for the integration bridge. It would be | |
262 | unusual for such flows to show up anyway, because OVN uses a local | |
263 | controller (over a Unix domain socket) instead of a remote controller. | |
264 | It's possible, however, for some other bridge in the same system to have | |
265 | an in-band remote controller, and in that case this suppresses the flows | |
266 | that in-band control would ordinarily set up. See <code>In-Band | |
267 | Control</code> in <code>DESIGN.md</code> for more information. | |
268 | </dd> | |
269 | </dl> | |
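The two settings above can be applied in one `ovs-vsctl` invocation. The following is a minimal sketch, not the startup script that any particular distribution ships: the helper name `integration_bridge_command` is hypothetical, and it only builds the argument list rather than running it, so it can be inspected safely on a machine without Open vSwitch.

```python
import shlex


def integration_bridge_command(bridge="br-int"):
    """Build an ovs-vsctl argv that creates and configures the bridge.

    The helper name is illustrative. The settings are the two described
    above: fail-mode=secure and other-config:disable-in-band=true.
    """
    return [
        "ovs-vsctl",
        # Create the bridge only if it does not already exist.
        "--may-exist", "add-br", bridge,
        # "--" separates commands within a single ovs-vsctl transaction.
        "--", "set", "Bridge", bridge,
        "fail-mode=secure",
        "other-config:disable-in-band=true",
    ]


if __name__ == "__main__":
    # Print the command line instead of executing it.
    print(shlex.join(integration_bridge_command()))
```

A startup script would run the resulting command before starting `ovn-controller`, passing a different bridge name if `br-int` is not used.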
270 | ||
ca1564ec BP |
271 | <p> |
272 | The customary name for the integration bridge is <code>br-int</code>, but | |
273 | another name may be used. | |
274 | </p> | |
275 | ||
747b2a45 BP |
276 | <h2>Logical Networks</h2> |
277 | ||
278 | <p> | |
279 | <dfn>Logical networks</dfn> implement the same concepts as physical | |
280 | networks, but they are insulated from the physical network with tunnels or | |
281 | other encapsulations. This allows logical networks to have separate IP and | |
282 | other address spaces that overlap, without conflicting, with those used for | |
283 | physical networks. Logical network topologies can be arranged without | |
284 | regard for the topologies of the physical networks on which they run. | |
285 | </p> | |
286 | ||
287 | <p> | |
288 | Logical network concepts in OVN include: | |
289 | </p> | |
290 | ||
291 | <ul> | |
292 | <li> | |
293 | <dfn>Logical switches</dfn>, the logical version of Ethernet switches. | |
294 | </li> | |
295 | ||
296 | <li> | |
297 | <dfn>Logical routers</dfn>, the logical version of IP routers. Logical | |
298 | switches and routers can be connected into sophisticated topologies. | |
299 | </li> | |
300 | ||
301 | <li> | |
302 | <dfn>Logical datapaths</dfn> are the logical version of an OpenFlow | |
303 | switch. Logical switches and routers are both implemented as logical | |
304 | datapaths. | |
305 | </li> | |
306 | </ul> | |
307 | ||
ca1564ec | 308 | <h2>Life Cycle of a VIF</h2> |
fe36184b BP |
309 | |
310 | <p> | |
311 | Tables and their schemas presented in isolation are difficult to | |
312 | understand. Here's an example. | |
313 | </p> | |
314 | ||
9fb4636f GS |
315 | <p> |
316 | A VIF on a hypervisor is a virtual network interface attached to either | |
317 | a VM or a container running directly on that hypervisor. (This is | |
318 | different from the interface of a container running inside a VM.) | |
319 | </p> | |
320 | ||
fe36184b BP |
321 | <p> |
322 | The steps in this example refer often to details of the OVN Southbound and OVN | |
ec78987f | 323 | Northbound database schemas. Please see <code>ovn-sb</code>(5) and |
fe36184b BP |
324 | <code>ovn-nb</code>(5), respectively, for the full story on these |
325 | databases. | |
326 | </p> | |
327 | ||
328 | <ol> | |
329 | <li> | |
330 | A VIF's life cycle begins when a CMS administrator creates a new VIF | |
331 | using the CMS user interface or API and adds it to a switch (one | |
332 | implemented by OVN as a logical switch). The CMS updates its own | |
333 | configuration. This includes associating a unique, persistent identifier | |
334 | <var>vif-id</var> and Ethernet address <var>mac</var> with the VIF. | |
335 | </li> | |
336 | ||
337 | <li> | |
338 | The CMS plugin updates the OVN Northbound database to include the new | |
339 | VIF, by adding a row to the <code>Logical_Port</code> table. In the new | |
340 | row, <code>name</code> is <var>vif-id</var>, <code>mac</code> is | |
341 | <var>mac</var>, <code>switch</code> points to the OVN logical switch's | |
342 | Logical_Switch record, and other columns are initialized appropriately. | |
343 | </li> | |
344 | ||
345 | <li> | |
5868eb24 BP |
346 | <code>ovn-northd</code> receives the OVN Northbound database update. In |
347 | turn, it makes the corresponding updates to the OVN Southbound database, | |
348 | by adding rows to the OVN Southbound database <code>Logical_Flow</code> | |
349 | table to reflect the new port, e.g. add a flow to recognize that packets | |
350 | destined to the new port's MAC address should be delivered to it, and | |
351 | update the flow that delivers broadcast and multicast packets to include | |
352 | the new port. It also creates a record in the <code>Binding</code> table | |
353 | and populates all its columns except the column that identifies the | |
9fb4636f | 354 | <code>chassis</code>. |
fe36184b BP |
355 | </li> |
356 | ||
357 | <li> | |
358 | On every hypervisor, <code>ovn-controller</code> receives the | |
48605550 | 359 | <code>Logical_Flow</code> table updates that <code>ovn-northd</code> made |
5868eb24 BP |
360 | in the previous step. As long as the VM that owns the VIF is powered |
361 | off, <code>ovn-controller</code> cannot do much; it cannot, for example, | |
fe36184b BP |
362 | arrange to send packets to or receive packets from the VIF, because the |
363 | VIF does not actually exist anywhere. | |
364 | </li> | |
365 | ||
366 | <li> | |
367 | Eventually, a user powers on the VM that owns the VIF. On the hypervisor | |
368 | where the VM is powered on, the integration between the hypervisor and | |
369 | Open vSwitch (described in <code>IntegrationGuide.md</code>) adds the VIF | |
370 | to the OVN integration bridge and stores <var>vif-id</var> in | |
371 | <code>external-ids</code>:<code>iface-id</code> to indicate that the | |
372 | interface is an instantiation of the new VIF. (None of this code is new | |
373 | in OVN; this is pre-existing integration work that has already been done | |
374 | on hypervisors that support OVS.) | |
375 | </li> | |
376 | ||
377 | <li> | |
378 | On the hypervisor where the VM is powered on, <code>ovn-controller</code> | |
379 | notices <code>external-ids</code>:<code>iface-id</code> in the new | |
380 | Interface. In response, it updates the local hypervisor's OpenFlow | |
381 | tables so that packets to and from the VIF are properly handled. | |
a0149f47 | 382 | Afterward, in the OVN Southbound DB, it updates the |
e387e3e8 | 383 | <code>Binding</code> table's <code>chassis</code> column for the |
a0149f47 JP |
384 | row that links the logical port from |
385 | <code>external-ids</code>:<code>iface-id</code> to the hypervisor. | |
fe36184b BP |
386 | </li> |
387 | ||
388 | <li> | |
389 | Some CMS systems, including OpenStack, fully start a VM only when its | |
91ae2065 RB |
390 | networking is ready. To support this, <code>ovn-northd</code> notices |
391 | the <code>chassis</code> column updated for the row in the | |
e387e3e8 | 392 | <code>Binding</code> table and pushes this upward by updating the |
91ae2065 RB |
393 | <ref column="up" table="Logical_Port" db="OVN_NB"/> column in the OVN |
394 | Northbound database's <ref table="Logical_Port" db="OVN_NB"/> table to | |
395 | indicate that the VIF is now up. The CMS, if it uses this feature, can | |
396 | then | |
9fb4636f | 397 | react by allowing the VM's execution to proceed. |
fe36184b BP |
398 | </li> |
399 | ||
400 | <li> | |
401 | On every hypervisor but the one where the VIF resides, | |
9fb4636f | 402 | <code>ovn-controller</code> notices the completely populated row in the |
e387e3e8 | 403 | <code>Binding</code> table. This provides <code>ovn-controller</code> |
fe36184b BP |
404 | the physical location of the logical port, so each instance updates the |
405 | OpenFlow tables of its switch (based on logical datapath flows in the OVN Southbound | |
5868eb24 BP |
406 | DB <code>Logical_Flow</code> table) so that packets to and from the VIF |
407 | can be properly handled via tunnels. | |
fe36184b BP |
408 | </li> |
409 | ||
410 | <li> | |
411 | Eventually, a user powers off the VM that owns the VIF. On the | |
6eceebf5 | 412 | hypervisor where the VM was powered off, the VIF is deleted from the OVN |
fe36184b BP |
413 | integration bridge. |
414 | </li> | |
415 | ||
416 | <li> | |
6eceebf5 | 417 | On the hypervisor where the VM was powered off, |
fe36184b | 418 | <code>ovn-controller</code> notices that the VIF was deleted. In |
9fb4636f | 419 | response, it removes the <code>Chassis</code> column content in the |
e387e3e8 | 420 | <code>Binding</code> table for the logical port. |
fe36184b BP |
421 | </li> |
422 | ||
423 | <li> | |
9fb4636f | 424 | On every hypervisor, <code>ovn-controller</code> notices the empty |
e387e3e8 | 425 | <code>Chassis</code> column in the <code>Binding</code> table's row |
9fb4636f GS |
426 | for the logical port. This means that <code>ovn-controller</code> no |
427 | longer knows the physical location of the logical port, so each instance | |
428 | updates its OpenFlow table to reflect that. | |
fe36184b BP |
429 | </li> |
430 | ||
431 | <li> | |
432 | Eventually, when the VIF (or its entire VM) is no longer needed by | |
433 | anyone, an administrator deletes the VIF using the CMS user interface or | |
434 | API. The CMS updates its own configuration. | |
435 | </li> | |
436 | ||
437 | <li> | |
438 | The CMS plugin removes the VIF from the OVN Northbound database, | |
439 | by deleting its row in the <code>Logical_Port</code> table. | |
440 | </li> | |
441 | ||
442 | <li> | |
91ae2065 | 443 | <code>ovn-northd</code> receives the OVN Northbound update and in turn |
5868eb24 BP |
444 | updates the OVN Southbound database accordingly, by removing or updating |
445 | the rows from the OVN Southbound database <code>Logical_Flow</code> table | |
446 | and <code>Binding</code> table that were related to the now-destroyed | |
447 | VIF. | |
fe36184b BP |
448 | </li> |
449 | ||
450 | <li> | |
451 | On every hypervisor, <code>ovn-controller</code> receives the | |
48605550 | 452 | <code>Logical_Flow</code> table updates that <code>ovn-northd</code> made |
5868eb24 BP |
453 | in the previous step. <code>ovn-controller</code> updates OpenFlow |
454 | tables to reflect the update, although there may not be much to do, since | |
455 | the VIF had already become unreachable when it was removed from the | |
e387e3e8 | 456 | <code>Binding</code> table in a previous step. |
fe36184b BP |
457 | </li> |
458 | </ol> | |
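The steps above can be sketched as a toy simulation. This is only an illustration under loud assumptions: plain Python dictionaries stand in for the OVN Northbound and Southbound databases, the function names are hypothetical, and logical flows and OpenFlow tables are omitted. It shows just the hand-off of the `chassis` and `up` columns between the CMS plugin, `ovn-northd`, and `ovn-controller`.

```python
# Toy in-memory stand-ins for the databases; real deployments use OVSDB.
nb_logical_port = {}  # OVN Northbound Logical_Port rows, keyed by name
sb_binding = {}       # OVN Southbound Binding rows, keyed by logical port


def cms_create_vif(vif_id, mac, switch):
    """CMS plugin: add a Logical_Port row for the new VIF."""
    nb_logical_port[vif_id] = {"mac": mac, "switch": switch, "up": False}


def northd_sync():
    """ovn-northd's role: mirror ports into Binding and push 'up' northward."""
    for name in nb_logical_port:
        # New rows are created with every column set except chassis.
        sb_binding.setdefault(name, {"chassis": None})
    for name in list(sb_binding):
        if name not in nb_logical_port:
            del sb_binding[name]
    for name, row in nb_logical_port.items():
        row["up"] = sb_binding[name]["chassis"] is not None


def controller_sync(chassis, local_iface_ids):
    """ovn-controller on one chassis: claim or release Binding rows."""
    for name, row in sb_binding.items():
        if name in local_iface_ids:
            row["chassis"] = chassis
        elif row["chassis"] == chassis:
            row["chassis"] = None


def demo():
    """Walk one VIF through its life cycle and record (chassis, up)."""
    trace = []

    def snapshot():
        trace.append((sb_binding["vif1"]["chassis"],
                      nb_logical_port["vif1"]["up"]))

    cms_create_vif("vif1", "00:00:00:00:00:01", "sw0")
    northd_sync()
    snapshot()                        # VM still powered off
    controller_sync("hv1", {"vif1"})  # VM powered on at hv1
    northd_sync()
    snapshot()
    controller_sync("hv1", set())     # VM powered off again
    northd_sync()
    snapshot()
    del nb_logical_port["vif1"]       # administrator deletes the VIF
    northd_sync()
    return trace
```

Calling `demo()` returns the `chassis` and `up` values after creation, power-on, and power-off, and leaves both toy tables empty once the VIF has been deleted.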
459 | ||
9fb4636f GS |
460 | <h2>Life Cycle of a container interface inside a VM</h2> |
461 | ||
462 | <p> | |
463 | OVN provides virtual network abstractions by converting information | |
464 | written in OVN_NB database to OpenFlow flows in each hypervisor. Secure | |
465 | virtual networking for multiple tenants can only be provided if OVN controller | |
466 | is the only entity that can modify flows in Open vSwitch. When the | |
467 | Open vSwitch integration bridge resides in the hypervisor, it is a | |
468 | fair assumption to make that tenant workloads running inside VMs cannot | |
469 | make any changes to Open vSwitch flows. | |
470 | </p> | |
471 | ||
472 | <p> | |
473 | If the infrastructure provider trusts the applications inside the | |
474 | containers not to break out and modify the Open vSwitch flows, then | |
475 | containers can be run in hypervisors. This is also the case when | |
476 | containers are run inside the VMs and Open vSwitch integration bridge | |
477 | with flows added by OVN controller resides in the same VM. For both | |
478 | the above cases, the workflow is the same as explained with an example | |
479 | in the previous section ("Life Cycle of a VIF"). | |
480 | </p> | |
481 | ||
482 | <p> | |
483 | This section talks about the life cycle of a container interface (CIF) | |
484 | when containers are created in the VMs and the Open vSwitch integration | |
485 | bridge resides inside the hypervisor. In this case, even if a container | |
486 | application breaks out, other tenants are not affected because the | |
487 | containers running inside the VMs cannot modify the flows in the | |
488 | Open vSwitch integration bridge. | |
489 | </p> | |
490 | ||
491 | <p> | |
492 | When multiple containers are created inside a VM, there are multiple | |
493 | CIFs associated with them. The network traffic associated with these | |
494 | CIFs needs to reach the Open vSwitch integration bridge running in the | |
495 | hypervisor for OVN to support virtual network abstractions. OVN should | |
496 | also be able to distinguish network traffic coming from different CIFs. | |
497 | There are two ways to distinguish network traffic of CIFs. | |
498 | </p> | |
499 | ||
500 | <p> | |
501 | One way is to provide one VIF for every CIF (1:1 model). This means that | |
502 | there could be a lot of network devices in the hypervisor. This would slow | |
503 | down OVS because of all the additional CPU cycles needed for the management | |
504 | of all the VIFs. It would also mean that the entity creating the | |
505 | containers in a VM should also be able to create the corresponding VIFs in | |
506 | the hypervisor. | |
507 | </p> | |
508 | ||
509 | <p> | |
510 | The second way is to provide a single VIF for all the CIFs (1:many model). | |
511 | OVN could then distinguish network traffic coming from different CIFs via | |
512 | a tag written in every packet. OVN uses this mechanism and uses VLAN as | |
513 | the tagging mechanism. | |
514 | </p> | |
515 | ||
516 | <ol> | |
517 | <li> | |
518 | A CIF's life cycle begins when a container is spawned inside a VM by | |
519 | either the same CMS that created the VM, a tenant that owns that VM, | |
520 | or even a container orchestration system that is different from the CMS | |
521 | that initially created the VM. Whoever the entity is, it will need to | |
522 | know the <var>vif-id</var> that is associated with the network interface | |
523 | of the VM through which the container interface's network traffic is | |
524 | expected to go. The entity that creates the container interface | |
525 | will also need to choose an unused VLAN inside that VM. | |
526 | </li> | |
527 | ||
528 | <li> | |
529 | The container spawning entity (either directly or through the CMS that | |
530 | manages the underlying infrastructure) updates the OVN Northbound | |
531 | database to include the new CIF, by adding a row to the | |
532 | <code>Logical_Port</code> table. In the new row, <code>name</code> is | |
533 | any unique identifier, <code>parent_name</code> is the <var>vif-id</var> | |
534 | of the VM through which the CIF's network traffic is expected to go, | |
535 | and <code>tag</code> is the VLAN tag that identifies the | |
536 | network traffic of that CIF. | |
537 | </li> | |
538 | ||
539 | <li> | |
5868eb24 BP |
540 | <code>ovn-northd</code> receives the OVN Northbound database update. In |
541 | turn, it makes the corresponding updates to the OVN Southbound database, | |
542 | by adding rows to the OVN Southbound database's <code>Logical_Flow</code> | |
543 | table to reflect the new port and also by creating a new row in the | |
544 | <code>Binding</code> table and populating all its columns except the | |
545 | column that identifies the <code>chassis</code>. | |
9fb4636f GS |
546 | </li> |
547 | ||
548 | <li> | |
549 | On every hypervisor, <code>ovn-controller</code> subscribes to the | |
e387e3e8 | 550 | changes in the <code>Binding</code> table. When a new row is created |
91ae2065 | 551 | by <code>ovn-northd</code> that includes a value in the |
e387e3e8 | 552 | <code>parent_port</code> column of the <code>Binding</code> table, the |
91ae2065 RB |
553 | <code>ovn-controller</code> in the hypervisor whose OVN integration bridge |
554 | has that same value in <var>vif-id</var> in | |
555 | <code>external-ids</code>:<code>iface-id</code> | |
9fb4636f GS |
556 | updates the local hypervisor's OpenFlow tables so that packets to and |
557 | from the VIF with the particular VLAN <code>tag</code> are properly | |
558 | handled. Afterward it updates the <code>chassis</code> column of | |
e387e3e8 | 559 | the <code>Binding</code> table to reflect the physical location. |
9fb4636f GS |
560 | </li> |
561 | ||
562 | <li> | |
563 | One can only start the application inside the container after the | |
91ae2065 | 564 | underlying network is ready. To support this, <code>ovn-northd</code> |
e387e3e8 | 565 | notices the updated <code>chassis</code> column in the <code>Binding</code> |
9fb4636f GS |
566 | table and updates the <ref column="up" table="Logical_Port" |
567 | db="OVN_NB"/> column in the OVN Northbound database's | |
568 | <ref table="Logical_Port" db="OVN_NB"/> table to indicate that the | |
569 | CIF is now up. The entity responsible for starting the container application | |
570 | queries this value and starts the application. | |
571 | </li> | |
572 | ||
573 | <li> | |
574 | Eventually, the entity that created and started the container stops it. | |
575 | The entity, through the CMS (or directly), deletes its row in the | |
576 | <code>Logical_Port</code> table. | |
577 | </li> | |
578 | ||
579 | <li> | |
91ae2065 | 580 | <code>ovn-northd</code> receives the OVN Northbound update and in turn |
5868eb24 BP |
581 | updates the OVN Southbound database accordingly, by removing or updating |
582 | the rows from the OVN Southbound database <code>Logical_Flow</code> table | |
583 | that were related to the now-destroyed CIF. It also deletes the row in | |
584 | the <code>Binding</code> table for that CIF. | |
9fb4636f GS |
585 | </li> |
586 | ||
587 | <li> | |
588 | On every hypervisor, <code>ovn-controller</code> receives the | |
48605550 BP |
589 | <code>Logical_Flow</code> table updates that <code>ovn-northd</code> made |
590 | in the previous step. <code>ovn-controller</code> updates OpenFlow | |
591 | tables to reflect the update. | |
9fb4636f GS |
592 | </li> |
593 | </ol> | |
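The 1:many model described in this section amounts to a lookup keyed on the parent VIF and the VLAN tag. The sketch below is a hedged illustration rather than OVN's implementation: the sample rows, port names, and helper functions are hypothetical, and only the `name`, `parent_name`, and `tag` columns come from the text.

```python
from typing import Optional

# Hypothetical Logical_Port rows: one ordinary VIF and two CIFs behind it.
LOGICAL_PORTS = [
    {"name": "vm1-vif", "parent_name": None, "tag": None},
    {"name": "cif-a", "parent_name": "vm1-vif", "tag": 10},
    {"name": "cif-b", "parent_name": "vm1-vif", "tag": 20},
]


def build_demux(ports):
    """Map (vif-id, VLAN tag) to the logical port that owns the traffic."""
    table = {}
    for port in ports:
        # A CIF is reached through its parent's VIF; a plain VIF is its own.
        vif_id = port["parent_name"] or port["name"]
        key = (vif_id, port["tag"])
        if key in table:
            raise ValueError(f"VLAN tag reused on {vif_id}: {port['tag']}")
        table[key] = port["name"]
    return table


def classify(table, vif_id: str, vlan_id: Optional[int]) -> Optional[str]:
    """Return the logical port for a packet, or None if nothing matches."""
    return table.get((vif_id, vlan_id))
```

The duplicate check reflects the requirement that the entity creating a container interface choose a VLAN that is unused inside that VM.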
b705f9ea | 594 | |
5868eb24 | 595 | <h2>Life Cycle of a Packet</h2> |
b705f9ea | 596 | |
b705f9ea | 597 | <p> |
5868eb24 BP |
598 | This section describes how a packet travels from one virtual machine or |
599 | container to another through OVN. This description focuses on the physical | |
600 | treatment of a packet; for a description of the logical life cycle of a | |
601 | packet, please refer to the <code>Logical_Flow</code> table in | |
602 | <code>ovn-sb</code>(5). | |
b705f9ea JP |
603 | </p> |
604 | ||
5868eb24 BP |
605 | <p> |
606 | This section mentions several data and metadata fields, for clarity | |
607 | summarized here: | |
608 | </p> | |
609 | ||
610 | <dl> | |
611 | <dt>tunnel key</dt> | |
612 | <dd> | |
613 | When OVN encapsulates a packet in Geneve or another tunnel, it attaches | |
614 | extra data to it to allow the receiving OVN instance to process it | |
615 | correctly. This takes different forms depending on the particular | |
616 | encapsulation, but in each case we refer to it here as the ``tunnel | |
617 | key.'' See <code>Tunnel Encapsulations</code>, below, for details. | |
618 | </dd> | |
619 | ||
620 | <dt>logical datapath field</dt> | |
621 | <dd> | |
622 | A field that denotes the logical datapath through which a packet is being | |
623 | processed. OVN uses the field that OpenFlow 1.1+ simply (and | |
624 | confusingly) calls ``metadata'' to store the logical datapath. (This | |
625 | field is passed across tunnels as part of the tunnel key.) | |
626 | </dd> | |
627 | ||
628 | <dt>logical input port field</dt> | |
629 | <dd> | |
cd144a41 JP |
630 | A field that denotes the logical port from which the packet |
631 | entered the logical datapath. OVN stores this in Nicira extension | |
632 | register number 6. (This field is passed across tunnels as part | |
633 | of the tunnel key.) | |
5868eb24 BP |
634 | </dd> |
635 | ||
636 | <dt>logical output port field</dt> | |
637 | <dd> | |
cd144a41 JP |
638 | A field that denotes the logical port from which the packet will |
639 | leave the logical datapath. This is initialized to 0 at the | |
640 | beginning of the logical ingress pipeline. OVN stores this in | |
641 | Nicira extension register number 7. (This field is passed across | |
642 | tunnels as part of the tunnel key.) | |
5868eb24 BP |
643 | </dd> |
644 | ||
645 | <dt>VLAN ID</dt> | |
646 | <dd> | |
647 | The VLAN ID is used as an interface between OVN and containers nested | |
648 | inside a VM (see <code>Life Cycle of a container interface inside a | |
649 | VM</code>, above, for more information). | |
650 | </dd> | |
651 | </dl> | |
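As a reading aid, the logical fields summarized above can be modeled as a small record. This is a sketch under stated assumptions: the class and method names are hypothetical, the dictionary keys `metadata`, `reg6`, and `reg7` are just labels for the OpenFlow metadata field and Nicira extension registers 6 and 7 named in the text, and no tunnel encoding is implied.

```python
from dataclasses import dataclass


@dataclass
class LogicalMetadata:
    """Per-packet logical fields (illustrative model only)."""

    datapath: int     # logical datapath, kept in the OpenFlow metadata field
    inport: int       # logical input port, kept in register 6
    outport: int = 0  # logical output port, register 7; 0 at ingress start

    def as_openflow_fields(self):
        """Return the fields keyed by the register that carries each one."""
        return {
            "metadata": self.datapath,
            "reg6": self.inport,
            "reg7": self.outport,
        }
```

For example, `LogicalMetadata(datapath=5, inport=3)` models a packet that has just entered the logical ingress pipeline, before any logical output port has been chosen.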
652 | ||
653 | <p> | |
654 | Initially, a VM or container on the ingress hypervisor sends a packet on a | |
655 | port attached to the OVN integration bridge. Then: | |
656 | </p> | |
657 | ||
658 | <ol> | |
b705f9ea JP |
659 | <li> |
660 | <p> | |
5868eb24 BP |
661 | OpenFlow table 0 performs physical-to-logical translation. It matches |
662 | the packet's ingress port. Its actions annotate the packet with | |
663 | logical metadata, by setting the logical datapath field to identify the | |
664 | logical datapath that the packet is traversing and the logical input | |
665 | port field to identify the ingress port. Then it resubmits to table 16 | |
666 | to enter the logical ingress pipeline. | |
667 | </p> | |
668 | ||
669 | <p> | |
670 | Packets that originate from a container nested within a VM are treated | |
671 | in a slightly different way. The originating container can be | |
672 | distinguished based on the VIF-specific VLAN ID, so the | |
673 | physical-to-logical translation flows additionally match on VLAN ID and | |
674 | the actions strip the VLAN header. Following this step, OVN treats | |
675 | packets from containers just like any other packets. | |
676 | </p> | |
677 | ||
678 | <p> | |
679 | Table 0 also processes packets that arrive from other chassis. It | |
680 | distinguishes them from other packets by ingress port, which is a | |
681 | tunnel. As with packets just entering the OVN pipeline, the actions | |
682 | annotate these packets with logical datapath and logical ingress port | |
683 | metadata. In addition, the actions set the logical output port field, | |
684 | which is available because in OVN tunneling occurs after the logical | |
685 | output port is known. These three pieces of information are obtained | |
686 | from the tunnel encapsulation metadata (see <code>Tunnel | |
687 | Encapsulations</code> for encoding details). Then the actions resubmit | |
688 | to table 33 for local delivery, which leads into the logical egress pipeline. | |
b705f9ea JP |
689 | </p> |
690 | </li> | |
691 | ||
692 | <li> | |
693 | <p> | |
5868eb24 BP |
694 | OpenFlow tables 16 through 31 execute the logical ingress pipeline from |
695 | the <code>Logical_Flow</code> table in the OVN Southbound database. | |
696 | These tables are expressed entirely in terms of logical concepts like | |
697 | logical ports and logical datapaths. A big part of | |
698 | <code>ovn-controller</code>'s job is to translate them into equivalent | |
699 | OpenFlow (in particular it translates the table numbers: | |
700 | <code>Logical_Flow</code> tables 0 through 15 become OpenFlow tables 16 | |
701 | through 31). For a given packet, the logical ingress pipeline | |
702 | eventually executes zero or more <code>output</code> actions: | |
b705f9ea | 703 | </p> |
5868eb24 BP |
704 | |
705 | <ul> | |
706 | <li> | |
707 | If the pipeline executes no <code>output</code> actions at all, the | |
708 | packet is effectively dropped. | |
709 | </li> | |
710 | ||
711 | <li> | |
712 | Most commonly, the pipeline executes one <code>output</code> action, | |
713 | which <code>ovn-controller</code> implements by resubmitting the | |
714 | packet to table 32. | |
715 | </li> | |
716 | ||
717 | <li> | |
718 | If the pipeline executes more than one <code>output</code> action, | |
719 | then each one is separately resubmitted to table 32. This can be | |
720 | used to send multiple copies of the packet to multiple ports. (If | |
721 | the packet was not modified between the <code>output</code> actions, | |
722 | and some of the copies are destined to the same hypervisor, then | |
723 | using a logical multicast output port would save bandwidth between | |
724 | hypervisors.) | |
725 | </li> | |
726 | </ul> | |
b705f9ea JP |
727 | </li> |
728 | ||
729 | <li> | |
730 | <p> | |
5868eb24 BP |
731 | OpenFlow tables 32 through 47 implement the <code>output</code> action |
732 | in the logical ingress pipeline. Specifically, table 32 handles | |
733 | packets to remote hypervisors, table 33 handles packets to the local | |
734 | hypervisor, and table 34 discards packets whose logical ingress and | |
735 | egress port are the same. | |
736 | </p> | |
737 | ||
738 | <p> | |
739 | Each flow in table 32 matches on a logical output port for unicast or | |
740 | multicast logical ports that include a logical port on a remote | |
741 | hypervisor. Each flow's actions implement sending a packet to the port | |
742 | it matches. For unicast logical output ports on remote hypervisors, | |
743 | the actions set the tunnel key to the correct value, then send the | |
744 | packet on the tunnel port to the correct hypervisor. (When the remote | |
745 | hypervisor receives the packet, table 0 there will recognize it as a | |
746 | tunneled packet and pass it along to table 33.) For multicast logical | |
747 | output ports, the actions send one copy of the packet to each remote | |
748 | hypervisor, in the same way as for unicast destinations. If a | |
749 | multicast group includes a logical port or ports on the local | |
750 | hypervisor, then its actions also resubmit to table 33. Table 32 also | |
751 | includes a fallback flow that resubmits to table 33 if there is no | |
752 | other match. | |
753 | </p> | |
754 | ||
755 | <p> | |
756 | Flows in table 33 resemble those in table 32 but for logical ports that | |
757 | reside locally rather than remotely. For unicast logical output ports | |
758 | on the local hypervisor, the actions just resubmit to table 34. For | |
759 | multicast output ports that include one or more logical ports on the | |
760 | local hypervisor, for each such logical port <var>P</var>, the actions | |
761 | change the logical output port to <var>P</var>, then resubmit to table | |
762 | 34. | |
763 | </p> | |
764 | ||
765 | <p> | |
766 | Table 34 matches and drops packets for which the logical input and | |
767 | output ports are the same. It resubmits other packets to table 48. | |
b705f9ea JP |
768 | </p> |
769 | </li> | |
5868eb24 BP |
770 | |
771 | <li> | |
772 | <p> | |
773 | OpenFlow tables 48 through 63 execute the logical egress pipeline from | |
774 | the <code>Logical_Flow</code> table in the OVN Southbound database. | |
775 | The egress pipeline can perform a final stage of validation before | |
776 | packet delivery. Eventually, it may execute an <code>output</code> | |
777 | action, which <code>ovn-controller</code> implements by resubmitting to | |
778 | table 64. A packet for which the pipeline never executes | |
779 | <code>output</code> is effectively dropped (although it may have been | |
780 | transmitted through a tunnel across a physical network). | |
781 | </p> | |
782 | ||
783 | <p> | |
784 | The egress pipeline cannot change the logical output port or cause | |
785 | further tunneling. | |
786 | </p> | |
787 | </li> | |
788 | ||
789 | <li> | |
790 | <p> | |
791 | OpenFlow table 64 performs logical-to-physical translation, the | |
792 | opposite of table 0. It matches the packet's logical egress port. Its | |
793 | actions output the packet to the port attached to the OVN integration | |
794 | bridge that represents that logical port. If the logical egress port | |
795 | is a container nested within a VM, then before sending the packet the | |
796 | actions push on a VLAN header with an appropriate VLAN ID. | |
797 | </p> | |
798 | </li> | |
799 | </ol> | |
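<p>
  The table layout walked through in the steps above can be summarized in a
  short sketch. It is an illustration only, not code from
  <code>ovn-controller</code>: the constant and function names are
  hypothetical, and the egress mapping (logical egress table <var>n</var> to
  OpenFlow table 48 + <var>n</var>) is an assumption made by analogy with the
  ingress mapping that the text states explicitly.
</p>

```python
# Hypothetical names; the table numbers are the ones quoted in the text.
OFTABLE_PHY_TO_LOG = 0          # physical-to-logical translation
OFTABLE_LOG_INGRESS_BASE = 16   # logical ingress pipeline: tables 16..31
OFTABLE_REMOTE_OUTPUT = 32      # output to remote hypervisors
OFTABLE_LOCAL_OUTPUT = 33       # output to the local hypervisor
OFTABLE_DROP_LOOPBACK = 34      # drop if logical input port == output port
OFTABLE_LOG_EGRESS_BASE = 48    # logical egress pipeline: tables 48..63
OFTABLE_LOG_TO_PHY = 64         # logical-to-physical translation
LOG_PIPELINE_LEN = 16           # logical tables per pipeline (0..15)


def openflow_table(pipeline, logical_table):
    """Map a Logical_Flow table number to an OpenFlow table number."""
    if not 0 <= logical_table < LOG_PIPELINE_LEN:
        raise ValueError("logical table must be in 0..15")
    if pipeline == "ingress":
        return OFTABLE_LOG_INGRESS_BASE + logical_table
    if pipeline == "egress":
        # Assumed by analogy with ingress; the text gives only the range.
        return OFTABLE_LOG_EGRESS_BASE + logical_table
    raise ValueError("pipeline must be 'ingress' or 'egress'")


def describe(table):
    """Name the stage that an OpenFlow table number belongs to."""
    if table == OFTABLE_PHY_TO_LOG:
        return "physical-to-logical translation"
    if 16 <= table <= 31:
        return "logical ingress pipeline"
    if 32 <= table <= 47:
        return "output action handling"
    if 48 <= table <= 63:
        return "logical egress pipeline"
    if table == OFTABLE_LOG_TO_PHY:
        return "logical-to-physical translation"
    return "not covered by this description"
```

<p>
  For example, <code>openflow_table("ingress", 15)</code> yields 31, the last
  table of the logical ingress pipeline.
</p>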
800 | ||
801 | <h1>Design Decisions</h1> | |
802 | ||
803 | <h2>Tunnel Encapsulations</h2> | |
804 | ||
805 | <p> | |
806 | OVN annotates logical network packets that it sends from one hypervisor to | |
807 | another with the following three pieces of metadata, which are encoded in | |
808 | an encapsulation-specific fashion: | |
809 | </p> | |
810 | ||
811 | <ul> | |
812 | <li> | |
813 | 24-bit logical datapath identifier, from the <code>tunnel_key</code> | |
814 | column in the OVN Southbound <code>Datapath_Binding</code> table. | |
815 | </li> | |
816 | ||
817 | <li> | |
818 | 15-bit logical ingress port identifier. ID 0 is reserved for internal | |
819 | use within OVN. IDs 1 through 32767, inclusive, may be assigned to | |
820 | logical ports (see the <code>tunnel_key</code> column in the OVN | |
821 | Southbound <code>Port_Binding</code> table). | |
822 | </li> | |
823 | ||
824 | <li> | |
825 | 16-bit logical egress port identifier. IDs 0 through 32767 have the same | |
826 | meaning as for logical ingress ports. IDs 32768 through 65535, | |
827 | inclusive, may be assigned to logical multicast groups (see the | |
828 | <code>tunnel_key</code> column in the OVN Southbound | |
829 | <code>Multicast_Group</code> table). | |
830 | </li> | |
b705f9ea JP |
831 | </ul> |
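<p>
  The identifier ranges listed above can be checked with a few comparisons.
  The following is a minimal sketch; the function names are hypothetical and
  do not correspond to any OVN API.
</p>

```python
MAX_PORT_ID = 32767        # largest ID assignable to a logical port
MAX_EGRESS_ID = 65535      # largest 16-bit logical egress port ID


def is_assignable_ingress_port_id(port_id):
    """True if port_id may be assigned to a logical port (1..32767)."""
    return 1 <= port_id <= MAX_PORT_ID


def classify_egress_port_id(port_id):
    """Classify a 16-bit logical egress port identifier by its range."""
    if not 0 <= port_id <= MAX_EGRESS_ID:
        raise ValueError("egress port ID must fit in 16 bits")
    if port_id == 0:
        return "reserved"            # internal use within OVN
    if port_id <= MAX_PORT_ID:
        return "logical port"        # Port_Binding tunnel_key
    return "multicast group"         # Multicast_Group tunnel_key
```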
832 | ||
833 | <p> | |
5868eb24 BP |
834 | For hypervisor-to-hypervisor traffic, OVN supports only Geneve and STT |
835 | encapsulations, for the following reasons: | |
b705f9ea JP |
836 | </p> |
837 | ||
5868eb24 BP |
838 | <ul> |
839 | <li> | |
840 | Only STT and Geneve support the large amounts of metadata (over 32 bits | |
841 | per packet) that OVN uses (as described above). | |
842 | </li> | |
843 | ||
844 | <li> | |
845 | STT and Geneve use randomized UDP or TCP source ports that allow | |
846 | efficient distribution among multiple paths in environments that use ECMP | |
847 | in their underlay. | |
848 | </li> | |
849 | ||
850 | <li> | |
851 | NICs are available to offload STT and Geneve encapsulation and | |
852 | decapsulation. | |
853 | </li> | |
854 | </ul> | |
855 | ||
856 | <p> | |
857 | Geneve is the preferred encapsulation between hypervisors because of its | |
858 | flexibility. For Geneve encapsulation, OVN transmits the logical datapath | |
859 | identifier in the Geneve VNI. | |
860 | ||
861 | <!-- Keep the following in sync with ovn/controller/physical.h. --> | |
862 | OVN transmits the logical ingress and logical egress ports in a TLV with | |
863 | class 0xffff, type 0, and a 32-bit value encoded as follows, from MSB to | |
864 | LSB: | |
865 | </p> | |
866 | ||
867 | <diagram> | |
868 | <header name=""> | |
869 | <bits name="rsv" above="1" below="0" width=".25"/> | |
870 | <bits name="ingress port" above="15" width=".75"/> | |
871 | <bits name="egress port" above="16" width=".75"/> | |
872 | </header> | |
873 | </diagram> | |
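<p>
  As an illustration of the 32-bit layout in the diagram above, the following
  sketch packs and unpacks the option value. The function names are
  hypothetical; the real definitions live in the OVN sources, not here.
</p>

```python
def pack_geneve_ports(ingress, egress):
    """Pack the ports into the 32-bit option value, MSB to LSB:
    1 reserved bit (0), 15-bit ingress port, 16-bit egress port."""
    if not 0 <= ingress < (1 << 15):
        raise ValueError("ingress port must fit in 15 bits")
    if not 0 <= egress < (1 << 16):
        raise ValueError("egress port must fit in 16 bits")
    return (ingress << 16) | egress


def unpack_geneve_ports(value):
    """Return (ingress, egress) from a 32-bit option value."""
    ingress = (value >> 16) & 0x7FFF   # bits 30..16
    egress = value & 0xFFFF            # bits 15..0
    return ingress, egress
```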
874 | ||
875 | <p> | |
876 | Environments whose NICs lack Geneve offload may prefer STT encapsulation | |
877 | for performance reasons. For STT encapsulation, OVN encodes all three | |
878 | pieces of logical metadata in the STT 64-bit tunnel ID as follows, from MSB | |
879 | to LSB: | |
880 | </p> | |
881 | ||
882 | <diagram> | |
883 | <header name=""> | |
884 | <bits name="reserved" above="9" below="0" width=".5"/> | |
885 | <bits name="ingress port" above="15" width=".75"/> | |
886 | <bits name="egress port" above="16" width=".75"/> | |
887 | <bits name="datapath" above="24" width="1.25"/> | |
888 | </header> | |
889 | </diagram> | |
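<p>
  The 64-bit STT layout in the diagram above can be sketched the same way.
  Again, the function names are hypothetical and the code is illustrative
  rather than taken from OVN.
</p>

```python
def pack_stt_tunnel_id(datapath, ingress, egress):
    """Pack the metadata into the 64-bit tunnel ID, MSB to LSB:
    9 reserved bits (0), 15-bit ingress port, 16-bit egress port,
    24-bit logical datapath."""
    if not 0 <= datapath < (1 << 24):
        raise ValueError("datapath must fit in 24 bits")
    if not 0 <= ingress < (1 << 15):
        raise ValueError("ingress port must fit in 15 bits")
    if not 0 <= egress < (1 << 16):
        raise ValueError("egress port must fit in 16 bits")
    return (ingress << 40) | (egress << 24) | datapath


def unpack_stt_tunnel_id(tunnel_id):
    """Return (datapath, ingress, egress) from a 64-bit tunnel ID."""
    datapath = tunnel_id & 0xFFFFFF          # bits 23..0
    egress = (tunnel_id >> 24) & 0xFFFF      # bits 39..24
    ingress = (tunnel_id >> 40) & 0x7FFF     # bits 54..40
    return datapath, ingress, egress
```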
890 | ||
b705f9ea | 891 | <p> |
5868eb24 BP |
892 | For connecting to gateways, in addition to Geneve and STT, OVN supports |
893 | VXLAN, because only VXLAN support is common on top-of-rack (ToR) switches. | |
894 | Currently, gateways have a feature set that matches the capabilities as | |
895 | defined by the VTEP schema, so fewer bits of metadata are necessary. In | |
896 | the future, gateways that do not support encapsulations with large amounts | |
897 | of metadata may continue to have a reduced feature set. | |
b705f9ea | 898 | </p> |
fe36184b | 899 | </manpage> |