Commit | Line | Data |
---|---|---|
fe36184b BP |
1 | <?xml version="1.0" encoding="utf-8"?> |
2 | <manpage program="ovn-architecture" section="7" title="OVN Architecture"> | |
3 | <h1>Name</h1> | |
4 | <p>ovn-architecture -- Open Virtual Network architecture</p> | |
5 | ||
6 | <h1>Description</h1> | |
7 | ||
8 | <p> | |
9 | OVN, the Open Virtual Network, is a system to support virtual network | |
10 | abstraction. OVN complements the existing capabilities of OVS to add | |
11 | native support for virtual network abstractions, such as virtual L2 and L3 | |
12 | overlays and security groups. Services such as DHCP are also desirable | |
13 | features. Just like OVS, OVN's design goal is to have a production-quality | |
14 | implementation that can operate at significant scale. | |
15 | </p> | |
16 | ||
17 | <p> | |
18 | An OVN deployment consists of several components: | |
19 | </p> | |
20 | ||
21 | <ul> | |
22 | <li> | |
23 | <p> | |
24 | A <dfn>Cloud Management System</dfn> (<dfn>CMS</dfn>), which is | |
25 | OVN's ultimate client (via its users and administrators). OVN | |
26 | integration requires installing a CMS-specific plugin and | |
27 | related software (see below). OVN initially targets OpenStack | |
28 | as CMS. | |
29 | </p> | |
30 | ||
31 | <p> | |
32 | We generally speak of ``the'' CMS, but one can imagine scenarios in | |
33 | which multiple CMSes manage different parts of an OVN deployment. | |
34 | </p> | |
35 | </li> | |
36 | ||
37 | <li> | |
38 | An OVN Database physical or virtual node (or, eventually, cluster) | |
39 | installed in a central location. | |
40 | </li> | |
41 | ||
42 | <li> | |
43 | One or more (usually many) <dfn>hypervisors</dfn>. Hypervisors must run | |
44 | Open vSwitch and implement the interface described in | |
45 | <code>IntegrationGuide.md</code> in the OVS source tree. Any hypervisor | |
46 | platform supported by Open vSwitch is acceptable. | |
47 | </li> | |
48 | ||
49 | <li> | |
50 | <p> | |
fa6aeaeb RB |
51 | Zero or more <dfn>gateways</dfn>. A gateway extends a tunnel-based |
52 | logical network into a physical network by bidirectionally forwarding | |
53 | packets between tunnels and a physical Ethernet port. This allows | |
54 | non-virtualized machines to participate in logical networks. A gateway | |
55 | may be a physical host, a virtual machine, or an ASIC-based hardware | |
56 | switch that supports the <code>vtep</code>(5) schema. (Support for the | |
57 | latter will come later in OVN implementation.) | |
fe36184b BP |
58 | </p> |
59 | ||
60 | <p> | |
fa6aeaeb RB |
61 | Hypervisors and gateways are together called <dfn>transport nodes</dfn>
62 | or <dfn>chassis</dfn>. | |
fe36184b BP |
63 | </p> |
64 | </li> | |
65 | </ul> | |
66 | ||
67 | <p> | |
68 | The diagram below shows how the major components of OVN and related | |
69 | software interact. Starting at the top of the diagram, we have: | |
70 | </p> | |
71 | ||
72 | <ul> | |
73 | <li> | |
74 | The Cloud Management System, as defined above. | |
75 | </li> | |
76 | ||
77 | <li> | |
78 | <p> | |
fa6aeaeb RB |
79 | The <dfn>OVN/CMS Plugin</dfn> is the component of the CMS that |
80 | interfaces to OVN. In OpenStack, this is a Neutron plugin. | |
81 | The plugin's main purpose is to translate the CMS's notion of logical | |
82 | network configuration, stored in the CMS's configuration database in a | |
83 | CMS-specific format, into an intermediate representation understood by | |
84 | OVN. | |
fe36184b BP |
85 | </p> |
86 | ||
87 | <p> | |
fa6aeaeb RB |
88 | This component is necessarily CMS-specific, so a new plugin needs to be |
89 | developed for each CMS that is integrated with OVN. All of the | |
90 | components below this one in the diagram are CMS-independent. | |
fe36184b BP |
91 | </p> |
92 | </li> | |
93 | ||
94 | <li> | |
95 | <p> | |
fa6aeaeb RB |
96 | The <dfn>OVN Northbound Database</dfn> receives the intermediate |
97 | representation of logical network configuration passed down by the | |
98 | OVN/CMS Plugin. The database schema is meant to be ``impedance | |
99 | matched'' with the concepts used in a CMS, so that it directly supports | |
100 | notions of logical switches, routers, ACLs, and so on. See | |
5868eb24 | 101 | <code>ovn-nb</code>(5) for details. |
fe36184b BP |
102 | </p> |
103 | ||
104 | <p> | |
fa6aeaeb RB |
105 | The OVN Northbound Database has only two clients: the OVN/CMS Plugin |
106 | above it and <code>ovn-northd</code> below it. | |
fe36184b BP |
107 | </p> |
108 | </li> | |
109 | ||
110 | <li> | |
91ae2065 RB |
111 | <code>ovn-northd</code>(8) connects to the OVN Northbound Database |
112 | above it and the OVN Southbound Database below it. It translates the | |
ec78987f JP |
113 | logical network configuration in terms of conventional network |
114 | concepts, taken from the OVN Northbound Database, into logical | |
115 | datapath flows in the OVN Southbound Database below it. | |
fe36184b BP |
116 | </li> |
117 | ||
118 | <li> | |
119 | <p> | |
ec78987f | 120 | The <dfn>OVN Southbound Database</dfn> is the center of the system. |
91ae2065 | 121 | Its clients are <code>ovn-northd</code>(8) above it and |
ec78987f | 122 | <code>ovn-controller</code>(8) on every transport node below it. |
fe36184b BP |
123 | </p> |
124 | ||
125 | <p> | |
fa6aeaeb RB |
126 | The OVN Southbound Database contains three kinds of data: <dfn>Physical |
127 | Network</dfn> (PN) tables that specify how to reach hypervisor and | |
128 | other nodes, <dfn>Logical Network</dfn> (LN) tables that describe the | |
129 | logical network in terms of ``logical datapath flows,'' and | |
130 | <dfn>Binding</dfn> tables that link logical network components' | |
131 | locations to the physical network. The hypervisors populate the PN and | |
dcda6e0d BP |
132 | Port_Binding tables, whereas <code>ovn-northd</code>(8) populates the |
133 | LN tables. | |
fe36184b BP |
134 | </p> |
135 | ||
136 | <p> | |
ec78987f JP |
137 | OVN Southbound Database performance must scale with the number of |
138 | transport nodes. This will likely require some work on | |
139 | <code>ovsdb-server</code>(1) as we encounter bottlenecks. | |
140 | Clustering for availability may be needed. | |
fe36184b BP |
141 | </p> |
142 | </li> | |
143 | </ul> | |
144 | ||
145 | <p> | |
146 | The remaining components are replicated onto each hypervisor: | |
147 | </p> | |
148 | ||
149 | <ul> | |
150 | <li> | |
151 | <code>ovn-controller</code>(8) is OVN's agent on each hypervisor and | |
ec78987f JP |
152 | software gateway. Northbound, it connects to the OVN Southbound |
153 | Database to learn about OVN configuration and status and to | |
154 | populate the PN table and the <code>Chassis</code> column in the | |
e387e3e8 | 155 | <code>Binding</code> table with the hypervisor's status. |
ec78987f JP |
156 | Southbound, it connects to <code>ovs-vswitchd</code>(8) as an |
157 | OpenFlow controller, for control over network traffic, and to the | |
158 | local <code>ovsdb-server</code>(1) to allow it to monitor and | |
159 | control Open vSwitch configuration. | |
fe36184b BP |
160 | </li> |
161 | ||
162 | <li> | |
163 | <code>ovs-vswitchd</code>(8) and <code>ovsdb-server</code>(1) are | |
164 | conventional components of Open vSwitch. | |
165 | </li> | |
166 | </ul> | |
167 | ||
168 | <pre fixed="yes"> | |
169 | CMS | |
170 | | | |
171 | | | |
172 | +-----------|-----------+ | |
173 | | | | | |
174 | | OVN/CMS Plugin | | |
175 | | | | | |
176 | | | | | |
177 | | OVN Northbound DB | | |
178 | | | | | |
179 | | | | | |
91ae2065 | 180 | | ovn-northd | |
fe36184b BP |
181 | | | | |
182 | +-----------|-----------+ | |
183 | | | |
184 | | | |
ec78987f JP |
185 | +-------------------+ |
186 | | OVN Southbound DB | | |
187 | +-------------------+ | |
fe36184b BP |
188 | | |
189 | | | |
190 | +------------------+------------------+ | |
191 | | | | | |
ec78987f | 192 | HV 1 | | HV n | |
fe36184b BP |
193 | +---------------|---------------+ . +---------------|---------------+ |
194 | | | | . | | | | |
195 | | ovn-controller | . | ovn-controller | | |
196 | | | | | . | | | | | |
197 | | | | | | | | | | |
198 | | ovs-vswitchd ovsdb-server | | ovs-vswitchd ovsdb-server | | |
199 | | | | | | |
200 | +-------------------------------+ +-------------------------------+ | |
201 | </pre> | |
202 | ||
ca1564ec BP |
203 | <h2>Chassis Setup</h2> |
204 | ||
205 | <p> | |
206 | Each chassis in an OVN deployment must be configured with an Open vSwitch | |
207 | bridge dedicated for OVN's use, called the <dfn>integration bridge</dfn>. | |
e43fc07c RB |
208 | System startup scripts may create this bridge prior to starting |
209 | <code>ovn-controller</code> if desired. If this bridge does not exist when | |
210 | ovn-controller starts, it will be created automatically with the default | |
211 | configuration suggested below. The ports on the integration bridge include: | |
ca1564ec BP |
212 | </p> |
213 | ||
214 | <ul> | |
215 | <li> | |
216 | On any chassis, tunnel ports that OVN uses to maintain logical network | |
217 | connectivity. <code>ovn-controller</code> adds, updates, and removes | |
218 | these tunnel ports. | |
219 | </li> | |
220 | ||
221 | <li> | |
222 | On a hypervisor, any VIFs that are to be attached to logical networks. | |
223 | The hypervisor itself, or the integration between Open vSwitch and the | |
224 | hypervisor (described in <code>IntegrationGuide.md</code>) takes care of | |
225 | this. (This is not part of OVN or new to OVN; this is pre-existing | |
226 | integration work that has already been done on hypervisors that support | |
227 | OVS.) | |
228 | </li> | |
229 | ||
230 | <li> | |
231 | On a gateway, the physical port used for logical network connectivity. | |
232 | System startup scripts add this port to the bridge prior to starting | |
233 | <code>ovn-controller</code>. This can be a patch port to another bridge, | |
234 | instead of a physical port, in more sophisticated setups. | |
235 | </li> | |
236 | </ul> | |
237 | ||
238 | <p> | |
239 | Other ports should not be attached to the integration bridge. In | |
240 | particular, physical ports attached to the underlay network (as opposed to | |
241 | gateway ports, which are physical ports attached to logical networks) must | |
242 | not be attached to the integration bridge. Underlay physical ports should | |
243 | instead be attached to a separate Open vSwitch bridge (they need not be | |
244 | attached to any bridge at all, in fact). | |
245 | </p> | |
246 | ||
247 | <p> | |
a42226f0 BP |
248 | The integration bridge should be configured as described below. |
249 | The effect of each of these settings is documented in | |
250 | <code>ovs-vswitchd.conf.db</code>(5): | |
ca1564ec BP |
251 | </p> |
252 | ||
e43fc07c RB |
253 | <!-- Keep the following in sync with create_br_int() in |
254 | ovn/controller/ovn-controller.c. --> | |
a42226f0 BP |
255 | <dl> |
256 | <dt><code>fail-mode=secure</code></dt> | |
257 | <dd> | |
258 | Avoids switching packets between isolated logical networks before | |
259 | <code>ovn-controller</code> starts up. See <code>Controller Failure | |
260 | Settings</code> in <code>ovs-vsctl</code>(8) for more information. | |
261 | </dd> | |
262 | ||
263 | <dt><code>other-config:disable-in-band=true</code></dt> | |
264 | <dd> | |
265 | Suppresses in-band control flows for the integration bridge. It would be | |
266 | unusual for such flows to show up anyway, because OVN uses a local | |
267 | controller (over a Unix domain socket) instead of a remote controller. | |
268 | It's possible, however, for some other bridge in the same system to have | |
269 | an in-band remote controller, and in that case this suppresses the flows | |
270 | that in-band control would ordinarily set up. See <code>In-Band | |
271 | Control</code> in <code>DESIGN.md</code> for more information. | |
272 | </dd> | |
273 | </dl> | |
274 | ||
ca1564ec BP |
275 | <p> |
276 | The customary name for the integration bridge is <code>br-int</code>, but | |
277 | another name may be used. | |
278 | </p> | |
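The two settings above can be applied in a single ovs-vsctl invocation. The following is a minimal sketch, not the code that ovn-controller itself uses: the helper name integration_bridge_command is hypothetical, and the function only builds the argument list so that it can be inspected or passed to subprocess.run() on a host where Open vSwitch is installed.

```python
from typing import List


def integration_bridge_command(bridge: str = "br-int") -> List[str]:
    """Build (but do not run) an ovs-vsctl command for the integration bridge.

    The helper name and structure are illustrative assumptions; only the
    two settings come from the list above.
    """
    return [
        "ovs-vsctl",
        # Create the bridge only if it does not already exist.
        "--", "--may-exist", "add-br", bridge,
        # Apply both documented settings in the same transaction.
        "--", "set", "Bridge", bridge,
        "fail-mode=secure",
        "other-config:disable-in-band=true",
    ]


if __name__ == "__main__":
    # Print the command line instead of executing it.
    print(" ".join(integration_bridge_command()))
```

Building the argument list separately from running it keeps the sketch usable on machines without Open vSwitch, and makes it easy to substitute a bridge name other than the customary br-int.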
279 | ||
747b2a45 BP |
280 | <h2>Logical Networks</h2> |
281 | ||
282 | <p> | |
283 | A <dfn>logical network</dfn> implements the same concepts as physical | |
284 | networks, but it is insulated from the physical network with tunnels or | |
285 | other encapsulations. This allows logical networks to have separate IP and | |
286 | other address spaces that overlap, without conflicting, with those used for | |
287 | physical networks. Logical network topologies can be arranged without | |
288 | regard for the topologies of the physical networks on which they run. | |
289 | </p> | |
290 | ||
291 | <p> | |
292 | Logical network concepts in OVN include: | |
293 | </p> | |
294 | ||
295 | <ul> | |
296 | <li> | |
297 | <dfn>Logical switches</dfn>, the logical version of Ethernet switches. | |
298 | </li> | |
299 | ||
300 | <li> | |
301 | <dfn>Logical routers</dfn>, the logical version of IP routers. Logical | |
302 | switches and routers can be connected into sophisticated topologies. | |
303 | </li> | |
304 | ||
305 | <li> | |
306 | <dfn>Logical datapaths</dfn> are the logical version of an OpenFlow | |
307 | switch. Logical switches and routers are both implemented as logical | |
308 | datapaths. | |
309 | </li> | |
310 | </ul> | |
311 | ||
ca1564ec | 312 | <h2>Life Cycle of a VIF</h2> |
fe36184b BP |
313 | |
314 | <p> | |
315 | Tables and their schemas presented in isolation are difficult to | |
316 | understand. Here's an example. | |
317 | </p> | |
318 | ||
9fb4636f GS |
319 | <p> |
320 | A VIF on a hypervisor is a virtual network interface attached either | |
321 | to a VM or a container running directly on that hypervisor. (This is | |
322 | different from the interface of a container running inside a VM.) | |
323 | </p> | |
324 | ||
fe36184b BP |
325 | <p> |
326 | The steps in this example refer often to details of the OVN Southbound and OVN | |
ec78987f | 327 | Northbound database schemas. Please see <code>ovn-sb</code>(5) and |
fe36184b BP |
328 | <code>ovn-nb</code>(5), respectively, for the full story on these |
329 | databases. | |
330 | </p> | |
331 | ||
332 | <ol> | |
333 | <li> | |
334 | A VIF's life cycle begins when a CMS administrator creates a new VIF | |
335 | using the CMS user interface or API and adds it to a switch (one | |
336 | implemented by OVN as a logical switch). The CMS updates its own | |
337 | configuration. This includes associating a unique, persistent identifier | |
338 | <var>vif-id</var> and Ethernet address <var>mac</var> with the VIF. | |
339 | </li> | |
340 | ||
341 | <li> | |
342 | The CMS plugin updates the OVN Northbound database to include the new | |
343 | VIF, by adding a row to the <code>Logical_Port</code> table. In the new | |
344 | row, <code>name</code> is <var>vif-id</var>, <code>mac</code> is | |
345 | <var>mac</var>, <code>switch</code> points to the OVN logical switch's | |
346 | Logical_Switch record, and other columns are initialized appropriately. | |
347 | </li> | |
348 | ||
349 | <li> | |
5868eb24 BP |
350 | <code>ovn-northd</code> receives the OVN Northbound database update. In |
351 | turn, it makes the corresponding updates to the OVN Southbound database, | |
352 | by adding rows to the OVN Southbound database <code>Logical_Flow</code> | |
353 | table to reflect the new port, e.g. add a flow to recognize that packets | |
354 | destined to the new port's MAC address should be delivered to it, and | |
355 | update the flow that delivers broadcast and multicast packets to include | |
356 | the new port. It also creates a record in the <code>Binding</code> table | |
357 | and populates all its columns except the column that identifies the | |
9fb4636f | 358 | <code>chassis</code>. |
fe36184b BP |
359 | </li> |
360 | ||
361 | <li> | |
362 | On every hypervisor, <code>ovn-controller</code> receives the | |
48605550 | 363 | <code>Logical_Flow</code> table updates that <code>ovn-northd</code> made |
5868eb24 BP |
364 | in the previous step. As long as the VM that owns the VIF is powered |
365 | off, <code>ovn-controller</code> cannot do much; it cannot, for example, | |
fe36184b BP |
366 | arrange to send packets to or receive packets from the VIF, because the |
367 | VIF does not actually exist anywhere. | |
368 | </li> | |
369 | ||
370 | <li> | |
371 | Eventually, a user powers on the VM that owns the VIF. On the hypervisor | |
372 | where the VM is powered on, the integration between the hypervisor and | |
373 | Open vSwitch (described in <code>IntegrationGuide.md</code>) adds the VIF | |
374 | to the OVN integration bridge and stores <var>vif-id</var> in | |
375 | <code>external-ids</code>:<code>iface-id</code> to indicate that the | |
376 | interface is an instantiation of the new VIF. (None of this code is new | |
377 | in OVN; this is pre-existing integration work that has already been done | |
378 | on hypervisors that support OVS.) | |
379 | </li> | |
380 | ||
381 | <li> | |
382 | On the hypervisor where the VM is powered on, <code>ovn-controller</code> | |
383 | notices <code>external-ids</code>:<code>iface-id</code> in the new | |
384 | Interface. In response, it updates the local hypervisor's OpenFlow | |
385 | tables so that packets to and from the VIF are properly handled. | |
a0149f47 | 386 | Afterward, in the OVN Southbound DB, it updates the |
e387e3e8 | 387 | <code>Binding</code> table's <code>chassis</code> column for the |
a0149f47 JP |
388 | row that links the logical port from |
389 | <code>external-ids</code>:<code>iface-id</code> to the hypervisor. | |
fe36184b BP |
390 | </li> |
391 | ||
392 | <li> | |
393 | Some CMS systems, including OpenStack, fully start a VM only when its | |
91ae2065 RB |
394 | networking is ready. To support this, <code>ovn-northd</code> notices |
395 | the <code>chassis</code> column updated for the row in | |
e387e3e8 | 396 | <code>Binding</code> table and pushes this upward by updating the |
91ae2065 RB |
397 | <ref column="up" table="Logical_Port" db="OVN_NB"/> column in the OVN |
398 | Northbound database's <ref table="Logical_Port" db="OVN_NB"/> table to | |
399 | indicate that the VIF is now up. The CMS, if it uses this feature, can | |
400 | then | |
9fb4636f | 401 | react by allowing the VM's execution to proceed. |
fe36184b BP |
402 | </li> |
403 | ||
404 | <li> | |
405 | On every hypervisor but the one where the VIF resides, | |
9fb4636f | 406 | <code>ovn-controller</code> notices the completely populated row in the |
e387e3e8 | 407 | <code>Binding</code> table. This provides <code>ovn-controller</code> |
fe36184b BP |
408 | the physical location of the logical port, so each instance updates the |
409 | OpenFlow tables of its switch (based on logical datapath flows in the OVN | |
5868eb24 BP |
410 | Southbound DB <code>Logical_Flow</code> table) so that packets to and from the VIF
411 | can be properly handled via tunnels. | |
fe36184b BP |
412 | </li> |
413 | ||
414 | <li> | |
415 | Eventually, a user powers off the VM that owns the VIF. On the | |
6eceebf5 | 416 | hypervisor where the VM was powered off, the VIF is deleted from the OVN |
fe36184b BP |
417 | integration bridge. |
418 | </li> | |
419 | ||
420 | <li> | |
6eceebf5 | 421 | On the hypervisor where the VM was powered off, |
fe36184b | 422 | <code>ovn-controller</code> notices that the VIF was deleted. In |
9fb4636f | 423 | response, it removes the <code>Chassis</code> column content in the |
e387e3e8 | 424 | <code>Binding</code> table for the logical port. |
fe36184b BP |
425 | </li> |
426 | ||
427 | <li> | |
9fb4636f | 428 | On every hypervisor, <code>ovn-controller</code> notices the empty |
e387e3e8 | 429 | <code>Chassis</code> column in the <code>Binding</code> table's row |
9fb4636f GS |
430 | for the logical port. This means that <code>ovn-controller</code> no |
431 | longer knows the physical location of the logical port, so each instance | |
432 | updates its OpenFlow table to reflect that. | |
fe36184b BP |
433 | </li> |
434 | ||
435 | <li> | |
436 | Eventually, when the VIF (or its entire VM) is no longer needed by | |
437 | anyone, an administrator deletes the VIF using the CMS user interface or | |
438 | API. The CMS updates its own configuration. | |
439 | </li> | |
440 | ||
441 | <li> | |
442 | The CMS plugin removes the VIF from the OVN Northbound database, | |
443 | by deleting its row in the <code>Logical_Port</code> table. | |
444 | </li> | |
445 | ||
446 | <li> | |
91ae2065 | 447 | <code>ovn-northd</code> receives the OVN Northbound update and in turn |
5868eb24 BP |
448 | updates the OVN Southbound database accordingly, by removing or updating |
449 | the rows from the OVN Southbound database <code>Logical_Flow</code> table | |
450 | and <code>Binding</code> table that were related to the now-destroyed | |
451 | VIF. | |
fe36184b BP |
452 | </li> |
453 | ||
454 | <li> | |
455 | On every hypervisor, <code>ovn-controller</code> receives the | |
48605550 | 456 | <code>Logical_Flow</code> table updates that <code>ovn-northd</code> made |
5868eb24 BP |
457 | in the previous step. <code>ovn-controller</code> updates OpenFlow |
458 | tables to reflect the update, although there may not be much to do, since | |
459 | the VIF had already become unreachable when it was removed from the | |
e387e3e8 | 460 | <code>Binding</code> table in a previous step. |
fe36184b BP |
461 | </li> |
462 | </ol> | |
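The steps above can be followed with a toy in-memory model. This is only a sketch under simplifying assumptions: the two dictionaries stand in for the Northbound Logical_Port table and the Southbound Binding table, all function names are hypothetical, and logical flows and OpenFlow tables are omitted entirely. It shows how the chassis column drives the up column.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ToyDatabases:
    # Stand-in for the Northbound Logical_Port table, keyed by vif-id.
    logical_ports: Dict[str, dict] = field(default_factory=dict)
    # Stand-in for the Southbound Binding table, keyed by logical port name.
    bindings: Dict[str, dict] = field(default_factory=dict)


def plugin_add_vif(db: ToyDatabases, vif_id: str, mac: str) -> None:
    """CMS plugin: add a Logical_Port row for a new VIF."""
    db.logical_ports[vif_id] = {"mac": mac, "up": False}


def plugin_delete_vif(db: ToyDatabases, vif_id: str) -> None:
    """CMS plugin: delete the Logical_Port row for a destroyed VIF."""
    db.logical_ports.pop(vif_id, None)


def northd_sync(db: ToyDatabases) -> None:
    """ovn-northd: mirror ports into Binding rows and push 'up' northward."""
    for name in db.logical_ports:
        # New rows get every column except chassis filled in.
        db.bindings.setdefault(name, {"chassis": None})
    for name in list(db.bindings):
        if name not in db.logical_ports:
            del db.bindings[name]
    for name, port in db.logical_ports.items():
        port["up"] = db.bindings[name]["chassis"] is not None


def controller_bind(db: ToyDatabases, chassis: str, vif_id: str) -> None:
    """ovn-controller: record where the VIF was instantiated."""
    if vif_id in db.bindings:
        db.bindings[vif_id]["chassis"] = chassis


def controller_unbind(db: ToyDatabases, vif_id: str) -> None:
    """ovn-controller: clear the chassis column when the VIF disappears."""
    if vif_id in db.bindings:
        db.bindings[vif_id]["chassis"] = None


if __name__ == "__main__":
    db = ToyDatabases()
    plugin_add_vif(db, "vif-1", "00:00:00:00:00:01")  # example values only
    northd_sync(db)
    controller_bind(db, "hv1", "vif-1")
    northd_sync(db)
    print(db.logical_ports["vif-1"]["up"])
```

In the real system each component reacts to database change notifications; calling northd_sync() by hand here simply makes the ordering of the steps explicit.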
463 | ||
a30b56d4 | 464 | <h2>Life Cycle of a Container Interface Inside a VM</h2> |
9fb4636f GS |
465 | |
466 | <p> | |
467 | OVN provides virtual network abstractions by converting information | |
468 | written in the OVN_NB database to OpenFlow flows in each hypervisor. Secure | |
469 | virtual networking for multiple tenants can only be provided if the OVN controller | |
470 | is the only entity that can modify flows in Open vSwitch. When the | |
471 | Open vSwitch integration bridge resides in the hypervisor, it is a | |
472 | fair assumption to make that tenant workloads running inside VMs cannot | |
473 | make any changes to Open vSwitch flows. | |
474 | </p> | |
475 | ||
476 | <p> | |
477 | If the infrastructure provider trusts the applications inside the | |
478 | containers not to break out and modify the Open vSwitch flows, then | |
479 | containers can be run in hypervisors. This is also the case when | |
480 | containers are run inside the VMs and Open vSwitch integration bridge | |
481 | with flows added by OVN controller resides in the same VM. For both | |
482 | the above cases, the workflow is the same as explained with an example | |
483 | in the previous section ("Life Cycle of a VIF"). | |
484 | </p> | |
485 | ||
486 | <p> | |
487 | This section talks about the life cycle of a container interface (CIF) | |
488 | when containers are created in the VMs and the Open vSwitch integration | |
489 | bridge resides inside the hypervisor. In this case, even if a container | |
490 | application breaks out, other tenants are not affected because the | |
491 | containers running inside the VMs cannot modify the flows in the | |
492 | Open vSwitch integration bridge. | |
493 | </p> | |
494 | ||
495 | <p> | |
496 | When multiple containers are created inside a VM, there are multiple | |
497 | CIFs associated with them. The network traffic associated with these | |
498 | CIFs needs to reach the Open vSwitch integration bridge running in the | |
499 | hypervisor for OVN to support virtual network abstractions. OVN should | |
500 | also be able to distinguish network traffic coming from different CIFs. | |
501 | There are two ways to distinguish network traffic of CIFs. | |
502 | </p> | |
503 | ||
504 | <p> | |
505 | One way is to provide one VIF for every CIF (1:1 model). This means that | |
506 | there could be a lot of network devices in the hypervisor. This would slow | |
507 | down OVS because of all the additional CPU cycles needed for the management | |
508 | of all the VIFs. It would also mean that the entity creating the | |
509 | containers in a VM should also be able to create the corresponding VIFs in | |
510 | the hypervisor. | |
511 | </p> | |
512 | ||
513 | <p> | |
514 | The second way is to provide a single VIF for all the CIFs (1:many model). | |
515 | OVN could then distinguish network traffic coming from different CIFs via | |
516 | a tag written in every packet. OVN uses this mechanism and uses VLAN as | |
517 | the tagging mechanism. | |
518 | </p> | |
519 | ||
520 | <ol> | |
521 | <li> | |
522 | A CIF's life cycle begins when a container is spawned inside a VM by | |
523 | either the same CMS that created the VM or a tenant that owns that VM | |
524 | or even a container Orchestration System that is different than the CMS | |
525 | that initially created the VM. Whoever the entity is, it will need to | |
526 | know the <var>vif-id</var> that is associated with the network interface | |
527 | of the VM through which the container interface's network traffic is | |
528 | expected to go. The entity that creates the container interface | |
529 | will also need to choose an unused VLAN inside that VM. | |
530 | </li> | |
531 | ||
532 | <li> | |
533 | The container spawning entity (either directly or through the CMS that | |
534 | manages the underlying infrastructure) updates the OVN Northbound | |
535 | database to include the new CIF, by adding a row to the | |
536 | <code>Logical_Port</code> table. In the new row, <code>name</code> is | |
537 | any unique identifier, <code>parent_name</code> is the <var>vif-id</var> | |
538 | of the VM through which the CIF's network traffic is expected to go, | |
539 | and the <code>tag</code> is the VLAN tag that identifies the | |
540 | network traffic of that CIF. | |
541 | </li> | |
542 | ||
543 | <li> | |
5868eb24 BP |
544 | <code>ovn-northd</code> receives the OVN Northbound database update. In |
545 | turn, it makes the corresponding updates to the OVN Southbound database, | |
546 | by adding rows to the OVN Southbound database's <code>Logical_Flow</code> | |
547 | table to reflect the new port and also by creating a new row in the | |
548 | <code>Binding</code> table and populating all its columns except the | |
549 | column that identifies the <code>chassis</code>. | |
9fb4636f GS |
550 | </li> |
551 | ||
552 | <li> | |
553 | On every hypervisor, <code>ovn-controller</code> subscribes to the | |
e387e3e8 | 554 | changes in the <code>Binding</code> table. When a new row is created |
91ae2065 | 555 | by <code>ovn-northd</code> that includes a value in |
e387e3e8 | 556 | <code>parent_port</code> column of <code>Binding</code> table, the |
91ae2065 RB |
557 | <code>ovn-controller</code> in the hypervisor whose OVN integration bridge |
558 | has that same value in <var>vif-id</var> in | |
559 | <code>external-ids</code>:<code>iface-id</code> | |
9fb4636f GS |
560 | updates the local hypervisor's OpenFlow tables so that packets to and |
561 | from the VIF with the particular VLAN <code>tag</code> are properly | |
562 | handled. Afterward it updates the <code>chassis</code> column of | |
e387e3e8 | 563 | the <code>Binding</code> to reflect the physical location. |
9fb4636f GS |
564 | </li> |
565 | ||
566 | <li> | |
567 | One can only start the application inside the container after the | |
91ae2065 | 568 | underlying network is ready. To support this, <code>ovn-northd</code> |
e387e3e8 | 569 | notices the updated <code>chassis</code> column in <code>Binding</code> |
9fb4636f GS |
570 | table and updates the <ref column="up" table="Logical_Port" |
571 | db="OVN_NB"/> column in the OVN Northbound database's | |
572 | <ref table="Logical_Port" db="OVN_NB"/> table to indicate that the | |
573 | CIF is now up. The entity responsible for starting the container application | |
574 | queries this value and starts the application. | |
575 | </li> | |
576 | ||
577 | <li> | |
578 | Eventually, the entity that created and started the container stops it. | |
579 | The entity, through the CMS (or directly), deletes its row in the | |
580 | <code>Logical_Port</code> table. | |
581 | </li> | |
582 | ||
583 | <li> | |
91ae2065 | 584 | <code>ovn-northd</code> receives the OVN Northbound update and in turn |
5868eb24 BP |
585 | updates the OVN Southbound database accordingly, by removing or updating |
586 | the rows from the OVN Southbound database <code>Logical_Flow</code> table | |
587 | that were related to the now-destroyed CIF. It also deletes the row in | |
588 | the <code>Binding</code> table for that CIF. | |
9fb4636f GS |
589 | </li> |
590 | ||
591 | <li> | |
592 | On every hypervisor, <code>ovn-controller</code> receives the | |
48605550 BP |
593 | <code>Logical_Flow</code> table updates that <code>ovn-northd</code> made |
594 | in the previous step. <code>ovn-controller</code> updates OpenFlow | |
595 | tables to reflect the update. | |
9fb4636f GS |
596 | </li> |
597 | </ol> | |
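The 1:many model used in this section amounts to a lookup keyed on the parent VIF and the VLAN tag. The sketch below illustrates that idea only; the class NestedPortTable and its methods are hypothetical names, and the real mapping is expressed as rows in the Logical_Port and Binding tables rather than in a Python dictionary.

```python
from typing import Dict, Optional, Set, Tuple


class NestedPortTable:
    """Toy mapping from (parent vif-id, VLAN tag) to a CIF's logical port."""

    def __init__(self) -> None:
        self._by_key: Dict[Tuple[str, int], str] = {}

    def used_tags(self, parent_name: str) -> Set[int]:
        """Return the VLAN tags already taken on one parent VIF."""
        return {tag for (parent, tag) in self._by_key if parent == parent_name}

    def pick_unused_tag(self, parent_name: str) -> int:
        """Choose the lowest free VLAN ID (1..4094) inside one VM."""
        used = self.used_tags(parent_name)
        for tag in range(1, 4095):
            if tag not in used:
                return tag
        raise RuntimeError("no free VLAN ID on " + parent_name)

    def add_cif(self, name: str, parent_name: str, tag: int) -> None:
        """Register a CIF, mirroring the name, parent_name, and tag columns."""
        key = (parent_name, tag)
        if key in self._by_key:
            raise ValueError("tag %d already used on %s" % (tag, parent_name))
        self._by_key[key] = name

    def classify(self, parent_name: str, tag: int) -> Optional[str]:
        """Identify which CIF sent a packet seen on a VIF with a given tag."""
        return self._by_key.get((parent_name, tag))


if __name__ == "__main__":
    table = NestedPortTable()
    tag = table.pick_unused_tag("vif-1")  # example identifiers only
    table.add_cif("cif-a", "vif-1", tag)
    print(table.classify("vif-1", tag))
```

Tags only need to be unique per parent VIF, which is why the same VLAN ID can be reused by containers in different VMs.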
b705f9ea | 598 | |
5868eb24 | 599 | <h2>Life Cycle of a Packet</h2> |
b705f9ea | 600 | |
b705f9ea | 601 | <p> |
5868eb24 BP |
602 | This section describes how a packet travels from one virtual machine or |
603 | container to another through OVN. This description focuses on the physical | |
604 | treatment of a packet; for a description of the logical life cycle of a | |
605 | packet, please refer to the <code>Logical_Flow</code> table in | |
606 | <code>ovn-sb</code>(5). | |
b705f9ea JP |
607 | </p> |
608 | ||
5868eb24 BP |
609 | <p> |
610 | This section mentions several data and metadata fields, for clarity | |
611 | summarized here: | |
612 | </p> | |
613 | ||
614 | <dl> | |
615 | <dt>tunnel key</dt> | |
616 | <dd> | |
617 | When OVN encapsulates a packet in Geneve or another tunnel, it attaches | |
618 | extra data to it to allow the receiving OVN instance to process it | |
619 | correctly. This takes different forms depending on the particular | |
620 | encapsulation, but in each case we refer to it here as the ``tunnel | |
621 | key.'' See <code>Tunnel Encapsulations</code>, below, for details. | |
622 | </dd> | |
623 | ||
624 | <dt>logical datapath field</dt> | |
625 | <dd> | |
626 | A field that denotes the logical datapath through which a packet is being | |
4103f6d2 BP |
627 | processed. |
628 | <!-- Keep the following in sync with MFF_LOG_DATAPATH in | |
629 | ovn/controller/lflow.h. --> | |
630 | OVN uses the field that OpenFlow 1.1+ simply (and confusingly) calls | |
631 | ``metadata'' to store the logical datapath. (This field is passed across | |
632 | tunnels as part of the tunnel key.) | |
5868eb24 BP |
633 | </dd> |
634 | ||
635 | <dt>logical input port field</dt> | |
636 | <dd> | |
cd144a41 | 637 | A field that denotes the logical port from which the packet |
4103f6d2 BP |
638 | entered the logical datapath. |
639 | <!-- Keep the following in sync with MFF_LOG_INPORT in | |
640 | ovn/controller/lflow.h. --> | |
641 | OVN stores this in Nicira extension register number 6. (This field is | |
642 | passed across tunnels as part of the tunnel key.) | |
5868eb24 BP |
643 | </dd> |
644 | ||
645 | <dt>logical output port field</dt> | |
646 | <dd> | |
cd144a41 JP |
647 | A field that denotes the logical port from which the packet will |
648 | leave the logical datapath. This is initialized to 0 at the | |
4103f6d2 BP |
649 | beginning of the logical ingress pipeline. |
650 | <!-- Keep the following in sync with MFF_LOG_OUTPORT in | |
651 | ovn/controller/lflow.h. --> | |
652 | OVN stores this in | |
cd144a41 JP |
653 | Nicira extension register number 7. (This field is passed across |
654 | tunnels as part of the tunnel key.) | |
5868eb24 BP |
655 | </dd> |
656 | ||
657 | <dt>VLAN ID</dt> | |
658 | <dd> | |
659 | The VLAN ID is used as an interface between OVN and containers nested | |
660 | inside a VM (see <code>Life Cycle of a Container Interface Inside a | |
661 | VM</code>, above, for more information). | |
662 | </dd> | |
663 | </dl> | |
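As a compact summary of where the three logical fields live, the sketch below maps them onto the OpenFlow field names given above. The class LogicalMetadata is a hypothetical illustration, not part of OVN, and the numeric values in the usage lines are arbitrary. The generated strings follow the set_field:value->field action syntax accepted by ovs-ofctl.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class LogicalMetadata:
    datapath: int     # logical datapath field, stored in "metadata"
    inport: int       # logical input port field, stored in register 6
    outport: int = 0  # logical output port field, register 7, initially 0

    def as_fields(self) -> Dict[str, int]:
        """Map each logical field to the OpenFlow field that carries it."""
        return {
            "metadata": self.datapath,
            "reg6": self.inport,
            "reg7": self.outport,
        }

    def as_set_field_actions(self) -> List[str]:
        """Render the fields as ovs-ofctl style set_field actions."""
        return [
            "set_field:{:#x}->{}".format(value, name)
            for name, value in self.as_fields().items()
        ]


if __name__ == "__main__":
    md = LogicalMetadata(datapath=1, inport=2)  # arbitrary example values
    print(",".join(md.as_set_field_actions()))
```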
664 | ||
665 | <p> | |
666 | Initially, a VM or container on the ingress hypervisor sends a packet on a | |
667 | port attached to the OVN integration bridge. Then: | |
668 | </p> | |
669 | ||
670 | <ol> | |
b705f9ea JP |
671 | <li> |
672 | <p> | |
5868eb24 BP |
673 | OpenFlow table 0 performs physical-to-logical translation. It matches |
674 | the packet's ingress port. Its actions annotate the packet with | |
675 | logical metadata, by setting the logical datapath field to identify the | |
676 | logical datapath that the packet is traversing and the logical input | |
677 | port field to identify the ingress port. Then it resubmits to table 16 | |
678 | to enter the logical ingress pipeline. | |
679 | </p> | |
680 | ||
c0281929 RB |
681 | <p> |
682 | It's possible that a single ingress physical port maps to multiple | |
683 | logical ports with a type of <code>localnet</code>. The logical datapath | |
684 | and logical input port fields will be reset and the packet will be | |
685 | resubmitted to table 16 multiple times. | |
686 | </p> | |
687 | ||
5868eb24 BP |
688 | <p> |
689 | Packets that originate from a container nested within a VM are treated | |
690 | in a slightly different way. The originating container can be | |
691 | distinguished based on the VIF-specific VLAN ID, so the | |
692 | physical-to-logical translation flows additionally match on VLAN ID and | |
693 | the actions strip the VLAN header. Following this step, OVN treats | |
694 | packets from containers just like any other packets. | |
695 | </p> | |
696 | ||
697 | <p> | |
698 | Table 0 also processes packets that arrive from other chassis. It | |
699 | distinguishes them from other packets by ingress port, which is a | |
700 | tunnel. As with packets just entering the OVN pipeline, the actions | |
701 | annotate these packets with logical datapath and logical ingress port | |
702 | metadata. In addition, the actions set the logical output port field, | |
703 | which is available because in OVN tunneling occurs after the logical | |
704 | output port is known. These three pieces of information are obtained | |
705 | from the tunnel encapsulation metadata (see <code>Tunnel | |
706 | Encapsulations</code> for encoding details). Then the actions resubmit | |
707 | to table 33 to enter the logical egress pipeline. | |
b705f9ea JP |
708 | </p> |
709 | </li> | |
710 | ||
711 | <li> | |
712 | <p> | |
5868eb24 BP |
713 | OpenFlow tables 16 through 31 execute the logical ingress pipeline from |
714 | the <code>Logical_Flow</code> table in the OVN Southbound database. | |
715 | These tables are expressed entirely in terms of logical concepts like | |
716 | logical ports and logical datapaths. A big part of | |
717 | <code>ovn-controller</code>'s job is to translate them into equivalent | |
718 | OpenFlow (in particular it translates the table numbers: | |
719 | <code>Logical_Flow</code> tables 0 through 15 become OpenFlow tables 16 | |
720 | through 31). For a given packet, the logical ingress pipeline | |
721 | eventually executes zero or more <code>output</code> actions: | |
b705f9ea | 722 | </p> |
5868eb24 BP |
723 | |
724 | <ul> | |
725 | <li> | |
726 | If the pipeline executes no <code>output</code> actions at all, the | |
727 | packet is effectively dropped. | |
728 | </li> | |
729 | ||
730 | <li> | |
731 | Most commonly, the pipeline executes one <code>output</code> action, | |
732 | which <code>ovn-controller</code> implements by resubmitting the | |
733 | packet to table 32. | |
734 | </li> | |
735 | ||
736 | <li> | |
737 | If the pipeline can execute more than one <code>output</code> action, | |
738 | then each one is separately resubmitted to table 32. This can be | |
739 | used to send multiple copies of the packet to multiple ports. (If | |
740 | the packet was not modified between the <code>output</code> actions, | |
741 | and some of the copies are destined to the same hypervisor, then | |
742 | using a logical multicast output port would save bandwidth between | |
743 | hypervisors.) | |
744 | </li> | |
745 | </ul> | |
b705f9ea JP |
746 | </li> |
747 | ||
748 | <li> | |
749 | <p> | |
5868eb24 BP |
750 | OpenFlow tables 32 through 47 implement the <code>output</code> action |
751 | in the logical ingress pipeline. Specifically, table 32 handles | |
752 | packets to remote hypervisors, table 33 handles packets to the local | |
753 | hypervisor, and table 34 discards packets whose logical ingress and | |
754 | egress port are the same. | |
755 | </p> | |
756 | ||
757 | <p> | |
758 | Each flow in table 32 matches on a logical output port for unicast or | |
759 | multicast logical ports that include a logical port on a remote | |
760 | hypervisor. Each flow's actions implement sending a packet to the port | |
761 | it matches. For unicast logical output ports on remote hypervisors, | |
762 | the actions set the tunnel key to the correct value, then send the | |
763 | packet on the tunnel port to the correct hypervisor. (When the remote | |
764 | hypervisor receives the packet, table 0 there will recognize it as a | |
765 | tunneled packet and pass it along to table 33.) For multicast logical | |
766 | output ports, the actions send one copy of the packet to each remote | |
767 | hypervisor, in the same way as for unicast destinations. If a | |
768 | multicast group includes a logical port or ports on the local | |
769 | hypervisor, then its actions also resubmit to table 33. Table 32 also | |
770 | includes a fallback flow that resubmits to table 33 if there is no | |
771 | other match. | |
772 | </p> | |
773 | ||
774 | <p> | |
775 | Flows in table 33 resemble those in table 32 but for logical ports that | |
776 | reside locally rather than remotely. For unicast logical output ports | |
777 | on the local hypervisor, the actions just resubmit to table 34. For | |
778 | multicast output ports that include one or more logical ports on the | |
779 | local hypervisor, for each such logical port <var>P</var>, the actions | |
780 | change the logical output port to <var>P</var>, then resubmit to table | |
781 | 34. | |
782 | </p> | |
783 | ||
784 | <p> | |
785 | Table 34 matches and drops packets for which the logical input and | |
786 | output ports are the same. It resubmits other packets to table 48. | |
b705f9ea JP |
787 | </p> |
788 | </li> | |
5868eb24 BP |
789 | |
790 | <li> | |
791 | <p> | |
792 | OpenFlow tables 48 through 63 execute the logical egress pipeline from | |
793 | the <code>Logical_Flow</code> table in the OVN Southbound database. | |
794 | The egress pipeline can perform a final stage of validation before | |
795 | packet delivery. Eventually, it may execute an <code>output</code> | |
796 | action, which <code>ovn-controller</code> implements by resubmitting to | |
797 | table 64. A packet for which the pipeline never executes | |
798 | <code>output</code> is effectively dropped (although it may have been | |
799 | transmitted through a tunnel across a physical network). | |
800 | </p> | |
801 | ||
802 | <p> | |
803 | The egress pipeline cannot change the logical output port or cause | |
804 | further tunneling. | |
805 | </p> | |
806 | </li> | |
807 | ||
808 | <li> | |
809 | <p> | |
810 | OpenFlow table 64 performs logical-to-physical translation, the | |
811 | opposite of table 0. It matches the packet's logical egress port. Its | |
812 | actions output the packet to the port attached to the OVN integration | |
813 | bridge that represents that logical port. If the logical egress port | |
814 | is a container nested within a VM, then before sending the packet the | |
815 | actions push on a VLAN header with an appropriate VLAN ID. | |
816 | </p> | |
817 | </li> | |
818 | </ol> | |
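The table layout walked through in the steps above can be summarized in a short sketch. The constant and function names are illustrative assumptions, not identifiers from the OVN source. The text states the ingress mapping (logical tables 0 through 15 become OpenFlow tables 16 through 31); the sketch assumes the egress pipeline maps the same way onto tables 48 through 63.

```python
# Hypothetical names; only the table numbers come from the text above.
OFTABLE_PHY_TO_LOG = 0         # physical-to-logical translation
OFTABLE_LOG_INGRESS_BASE = 16  # logical ingress pipeline: tables 16..31
OFTABLE_REMOTE_OUTPUT = 32     # output to remote hypervisors
OFTABLE_LOCAL_OUTPUT = 33      # output to the local hypervisor
OFTABLE_DROP_LOOPBACK = 34     # drop if logical inport == logical outport
OFTABLE_LOG_EGRESS_BASE = 48   # logical egress pipeline: tables 48..63
OFTABLE_LOG_TO_PHY = 64        # logical-to-physical translation
LOG_PIPELINE_LEN = 16          # logical tables per pipeline


def openflow_table(pipeline: str, logical_table: int) -> int:
    """Map a Logical_Flow table number to an OpenFlow table number."""
    if not 0 <= logical_table < LOG_PIPELINE_LEN:
        raise ValueError("logical table must be in 0..15")
    if pipeline == "ingress":
        return OFTABLE_LOG_INGRESS_BASE + logical_table
    if pipeline == "egress":
        # Assumption: egress tables are offset the same way as ingress.
        return OFTABLE_LOG_EGRESS_BASE + logical_table
    raise ValueError("pipeline must be 'ingress' or 'egress'")
```

For example, `openflow_table("ingress", 15)` gives the last ingress table, after which an `output` action hands the packet to table 32.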
819 | ||
88058f19 AW |
820 | <h2>Life Cycle of a VTEP gateway</h2> |
821 | ||
822 | <p> | |
823 | A gateway is a chassis that forwards traffic between the OVN-managed | |
824 | part of a logical network and a physical VLAN, extending a | |
825 | tunnel-based logical network into a physical network. | |
826 | </p> | |
827 | ||
828 | <p> | |
829 | The steps below refer often to details of the OVN and VTEP database | |
830 | schemas. Please see <code>ovn-sb</code>(5), <code>ovn-nb</code>(5) | |
831 | and <code>vtep</code>(5), respectively, for the full story on these | |
832 | databases. | |
833 | </p> | |
834 | ||
835 | <ol> | |
836 | <li> | |
837 | A VTEP gateway's life cycle begins with the administrator registering | |
838 | the VTEP gateway as a <code>Physical_Switch</code> table entry in the | |
839 | <code>VTEP</code> database. The <code>ovn-controller-vtep</code> | |
840 | connected to this VTEP database will recognize the new VTEP gateway | |
841 | and create a new <code>Chassis</code> table entry for it in the | |
842 | <code>OVN_Southbound</code> database. | |
843 | </li> | |
844 | ||
845 | <li> | |
846 | The administrator can then create a new <code>Logical_Switch</code> | |
847 | table entry, and bind a particular VLAN on a VTEP gateway's port to | |
848 | any VTEP logical switch. Once a VTEP logical switch is bound to | |
849 | a VTEP gateway, the <code>ovn-controller-vtep</code> will detect | |
850 | it and add its name to the <var>vtep_logical_switches</var> | |
851 | column of the <code>Chassis</code> table in the <code> | |
852 | OVN_Southbound</code> database. Note that the <var>tunnel_key</var> | |
853 | column of the VTEP logical switch is not filled in at creation. The | |
854 | <code>ovn-controller-vtep</code> will set the column when the | |
855 | corresponding VTEP logical switch is bound to an OVN logical network. | |
856 | </li> | |
857 | ||
858 | <li> | |
859 | Now, the administrator can use the CMS to add a VTEP logical switch | |
860 | to the OVN logical network. To do that, the CMS must first create a | |
861 | new <code>Logical_Port</code> table entry in the <code> | |
862 | OVN_Northbound</code> database. Then, the <var>type</var> column | |
863 | of this entry must be set to "vtep". Next, the <var> | |
864 | vtep-logical-switch</var> and <var>vtep-physical-switch</var> keys | |
865 | in the <var>options</var> column must also be specified, since | |
866 | multiple VTEP gateways can attach to the same VTEP logical switch. | |
867 | </li> | |
868 | ||
869 | <li> | |
870 | The newly created logical port in the <code>OVN_Northbound</code> | |
871 | database and its configuration will be passed down to the <code> | |
872 | OVN_Southbound</code> database as a new <code>Port_Binding</code> | |
873 | table entry. The <code>ovn-controller-vtep</code> will recognize the | |
874 | change and bind the logical port to the corresponding VTEP gateway | |
875 | chassis. Binding the same VTEP logical switch to more than one | |
876 | OVN logical network is not allowed; such a configuration generates a | |
877 | warning in the log. | |
878 | </li> | |
879 | ||
880 | <li> | |
881 | Besides binding to the VTEP gateway chassis, the <code> | |
882 | ovn-controller-vtep</code> will update the <var>tunnel_key</var> | |
883 | column of the VTEP logical switch to the corresponding <code> | |
884 | Datapath_Binding</code> table entry's <var>tunnel_key</var> for the | |
885 | bound OVN logical network. | |
886 | </li> | |
887 | ||
888 | <li> | |
889 | Next, the <code>ovn-controller-vtep</code> will keep reacting to | |
890 | configuration changes in the <code>Port_Binding</code> table in the | |
891 | <code>OVN_Southbound</code> database, updating the | |
892 | <code>Ucast_Macs_Remote</code> table in the <code>VTEP</code> database. | |
893 | This allows the VTEP gateway to understand where to forward the unicast | |
894 | traffic coming from the extended external network. | |
895 | </li> | |
896 | ||
897 | <li> | |
898 | Eventually, the VTEP gateway's life cycle ends when the administrator | |
899 | unregisters the VTEP gateway from the <code>VTEP</code> database. | |
900 | The <code>ovn-controller-vtep</code> will recognize the event and | |
901 | remove all related configurations (<code>Chassis</code> table entry | |
902 | and port bindings) in the <code>OVN_Southbound</code> database. | |
903 | </li> | |
904 | ||
905 | <li> | |
906 | When the <code>ovn-controller-vtep</code> is terminated, all related | |
907 | configurations in the <code>OVN_Southbound</code> database and | |
908 | the <code>VTEP</code> database will be cleaned up, including | |
909 | <code>Chassis</code> table entries for all registered VTEP gateways | |
910 | and their port bindings, and all <code>Ucast_Macs_Remote</code> table | |
911 | entries and the <code>Logical_Switch</code> tunnel keys. | |
912 | </li> | |
913 | </ol> | |
914 | ||
5868eb24 BP |
915 | <h1>Design Decisions</h1> |
916 | ||
917 | <h2>Tunnel Encapsulations</h2> | |
918 | ||
919 | <p> | |
920 | OVN annotates logical network packets that it sends from one hypervisor to | |
921 | another with the following three pieces of metadata, which are encoded in | |
922 | an encapsulation-specific fashion: | |
923 | </p> | |
924 | ||
925 | <ul> | |
926 | <li> | |
927 | 24-bit logical datapath identifier, from the <code>tunnel_key</code> | |
928 | column in the OVN Southbound <code>Datapath_Binding</code> table. | |
929 | </li> | |
930 | ||
931 | <li> | |
932 | 15-bit logical ingress port identifier. ID 0 is reserved for internal | |
933 | use within OVN. IDs 1 through 32767, inclusive, may be assigned to | |
934 | logical ports (see the <code>tunnel_key</code> column in the OVN | |
935 | Southbound <code>Port_Binding</code> table). | |
936 | </li> | |
937 | ||
938 | <li> | |
939 | 16-bit logical egress port identifier. IDs 0 through 32767 have the same | |
940 | meaning as for logical ingress ports. IDs 32768 through 65535, | |
941 | inclusive, may be assigned to logical multicast groups (see the | |
942 | <code>tunnel_key</code> column in the OVN Southbound | |
943 | <code>Multicast_Group</code> table). | |
944 | </li> | |
b705f9ea JP |
945 | </ul> |
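The identifier ranges listed above can be checked with a small helper. This is a minimal sketch; the function name and the returned labels are assumptions made for illustration, and only the numeric ranges come from the list above.

```python
def classify_egress_id(key: int) -> str:
    """Classify a 16-bit logical egress port identifier by its range."""
    if not 0 <= key <= 0xFFFF:
        raise ValueError("egress identifier must fit in 16 bits")
    if key == 0:
        return "reserved"          # ID 0: internal use within OVN
    if key <= 32767:
        return "logical port"      # Port_Binding tunnel_key range
    return "multicast group"       # Multicast_Group tunnel_key range
```

Because logical ingress ports are limited to 15 bits, any value for which this helper returns `"multicast group"` can appear only as an egress identifier.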
946 | ||
947 | <p> | |
5868eb24 BP |
948 | For hypervisor-to-hypervisor traffic, OVN supports only Geneve and STT |
949 | encapsulations, for the following reasons: | |
b705f9ea JP |
950 | </p> |
951 | ||
5868eb24 BP |
952 | <ul> |
953 | <li> | |
954 | Only STT and Geneve support the large amounts of metadata (over 32 bits | |
955 | per packet) that OVN uses (as described above). | |
956 | </li> | |
957 | ||
958 | <li> | |
959 | STT and Geneve use randomized UDP or TCP source ports that allow | |
960 | efficient distribution among multiple paths in environments that use ECMP | |
961 | in their underlay. | |
962 | </li> | |
963 | ||
964 | <li> | |
965 | NICs are available to offload STT and Geneve encapsulation and | |
966 | decapsulation. | |
967 | </li> | |
968 | </ul> | |
969 | ||
970 | <p> | |
971 | Due to its flexibility, the preferred encapsulation between hypervisors is | |
972 | Geneve. For Geneve encapsulation, OVN transmits the logical datapath | |
973 | identifier in the Geneve VNI. | |
974 | ||
975 | <!-- Keep the following in sync with ovn/controller/physical.h. --> | |
976 | OVN transmits the logical ingress and logical egress ports in a TLV with | |
977 | class 0xffff, type 0, and a 32-bit value encoded as follows, from MSB to | |
978 | LSB: | |
979 | </p> | |
980 | ||
981 | <diagram> | |
982 | <header name=""> | |
983 | <bits name="rsv" above="1" below="0" width=".25"/> | |
984 | <bits name="ingress port" above="15" width=".75"/> | |
985 | <bits name="egress port" above="16" width=".75"/> | |
986 | </header> | |
987 | </diagram> | |
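As a sketch of the 32-bit layout in the diagram above (1 reserved bit, 15-bit ingress port, 16-bit egress port, MSB to LSB), the value could be packed and unpacked as follows. The function names are hypothetical, and the sketch covers only the option value, not the surrounding Geneve TLV header.

```python
def pack_geneve_ports(ingress: int, egress: int) -> int:
    """Pack logical ingress/egress port IDs into the 32-bit option value."""
    if not 0 <= ingress <= 0x7FFF:
        raise ValueError("ingress port must fit in 15 bits")
    if not 0 <= egress <= 0xFFFF:
        raise ValueError("egress port must fit in 16 bits")
    # The reserved top bit stays 0; ingress occupies bits 16..30.
    return (ingress << 16) | egress


def unpack_geneve_ports(value: int) -> tuple:
    """Return (ingress, egress) from a 32-bit option value."""
    return (value >> 16) & 0x7FFF, value & 0xFFFF
```

The logical datapath identifier is not part of this value because, as described above, it travels in the Geneve VNI.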
988 | ||
989 | <p> | |
990 | Environments whose NICs lack Geneve offload may prefer STT encapsulation | |
991 | for performance reasons. For STT encapsulation, OVN encodes all three | |
992 | pieces of logical metadata in the STT 64-bit tunnel ID as follows, from MSB | |
993 | to LSB: | |
994 | </p> | |
995 | ||
996 | <diagram> | |
997 | <header name=""> | |
998 | <bits name="reserved" above="9" below="0" width=".5"/> | |
999 | <bits name="ingress port" above="15" width=".75"/> | |
1000 | <bits name="egress port" above="16" width=".75"/> | |
1001 | <bits name="datapath" above="24" width="1.25"/> | |
1002 | </header> | |
1003 | </diagram> | |
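The 64-bit STT tunnel ID layout in the diagram above (9 reserved bits, 15-bit ingress port, 16-bit egress port, 24-bit datapath, MSB to LSB) can be illustrated the same way. This is a sketch under the stated bit widths; the function names are assumptions and do not come from the OVN source.

```python
def pack_stt_tunnel_id(ingress: int, egress: int, datapath: int) -> int:
    """Pack all three pieces of logical metadata into a 64-bit tunnel ID."""
    if not 0 <= ingress <= 0x7FFF:
        raise ValueError("ingress port must fit in 15 bits")
    if not 0 <= egress <= 0xFFFF:
        raise ValueError("egress port must fit in 16 bits")
    if not 0 <= datapath <= 0xFFFFFF:
        raise ValueError("datapath must fit in 24 bits")
    # Datapath in bits 0..23, egress in 24..39, ingress in 40..54;
    # the 9 reserved high bits stay 0.
    return (ingress << 40) | (egress << 24) | datapath


def unpack_stt_tunnel_id(tunnel_id: int) -> tuple:
    """Return (ingress, egress, datapath) from a 64-bit tunnel ID."""
    return (
        (tunnel_id >> 40) & 0x7FFF,
        (tunnel_id >> 24) & 0xFFFF,
        tunnel_id & 0xFFFFFF,
    )
```

Unlike the Geneve case, the datapath identifier shares the tunnel ID with the port fields, which is why the metadata needs more than 32 bits per packet.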
1004 | ||
b705f9ea | 1005 | <p> |
5868eb24 BP |
1006 | For connecting to gateways, in addition to Geneve and STT, OVN supports |
1007 | VXLAN, because only VXLAN support is common on top-of-rack (ToR) switches. | |
1008 | Currently, gateways have a feature set that matches the capabilities as | |
1009 | defined by the VTEP schema, so fewer bits of metadata are necessary. In | |
1010 | the future, gateways that do not support encapsulations with large amounts | |
1011 | of metadata may continue to have a reduced feature set. | |
b705f9ea | 1012 | </p> |
fe36184b | 1013 | </manpage> |