Commit | Line | Data |
---|---|---|
fe36184b BP |
1 | <?xml version="1.0" encoding="utf-8"?> |
2 | <manpage program="ovn-architecture" section="7" title="OVN Architecture"> | |
3 | <h1>Name</h1> | |
4 | <p>ovn-architecture -- Open Virtual Network architecture</p> | |
5 | ||
6 | <h1>Description</h1> | |
7 | ||
8 | <p> | |
9 | OVN, the Open Virtual Network, is a system to support virtual network | |
10 | abstraction. OVN complements the existing capabilities of OVS to add | |
11 | native support for virtual network abstractions, such as virtual L2 and L3 | |
12 | overlays and security groups. Services such as DHCP are also desirable | |
13 | features. Just like OVS, OVN's design goal is to have a production-quality | |
14 | implementation that can operate at significant scale. | |
15 | </p> | |
16 | ||
17 | <p> | |
18 | An OVN deployment consists of several components: | |
19 | </p> | |
20 | ||
21 | <ul> | |
22 | <li> | |
23 | <p> | |
24 | A <dfn>Cloud Management System</dfn> (<dfn>CMS</dfn>), which is | |
25 | OVN's ultimate client (via its users and administrators). OVN | |
26 | integration requires installing a CMS-specific plugin and | |
27 | related software (see below). OVN initially targets OpenStack | |
28 | as the CMS. | |
29 | </p> | |
30 | ||
31 | <p> | |
32 | We generally speak of ``the'' CMS, but one can imagine scenarios in | |
33 | which multiple CMSes manage different parts of an OVN deployment. | |
34 | </p> | |
35 | </li> | |
36 | ||
37 | <li> | |
38 | An OVN Database physical or virtual node (or, eventually, cluster) | |
39 | installed in a central location. | |
40 | </li> | |
41 | ||
42 | <li> | |
43 | One or more (usually many) <dfn>hypervisors</dfn>. Hypervisors must run | |
44 | Open vSwitch and implement the interface described in | |
45 | <code>IntegrationGuide.md</code> in the OVS source tree. Any hypervisor | |
46 | platform supported by Open vSwitch is acceptable. | |
47 | </li> | |
48 | ||
49 | <li> | |
50 | <p> | |
fa6aeaeb RB |
51 | Zero or more <dfn>gateways</dfn>. A gateway extends a tunnel-based |
52 | logical network into a physical network by bidirectionally forwarding | |
53 | packets between tunnels and a physical Ethernet port. This allows | |
54 | non-virtualized machines to participate in logical networks. A gateway | |
55 | may be a physical host, a virtual machine, or an ASIC-based hardware | |
56 | switch that supports the <code>vtep</code>(5) schema. (Support for the | |
57 | latter will come later in the OVN implementation.) | |
fe36184b BP |
58 | </p> |
59 | ||
60 | <p> | |
fa6aeaeb RB |
61 | Hypervisors and gateways are together called <dfn>transport nodes</dfn>
62 | or <dfn>chassis</dfn>. | |
fe36184b BP |
63 | </p> |
64 | </li> | |
65 | </ul> | |
66 | ||
67 | <p> | |
68 | The diagram below shows how the major components of OVN and related | |
69 | software interact. Starting at the top of the diagram, we have: | |
70 | </p> | |
71 | ||
72 | <ul> | |
73 | <li> | |
74 | The Cloud Management System, as defined above. | |
75 | </li> | |
76 | ||
77 | <li> | |
78 | <p> | |
fa6aeaeb RB |
79 | The <dfn>OVN/CMS Plugin</dfn> is the component of the CMS that |
80 | interfaces to OVN. In OpenStack, this is a Neutron plugin. | |
81 | The plugin's main purpose is to translate the CMS's notion of logical | |
82 | network configuration, stored in the CMS's configuration database in a | |
83 | CMS-specific format, into an intermediate representation understood by | |
84 | OVN. | |
fe36184b BP |
85 | </p> |
86 | ||
87 | <p> | |
fa6aeaeb RB |
88 | This component is necessarily CMS-specific, so a new plugin needs to be |
89 | developed for each CMS that is integrated with OVN. All of the | |
90 | components below this one in the diagram are CMS-independent. | |
fe36184b BP |
91 | </p> |
92 | </li> | |
93 | ||
94 | <li> | |
95 | <p> | |
fa6aeaeb RB |
96 | The <dfn>OVN Northbound Database</dfn> receives the intermediate |
97 | representation of logical network configuration passed down by the | |
98 | OVN/CMS Plugin. The database schema is meant to be ``impedance | |
99 | matched'' with the concepts used in a CMS, so that it directly supports | |
100 | notions of logical switches, routers, ACLs, and so on. See | |
5868eb24 | 101 | <code>ovn-nb</code>(5) for details. |
fe36184b BP |
102 | </p> |
103 | ||
104 | <p> | |
fa6aeaeb RB |
105 | The OVN Northbound Database has only two clients: the OVN/CMS Plugin |
106 | above it and <code>ovn-northd</code> below it. | |
fe36184b BP |
107 | </p> |
108 | </li> | |
109 | ||
110 | <li> | |
91ae2065 RB |
111 | <code>ovn-northd</code>(8) connects to the OVN Northbound Database |
112 | above it and the OVN Southbound Database below it. It translates the | |
ec78987f JP |
113 | logical network configuration in terms of conventional network |
114 | concepts, taken from the OVN Northbound Database, into logical | |
115 | datapath flows in the OVN Southbound Database below it. | |
fe36184b BP |
116 | </li> |
117 | ||
118 | <li> | |
119 | <p> | |
ec78987f | 120 | The <dfn>OVN Southbound Database</dfn> is the center of the system. |
91ae2065 | 121 | Its clients are <code>ovn-northd</code>(8) above it and |
ec78987f | 122 | <code>ovn-controller</code>(8) on every transport node below it. |
fe36184b BP |
123 | </p> |
124 | ||
125 | <p> | |
fa6aeaeb RB |
126 | The OVN Southbound Database contains three kinds of data: <dfn>Physical |
127 | Network</dfn> (PN) tables that specify how to reach hypervisor and | |
128 | other nodes, <dfn>Logical Network</dfn> (LN) tables that describe the | |
129 | logical network in terms of ``logical datapath flows,'' and | |
130 | <dfn>Binding</dfn> tables that link logical network components' | |
131 | locations to the physical network. The hypervisors populate the PN and | |
dcda6e0d BP |
132 | Port_Binding tables, whereas <code>ovn-northd</code>(8) populates the |
133 | LN tables. | |
fe36184b BP |
134 | </p> |
135 | ||
136 | <p> | |
ec78987f JP |
137 | OVN Southbound Database performance must scale with the number of |
138 | transport nodes. This will likely require some work on | |
139 | <code>ovsdb-server</code>(1) as we encounter bottlenecks. | |
140 | Clustering for availability may be needed. | |
fe36184b BP |
141 | </p> |
142 | </li> | |
143 | </ul> | |
144 | ||
145 | <p> | |
146 | The remaining components are replicated onto each hypervisor: | |
147 | </p> | |
148 | ||
149 | <ul> | |
150 | <li> | |
151 | <code>ovn-controller</code>(8) is OVN's agent on each hypervisor and | |
ec78987f JP |
152 | software gateway. Northbound, it connects to the OVN Southbound |
153 | Database to learn about OVN configuration and status and to | |
154 | populate the PN table and the <code>Chassis</code> column in the | |
e387e3e8 | 155 | <code>Binding</code> table with the hypervisor's status. |
ec78987f JP |
156 | Southbound, it connects to <code>ovs-vswitchd</code>(8) as an |
157 | OpenFlow controller, for control over network traffic, and to the | |
158 | local <code>ovsdb-server</code>(1) to allow it to monitor and | |
159 | control Open vSwitch configuration. | |
fe36184b BP |
160 | </li> |
161 | ||
162 | <li> | |
163 | <code>ovs-vswitchd</code>(8) and <code>ovsdb-server</code>(1) are | |
164 | conventional components of Open vSwitch. | |
165 | </li> | |
166 | </ul> | |
167 | ||
168 | <pre fixed="yes"> | |
169 | CMS | |
170 | | | |
171 | | | |
172 | +-----------|-----------+ | |
173 | | | | | |
174 | | OVN/CMS Plugin | | |
175 | | | | | |
176 | | | | | |
177 | | OVN Northbound DB | | |
178 | | | | | |
179 | | | | | |
91ae2065 | 180 | | ovn-northd | |
fe36184b BP |
181 | | | | |
182 | +-----------|-----------+ | |
183 | | | |
184 | | | |
ec78987f JP |
185 | +-------------------+ |
186 | | OVN Southbound DB | | |
187 | +-------------------+ | |
fe36184b BP |
188 | | |
189 | | | |
190 | +------------------+------------------+ | |
191 | | | | | |
ec78987f | 192 | HV 1 | | HV n | |
fe36184b BP |
193 | +---------------|---------------+ . +---------------|---------------+ |
194 | | | | . | | | | |
195 | | ovn-controller | . | ovn-controller | | |
196 | | | | | . | | | | | |
197 | | | | | | | | | | |
198 | | ovs-vswitchd ovsdb-server | | ovs-vswitchd ovsdb-server | | |
199 | | | | | | |
200 | +-------------------------------+ +-------------------------------+ | |
201 | </pre> | |
202 | ||
ca1564ec BP |
203 | <h2>Chassis Setup</h2> |
204 | ||
205 | <p> | |
206 | Each chassis in an OVN deployment must be configured with an Open vSwitch | |
207 | bridge dedicated for OVN's use, called the <dfn>integration bridge</dfn>. | |
e43fc07c RB |
208 | System startup scripts may create this bridge prior to starting |
209 | <code>ovn-controller</code> if desired. If this bridge does not exist when | |
210 | ovn-controller starts, it will be created automatically with the default | |
211 | configuration suggested below. The ports on the integration bridge include: | |
ca1564ec BP |
212 | </p> |
213 | ||
214 | <ul> | |
215 | <li> | |
216 | On any chassis, tunnel ports that OVN uses to maintain logical network | |
217 | connectivity. <code>ovn-controller</code> adds, updates, and removes | |
218 | these tunnel ports. | |
219 | </li> | |
220 | ||
221 | <li> | |
222 | On a hypervisor, any VIFs that are to be attached to logical networks. | |
223 | The hypervisor itself, or the integration between Open vSwitch and the | |
224 | hypervisor (described in <code>IntegrationGuide.md</code>) takes care of | |
225 | this. (This is not part of OVN or new to OVN; this is pre-existing | |
226 | integration work that has already been done on hypervisors that support | |
227 | OVS.) | |
228 | </li> | |
229 | ||
230 | <li> | |
231 | On a gateway, the physical port used for logical network connectivity. | |
232 | System startup scripts add this port to the bridge prior to starting | |
233 | <code>ovn-controller</code>. This can be a patch port to another bridge, | |
234 | instead of a physical port, in more sophisticated setups. | |
235 | </li> | |
236 | </ul> | |
237 | ||
238 | <p> | |
239 | Other ports should not be attached to the integration bridge. In | |
240 | particular, physical ports attached to the underlay network (as opposed to | |
241 | gateway ports, which are physical ports attached to logical networks) must | |
242 | not be attached to the integration bridge. Underlay physical ports should | |
243 | instead be attached to a separate Open vSwitch bridge (they need not be | |
244 | attached to any bridge at all, in fact). | |
245 | </p> | |
246 | ||
247 | <p> | |
a42226f0 BP |
248 | The integration bridge should be configured as described below. |
249 | The effect of each of these settings is documented in | |
250 | <code>ovs-vswitchd.conf.db</code>(5): | |
ca1564ec BP |
251 | </p> |
252 | ||
e43fc07c RB |
253 | <!-- Keep the following in sync with create_br_int() in |
254 | ovn/controller/ovn-controller.c. --> | |
a42226f0 BP |
255 | <dl> |
256 | <dt><code>fail-mode=secure</code></dt> | |
257 | <dd> | |
258 | Avoids switching packets between isolated logical networks before | |
259 | <code>ovn-controller</code> starts up. See <code>Controller Failure | |
260 | Settings</code> in <code>ovs-vsctl</code>(8) for more information. | |
261 | </dd> | |
262 | ||
263 | <dt><code>other-config:disable-in-band=true</code></dt> | |
264 | <dd> | |
265 | Suppresses in-band control flows for the integration bridge. It would be | |
266 | unusual for such flows to show up anyway, because OVN uses a local | |
267 | controller (over a Unix domain socket) instead of a remote controller. | |
268 | It's possible, however, for some other bridge in the same system to have | |
269 | an in-band remote controller, and in that case this suppresses the flows | |
270 | that in-band control would ordinarily set up. See <code>In-Band | |
271 | Control</code> in <code>DESIGN.md</code> for more information. | |
272 | </dd> | |
273 | </dl> | |
274 | ||
ca1564ec BP |
275 | <p> |
276 | The customary name for the integration bridge is <code>br-int</code>, but | |
277 | another name may be used. | |
278 | </p> | |
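The settings above can be applied with a single ovs-vsctl invocation. The following is a minimal sketch, not the code that ovn-controller itself runs: the helper name br_int_setup_command is hypothetical, and it only builds the argument list (it does not execute anything), assuming the customary bridge name br-int.

```python
def br_int_setup_command(bridge="br-int"):
    """Build an ovs-vsctl argv that creates the integration bridge if it
    does not exist yet and applies the two settings described above.

    The function name and the default bridge name are illustrative
    assumptions; nothing is executed here.
    """
    return [
        "ovs-vsctl",
        # Create the bridge only if it is missing.
        "--", "--may-exist", "add-br", bridge,
        # Apply the recommended settings to the Bridge record.
        "--", "set", "Bridge", bridge,
        "fail-mode=secure",
        "other-config:disable-in-band=true",
    ]


if __name__ == "__main__":
    # Print the command line instead of running it.
    print(" ".join(br_int_setup_command()))
```

A startup script could pass the returned list to subprocess.run() before starting ovn-controller. Returning the list rather than running it keeps the sketch safe to try on a machine without Open vSwitch.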
279 | ||
747b2a45 BP |
280 | <h2>Logical Networks</h2> |
281 | ||
282 | <p> | |
283 | A <dfn>logical network</dfn> implements the same concepts as a physical | |
284 | network, but it is insulated from the physical network by tunnels or | |
285 | other encapsulations. This allows logical networks to have separate IP and | |
286 | other address spaces that overlap, without conflicting, with those used for | |
287 | physical networks. Logical network topologies can be arranged without | |
288 | regard for the topologies of the physical networks on which they run. | |
289 | </p> | |
290 | ||
291 | <p> | |
292 | Logical network concepts in OVN include: | |
293 | </p> | |
294 | ||
295 | <ul> | |
296 | <li> | |
297 | <dfn>Logical switches</dfn>, the logical version of Ethernet switches. | |
298 | </li> | |
299 | ||
300 | <li> | |
301 | <dfn>Logical routers</dfn>, the logical version of IP routers. Logical | |
302 | switches and routers can be connected into sophisticated topologies. | |
303 | </li> | |
304 | ||
305 | <li> | |
306 | <dfn>Logical datapaths</dfn> are the logical version of an OpenFlow | |
307 | switch. Logical switches and routers are both implemented as logical | |
308 | datapaths. | |
309 | </li> | |
310 | </ul> | |
311 | ||
ca1564ec | 312 | <h2>Life Cycle of a VIF</h2> |
fe36184b BP |
313 | |
314 | <p> | |
315 | Tables and their schemas presented in isolation are difficult to | |
316 | understand. Here's an example. | |
317 | </p> | |
318 | ||
9fb4636f GS |
319 | <p> |
320 | A VIF on a hypervisor is a virtual network interface attached either | |
321 | to a VM or to a container running directly on that hypervisor. (This is | |
322 | different from the interface of a container running inside a VM.) | |
323 | </p> | |
324 | ||
fe36184b BP |
325 | <p> |
326 | The steps in this example refer often to details of the OVN Southbound and OVN | |
ec78987f | 327 | Northbound database schemas. Please see <code>ovn-sb</code>(5) and |
fe36184b BP |
328 | <code>ovn-nb</code>(5), respectively, for the full story on these |
329 | databases. | |
330 | </p> | |
331 | ||
332 | <ol> | |
333 | <li> | |
334 | A VIF's life cycle begins when a CMS administrator creates a new VIF | |
335 | using the CMS user interface or API and adds it to a switch (one | |
336 | implemented by OVN as a logical switch). The CMS updates its own | |
337 | configuration. This includes associating a unique, persistent identifier | |
338 | <var>vif-id</var> and Ethernet address <var>mac</var> with the VIF. | |
339 | </li> | |
340 | ||
341 | <li> | |
342 | The CMS plugin updates the OVN Northbound database to include the new | |
80f408f4 JP |
343 | VIF, by adding a row to the <code>Logical_Switch_Port</code> |
344 | table. In the new row, <code>name</code> is <var>vif-id</var>, | |
345 | <code>mac</code> is <var>mac</var>, <code>switch</code> points to | |
346 | the OVN logical switch's Logical_Switch record, and other columns | |
347 | are initialized appropriately. | |
fe36184b BP |
348 | </li> |
349 | ||
350 | <li> | |
5868eb24 BP |
351 | <code>ovn-northd</code> receives the OVN Northbound database update. In |
352 | turn, it makes the corresponding updates to the OVN Southbound database, | |
353 | by adding rows to the OVN Southbound database <code>Logical_Flow</code> | |
354 | table to reflect the new port, e.g. add a flow to recognize that packets | |
355 | destined to the new port's MAC address should be delivered to it, and | |
356 | update the flow that delivers broadcast and multicast packets to include | |
357 | the new port. It also creates a record in the <code>Binding</code> table | |
358 | and populates all its columns except the column that identifies the | |
9fb4636f | 359 | <code>chassis</code>. |
fe36184b BP |
360 | </li> |
361 | ||
362 | <li> | |
363 | On every hypervisor, <code>ovn-controller</code> receives the | |
48605550 | 364 | <code>Logical_Flow</code> table updates that <code>ovn-northd</code> made |
5868eb24 BP |
365 | in the previous step. As long as the VM that owns the VIF is powered |
366 | off, <code>ovn-controller</code> cannot do much; it cannot, for example, | |
fe36184b BP |
367 | arrange to send packets to or receive packets from the VIF, because the |
368 | VIF does not actually exist anywhere. | |
369 | </li> | |
370 | ||
371 | <li> | |
372 | Eventually, a user powers on the VM that owns the VIF. On the hypervisor | |
373 | where the VM is powered on, the integration between the hypervisor and | |
374 | Open vSwitch (described in <code>IntegrationGuide.md</code>) adds the VIF | |
375 | to the OVN integration bridge and stores <var>vif-id</var> in | |
376 | <code>external-ids</code>:<code>iface-id</code> to indicate that the | |
377 | interface is an instantiation of the new VIF. (None of this code is new | |
378 | in OVN; this is pre-existing integration work that has already been done | |
379 | on hypervisors that support OVS.) | |
380 | </li> | |
381 | ||
382 | <li> | |
383 | On the hypervisor where the VM is powered on, <code>ovn-controller</code> | |
384 | notices <code>external-ids</code>:<code>iface-id</code> in the new | |
385 | Interface. In response, it updates the local hypervisor's OpenFlow | |
386 | tables so that packets to and from the VIF are properly handled. | |
a0149f47 | 387 | Afterward, in the OVN Southbound DB, it updates the |
e387e3e8 | 388 | <code>Binding</code> table's <code>chassis</code> column for the |
a0149f47 JP |
389 | row that links the logical port from |
390 | <code>external-ids</code>:<code>iface-id</code> to the hypervisor. | |
fe36184b BP |
391 | </li> |
392 | ||
393 | <li> | |
394 | Some CMS systems, including OpenStack, fully start a VM only when its | |
91ae2065 RB |
395 | networking is ready. To support this, <code>ovn-northd</code> notices |
396 | the <code>chassis</code> column updated for the row in the | |
e387e3e8 | 397 | <code>Binding</code> table and pushes this upward by updating the |
80f408f4 JP |
398 | <ref column="up" table="Logical_Switch_Port" db="OVN_NB"/> column |
399 | in the OVN Northbound database's <ref table="Logical_Switch_Port" | |
400 | db="OVN_NB"/> table to indicate that the VIF is now up. The CMS, | |
401 | if it uses this feature, can then react by allowing the VM's | |
402 | execution to proceed. | |
fe36184b BP |
403 | </li> |
404 | ||
405 | <li> | |
406 | On every hypervisor but the one where the VIF resides, | |
9fb4636f | 407 | <code>ovn-controller</code> notices the completely populated row in the |
e387e3e8 | 408 | <code>Binding</code> table. This provides <code>ovn-controller</code> |
fe36184b BP |
409 | the physical location of the logical port, so each instance updates the |
410 | OpenFlow tables of its switch (based on logical datapath flows in the OVN | |
5868eb24 BP |
411 | DB <code>Logical_Flow</code> table) so that packets to and from the VIF |
412 | can be properly handled via tunnels. | |
fe36184b BP |
413 | </li> |
414 | ||
415 | <li> | |
416 | Eventually, a user powers off the VM that owns the VIF. On the | |
6eceebf5 | 417 | hypervisor where the VM was powered off, the VIF is deleted from the OVN |
fe36184b BP |
418 | integration bridge. |
419 | </li> | |
420 | ||
421 | <li> | |
6eceebf5 | 422 | On the hypervisor where the VM was powered off, |
fe36184b | 423 | <code>ovn-controller</code> notices that the VIF was deleted. In |
9fb4636f | 424 | response, it removes the <code>Chassis</code> column content in the |
e387e3e8 | 425 | <code>Binding</code> table for the logical port. |
fe36184b BP |
426 | </li> |
427 | ||
428 | <li> | |
9fb4636f | 429 | On every hypervisor, <code>ovn-controller</code> notices the empty |
e387e3e8 | 430 | <code>Chassis</code> column in the <code>Binding</code> table's row |
9fb4636f GS |
431 | for the logical port. This means that <code>ovn-controller</code> no |
432 | longer knows the physical location of the logical port, so each instance | |
433 | updates its OpenFlow table to reflect that. | |
fe36184b BP |
434 | </li> |
435 | ||
436 | <li> | |
437 | Eventually, when the VIF (or its entire VM) is no longer needed by | |
438 | anyone, an administrator deletes the VIF using the CMS user interface or | |
439 | API. The CMS updates its own configuration. | |
440 | </li> | |
441 | ||
442 | <li> | |
443 | The CMS plugin removes the VIF from the OVN Northbound database, | |
80f408f4 | 444 | by deleting its row in the <code>Logical_Switch_Port</code> table. |
fe36184b BP |
445 | </li> |
446 | ||
447 | <li> | |
91ae2065 | 448 | <code>ovn-northd</code> receives the OVN Northbound update and in turn |
5868eb24 BP |
449 | updates the OVN Southbound database accordingly, by removing or updating |
450 | the rows from the OVN Southbound database <code>Logical_Flow</code> table | |
451 | and <code>Binding</code> table that were related to the now-destroyed | |
452 | VIF. | |
fe36184b BP |
453 | </li> |
454 | ||
455 | <li> | |
456 | On every hypervisor, <code>ovn-controller</code> receives the | |
48605550 | 457 | <code>Logical_Flow</code> table updates that <code>ovn-northd</code> made |
5868eb24 BP |
458 | in the previous step. <code>ovn-controller</code> updates OpenFlow |
459 | tables to reflect the update, although there may not be much to do, since | |
460 | the VIF had already become unreachable when it was removed from the | |
e387e3e8 | 461 | <code>Binding</code> table in a previous step. |
fe36184b BP |
462 | </li> |
463 | </ol> | |
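The steps above can be condensed into a toy state model. This is only a sketch under simplifying assumptions: the class ToyOvn and its method names are hypothetical, the databases are plain dictionaries, logical flows are omitted, and the model clears the Northbound "up" flag on power-off, which the steps above do not spell out.

```python
class ToyOvn:
    """Toy model of the VIF life cycle (illustrative, not OVN code)."""

    def __init__(self):
        self.nb_ports = {}    # Northbound Logical_Switch_Port: name -> row
        self.sb_binding = {}  # Southbound Binding: logical port -> row

    def cms_add_vif(self, vif_id, mac):
        # The CMS plugin adds a Logical_Switch_Port row.
        self.nb_ports[vif_id] = {"mac": mac, "up": False}
        self._northd()

    def vm_power_on(self, vif_id, chassis):
        # ovn-controller on that hypervisor sees iface-id and fills in
        # the chassis column of the Binding row.
        self.sb_binding[vif_id]["chassis"] = chassis
        self._northd()

    def vm_power_off(self, vif_id):
        # ovn-controller clears the chassis column again.
        self.sb_binding[vif_id]["chassis"] = None
        self._northd()

    def cms_delete_vif(self, vif_id):
        # The CMS plugin deletes the Logical_Switch_Port row.
        del self.nb_ports[vif_id]
        self._northd()

    def _northd(self):
        # Northbound to Southbound: one Binding row per logical port,
        # created with an empty chassis column.
        for name in self.nb_ports:
            self.sb_binding.setdefault(name, {"chassis": None})
        for name in list(self.sb_binding):
            if name not in self.nb_ports:
                del self.sb_binding[name]
        # Southbound to Northbound: report "up" once a chassis is bound.
        for name, port in self.nb_ports.items():
            port["up"] = self.sb_binding[name]["chassis"] is not None


if __name__ == "__main__":
    ovn = ToyOvn()
    ovn.cms_add_vif("vif-1", "00:00:00:00:00:01")
    ovn.vm_power_on("vif-1", "hv1")
    print(ovn.nb_ports, ovn.sb_binding)
```

The point of the model is the ordering: the Binding row exists before any chassis claims it, and the Northbound "up" value is derived from the Southbound chassis column rather than set by the CMS.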
464 | ||
a30b56d4 | 465 | <h2>Life Cycle of a Container Interface Inside a VM</h2> |
9fb4636f GS |
466 | |
467 | <p> | |
468 | OVN provides virtual network abstractions by converting information | |
469 | written in the OVN_NB database to OpenFlow flows in each hypervisor. Secure | |
470 | virtual networking for multiple tenants can only be provided if the OVN controller | |
471 | is the only entity that can modify flows in Open vSwitch. When the | |
472 | Open vSwitch integration bridge resides in the hypervisor, it is a | |
473 | fair assumption to make that tenant workloads running inside VMs cannot | |
474 | make any changes to Open vSwitch flows. | |
475 | </p> | |
476 | ||
477 | <p> | |
478 | If the infrastructure provider trusts the applications inside the | |
479 | containers not to break out and modify the Open vSwitch flows, then | |
480 | containers can be run in hypervisors. This is also the case when | |
481 | containers are run inside the VMs and the Open vSwitch integration bridge | |
482 | with flows added by the OVN controller resides in the same VM. For both | |
483 | the above cases, the workflow is the same as explained with an example | |
484 | in the previous section ("Life Cycle of a VIF"). | |
485 | </p> | |
486 | ||
487 | <p> | |
488 | This section talks about the life cycle of a container interface (CIF) | |
489 | when containers are created in the VMs and the Open vSwitch integration | |
490 | bridge resides inside the hypervisor. In this case, even if a container | |
491 | application breaks out, other tenants are not affected because the | |
492 | containers running inside the VMs cannot modify the flows in the | |
493 | Open vSwitch integration bridge. | |
494 | </p> | |
495 | ||
496 | <p> | |
497 | When multiple containers are created inside a VM, there are multiple | |
498 | CIFs associated with them. The network traffic associated with these | |
499 | CIFs needs to reach the Open vSwitch integration bridge running in the | |
500 | hypervisor for OVN to support virtual network abstractions. OVN should | |
501 | also be able to distinguish network traffic coming from different CIFs. | |
502 | There are two ways to distinguish network traffic of CIFs. | |
503 | </p> | |
504 | ||
505 | <p> | |
506 | One way is to provide one VIF for every CIF (1:1 model). This means that | |
507 | there could be a lot of network devices in the hypervisor. This would slow | |
508 | down OVS because of all the additional CPU cycles needed for the management | |
509 | of all the VIFs. It would also mean that the entity creating the | |
510 | containers in a VM should also be able to create the corresponding VIFs in | |
511 | the hypervisor. | |
512 | </p> | |
513 | ||
514 | <p> | |
515 | The second way is to provide a single VIF for all the CIFs (1:many model). | |
516 | OVN could then distinguish network traffic coming from different CIFs via | |
517 | a tag written in every packet. OVN uses this mechanism and uses VLAN as | |
518 | the tagging mechanism. | |
519 | </p> | |
520 | ||
521 | <ol> | |
522 | <li> | |
523 | A CIF's life cycle begins when a container is spawned inside a VM by | |
524 | either the same CMS that created the VM, a tenant that owns that VM, | |
525 | or even a container orchestration system that is different from the CMS | |
526 | that initially created the VM. Whoever the entity is, it will need to | |
527 | know the <var>vif-id</var> that is associated with the network interface | |
528 | of the VM through which the container interface's network traffic is | |
529 | expected to go. The entity that creates the container interface | |
530 | will also need to choose an unused VLAN inside that VM. | |
531 | </li> | |
532 | ||
533 | <li> | |
534 | The container spawning entity (either directly or through the CMS that | |
535 | manages the underlying infrastructure) updates the OVN Northbound | |
536 | database to include the new CIF, by adding a row to the | |
80f408f4 JP |
537 | <code>Logical_Switch_Port</code> table. In the new row, |
538 | <code>name</code> is any unique identifier, | |
539 | <code>parent_name</code> is the <var>vif-id</var> of the VM | |
540 | through which the CIF's network traffic is expected to go, | |
541 | and <code>tag</code> is the VLAN tag that identifies the | |
9fb4636f GS |
542 | network traffic of that CIF. |
543 | </li> | |
544 | ||
545 | <li> | |
5868eb24 BP |
546 | <code>ovn-northd</code> receives the OVN Northbound database update. In |
547 | turn, it makes the corresponding updates to the OVN Southbound database, | |
548 | by adding rows to the OVN Southbound database's <code>Logical_Flow</code> | |
549 | table to reflect the new port and also by creating a new row in the | |
550 | <code>Binding</code> table and populating all its columns except the | |
551 | column that identifies the <code>chassis</code>. | |
9fb4636f GS |
552 | </li> |
553 | ||
554 | <li> | |
555 | On every hypervisor, <code>ovn-controller</code> subscribes to the | |
e387e3e8 | 556 | changes in the <code>Binding</code> table. When a new row is created |
91ae2065 | 557 | by <code>ovn-northd</code> that includes a value in the |
e387e3e8 | 558 | <code>parent_port</code> column of the <code>Binding</code> table, the |
91ae2065 RB |
559 | <code>ovn-controller</code> in the hypervisor whose OVN integration bridge |
560 | has that same value in <var>vif-id</var> in | |
561 | <code>external-ids</code>:<code>iface-id</code> | |
9fb4636f GS |
562 | updates the local hypervisor's OpenFlow tables so that packets to and |
563 | from the VIF with the particular VLAN <code>tag</code> are properly | |
564 | handled. Afterward it updates the <code>chassis</code> column of | |
e387e3e8 | 565 | the <code>Binding</code> table to reflect the physical location. |
9fb4636f GS |
566 | </li> |
567 | ||
568 | <li> | |
569 | One can only start the application inside the container after the | |
91ae2065 | 570 | underlying network is ready. To support this, <code>ovn-northd</code> |
e387e3e8 | 571 | notices the updated <code>chassis</code> column in the <code>Binding</code> |
80f408f4 | 572 | table and updates the <ref column="up" table="Logical_Switch_Port" |
9fb4636f | 573 | db="OVN_NB"/> column in the OVN Northbound database's |
80f408f4 | 574 | <ref table="Logical_Switch_Port" db="OVN_NB"/> table to indicate that the |
9fb4636f GS |
575 | CIF is now up. The entity responsible for starting the container application
576 | queries this value and starts the application. | |
577 | </li> | |
578 | ||
579 | <li> | |
580 | Eventually, the entity that created and started the container stops it. | |
581 | The entity, through the CMS (or directly), deletes its row in the | |
80f408f4 | 582 | <code>Logical_Switch_Port</code> table. |
9fb4636f GS |
583 | </li> |
584 | ||
585 | <li> | |
91ae2065 | 586 | <code>ovn-northd</code> receives the OVN Northbound update and in turn |
5868eb24 BP |
587 | updates the OVN Southbound database accordingly, by removing or updating |
588 | the rows from the OVN Southbound database <code>Logical_Flow</code> table | |
589 | that were related to the now-destroyed CIF. It also deletes the row in | |
590 | the <code>Binding</code> table for that CIF. | |
9fb4636f GS |
591 | </li> |
592 | ||
593 | <li> | |
594 | On every hypervisor, <code>ovn-controller</code> receives the | |
48605550 BP |
595 | <code>Logical_Flow</code> table updates that <code>ovn-northd</code> made |
596 | in the previous step. <code>ovn-controller</code> updates OpenFlow | |
597 | tables to reflect the update. | |
9fb4636f GS |
598 | </li> |
599 | </ol> | |
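The first two steps above (choosing an unused VLAN inside the VM and describing the CIF to the OVN Northbound database) can be sketched as follows. The helper names are hypothetical, the row is a plain dictionary rather than a real database transaction, and the usable tag range 1 to 4094 is an assumption based on ordinary 802.1Q VLAN IDs.

```python
def choose_unused_vlan(used_tags):
    """Return the lowest VLAN tag not yet used inside the VM.

    Assumes the usual 802.1Q range of 1..4094 (an assumption of this
    sketch, not something the text above specifies).
    """
    for tag in range(1, 4095):
        if tag not in used_tags:
            return tag
    raise ValueError("no unused VLAN tag left in this VM")


def make_cif_row(name, parent_vif_id, used_tags):
    """Build the column values for the CIF's Logical_Switch_Port row."""
    return {
        "name": name,                    # any unique identifier
        "parent_name": parent_vif_id,    # vif-id of the VM's interface
        "tag": choose_unused_vlan(used_tags),
    }


if __name__ == "__main__":
    # Tags 1 and 2 are already taken by other containers in this VM.
    print(make_cif_row("cif-web", "vif-1", {1, 2}))
```

Because every CIF in one VM shares the parent VIF, the tag only has to be unique per VM, which is why the set of used tags is passed in per call.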
b705f9ea | 600 | |
69a832cf | 601 | <h2>Architectural Physical Life Cycle of a Packet</h2> |
b705f9ea | 602 | |
b705f9ea | 603 | <p> |
5868eb24 BP |
604 | This section describes how a packet travels from one virtual machine or |
605 | container to another through OVN. This description focuses on the physical | |
606 | treatment of a packet; for a description of the logical life cycle of a | |
607 | packet, please refer to the <code>Logical_Flow</code> table in | |
608 | <code>ovn-sb</code>(5). | |
b705f9ea JP |
609 | </p> |
610 | ||
5868eb24 BP |
611 | <p> |
612 | This section mentions several data and metadata fields, for clarity | |
613 | summarized here: | |
614 | </p> | |
615 | ||
616 | <dl> | |
617 | <dt>tunnel key</dt> | |
618 | <dd> | |
619 | When OVN encapsulates a packet in Geneve or another tunnel, it attaches | |
620 | extra data to it to allow the receiving OVN instance to process it | |
621 | correctly. This takes different forms depending on the particular | |
622 | encapsulation, but in each case we refer to it here as the ``tunnel | |
623 | key.'' See <code>Tunnel Encapsulations</code>, below, for details. | |
624 | </dd> | |
625 | ||
626 | <dt>logical datapath field</dt> | |
627 | <dd> | |
628 | A field that denotes the logical datapath through which a packet is being | |
4103f6d2 BP |
629 | processed. |
630 | <!-- Keep the following in sync with MFF_LOG_DATAPATH in | |
667e2b0b | 631 | ovn/lib/logical-fields.h. --> |
4103f6d2 BP |
632 | OVN uses the field that OpenFlow 1.1+ simply (and confusingly) calls |
633 | ``metadata'' to store the logical datapath. (This field is passed across | |
634 | tunnels as part of the tunnel key.) | |
5868eb24 BP |
635 | </dd> |
636 | ||
637 | <dt>logical input port field</dt> | |
638 | <dd> | |
37910994 JP |
639 | <p> |
640 | A field that denotes the logical port from which the packet | |
641 | entered the logical datapath. | |
642 | <!-- Keep the following in sync with MFF_LOG_INPORT in | |
667e2b0b | 643 | ovn/lib/logical-fields.h. --> |
37910994 JP |
644 | OVN stores this in Nicira extension register number 6. |
645 | </p> | |
646 | ||
647 | <p> | |
648 | Geneve and STT tunnels pass this field as part of the tunnel key. | |
649 | Although VXLAN tunnels do not explicitly carry a logical input port, | |
650 | OVN only uses VXLAN to communicate with gateways that from OVN's | |
651 | perspective consist of only a single logical port, so that OVN can set | |
652 | the logical input port field to this one on ingress to the OVN logical | |
653 | pipeline. | |
654 | </p> | |
5868eb24 BP |
655 | </dd> |
656 | ||
657 | <dt>logical output port field</dt> | |
658 | <dd> | |
37910994 JP |
659 | <p> |
660 | A field that denotes the logical port from which the packet will | |
661 | leave the logical datapath. This is initialized to 0 at the | |
662 | beginning of the logical ingress pipeline. | |
663 | <!-- Keep the following in sync with MFF_LOG_OUTPORT in | |
667e2b0b | 664 | ovn/lib/logical-fields.h. --> |
37910994 JP |
665 | OVN stores this in Nicira extension register number 7. |
666 | </p> | |
667 | ||
668 | <p> | |
669 | Geneve and STT tunnels pass this field as part of the tunnel key. | |
670 | VXLAN tunnels do not transmit the logical output port field. | |
671 | </p> | |
5868eb24 BP |
672 | </dd> |
673 | ||
3bd4ae23 | 674 | <dt>conntrack zone field for logical ports</dt> |
78aab811 | 675 | <dd> |
3bd4ae23 GS |
676 | A field that denotes the connection tracking zone for logical ports. |
677 | The value only has local significance and is not meaningful between | |
678 | chassis. This is initialized to 0 at the beginning of the logical | |
679 | ingress pipeline. OVN stores this in Nicira extension register number 5. | |
680 | </dd> | |
681 | ||
682 | <dt>conntrack zone fields for Gateway router</dt> | |
683 | <dd> | |
684 | Fields that denote the connection tracking zones for Gateway routers. | |
685 | These values only have local significance (only on chassis that have | |
686 | Gateway routers instantiated) and are not meaningful between | |
687 | chassis. OVN stores the zone information for DNATing in Nicira | |
688 | extension register number 3 and zone information for SNATing in Nicira | |
689 | extension register number 4. | |
78aab811 JP |
690 | </dd> |
691 | ||
5868eb24 BP |
692 | <dt>VLAN ID</dt> |
693 | <dd> | |
694 | The VLAN ID is used as an interface between OVN and containers nested | |
695 | inside a VM (see <code>Life Cycle of a container interface inside a | |
696 | VM</code>, above, for more information). | |
697 | </dd> | |
698 | </dl> | |
699 | ||
700 | <p> | |
701 | Initially, a VM or container on the ingress hypervisor sends a packet on a | |
702 | port attached to the OVN integration bridge. Then: | |
703 | </p> | |
704 | ||
705 | <ol> | |
b705f9ea JP |
706 | <li> |
707 | <p> | |
5868eb24 BP |
708 | OpenFlow table 0 performs physical-to-logical translation. It matches |
709 | the packet's ingress port. Its actions annotate the packet with | |
710 | logical metadata, by setting the logical datapath field to identify the | |
711 | logical datapath that the packet is traversing and the logical input | |
712 | port field to identify the ingress port. Then it resubmits to table 16 | |
713 | to enter the logical ingress pipeline. | |
714 | </p> | |
715 | ||
716 | <p> | |
717 | Packets that originate from a container nested within a VM are treated | |
718 | in a slightly different way. The originating container can be | |
719 | distinguished based on the VIF-specific VLAN ID, so the | |
720 | physical-to-logical translation flows additionally match on VLAN ID and | |
721 | the actions strip the VLAN header. Following this step, OVN treats | |
722 | packets from containers just like any other packets. | |
723 | </p> | |
724 | ||
725 | <p> | |
726 | Table 0 also processes packets that arrive from other chassis. It | |
727 | distinguishes them from other packets by ingress port, which is a | |
728 | tunnel. As with packets just entering the OVN pipeline, the actions | |
729 | annotate these packets with logical datapath and logical ingress port | |
730 | metadata. In addition, the actions set the logical output port field, | |
731 | which is available because in OVN tunneling occurs after the logical | |
732 | output port is known. These three pieces of information are obtained | |
733 | from the tunnel encapsulation metadata (see <code>Tunnel | |
734 | Encapsulations</code> for encoding details). Then the actions resubmit | |
735 | to table 33 to enter the logical egress pipeline. | |
b705f9ea JP |
736 | </p> |
737 | </li> | |
738 | ||
739 | <li> | |
740 | <p> | |
5868eb24 BP |
741 | OpenFlow tables 16 through 31 execute the logical ingress pipeline from |
742 | the <code>Logical_Flow</code> table in the OVN Southbound database. | |
743 | These tables are expressed entirely in terms of logical concepts like | |
744 | logical ports and logical datapaths. A big part of | |
745 | <code>ovn-controller</code>'s job is to translate them into equivalent | |
746 | OpenFlow (in particular it translates the table numbers: | |
747 | <code>Logical_Flow</code> tables 0 through 15 become OpenFlow tables 16 | |
0bac7164 | 748 | through 31). |
b705f9ea | 749 | </p> |
5868eb24 | 750 | |
0bac7164 BP |
751 | <p> |
752 | Most OVN actions have fairly obvious implementations in OpenFlow (with | |
753 | OVS extensions), e.g. <code>next;</code> is implemented as | |
754 | <code>resubmit</code>, <code><var>field</var> = | |
755 | <var>constant</var>;</code> as <code>set_field</code>. A few are worth | |
756 | describing in more detail: | |
757 | </p> | |
758 | ||
759 | <dl> | |
760 | <dt><code>output;</code></dt> | |
761 | <dd> | |
762 | Implemented by resubmitting the packet to table 32. If the pipeline | |
763 | executes more than one <code>output</code> action, then each one is | |
764 | separately resubmitted to table 32. This can be used to send | |
765 | multiple copies of the packet to multiple ports. (If the packet was | |
766 | not modified between the <code>output</code> actions, and some of the | |
767 | copies are destined to the same hypervisor, then using a logical | |
768 | multicast output port would save bandwidth between hypervisors.) | |
769 | </dd> | |
770 | ||
771 | <dt><code>get_arp(<var>P</var>, <var>A</var>);</code></dt> | |
772 | <dd> | |
773 | <p> | |
774 | Implemented by storing arguments into OpenFlow fields, then | |
775 | resubmitting to table 65, which <code>ovn-controller</code> | |
776 | populates with flows generated from the <code>MAC_Binding</code> | |
777 | table in the OVN Southbound database. If there is a match in table | |
778 | 65, then its actions store the bound MAC in the Ethernet | |
779 | destination address field. | |
780 | </p> | |
781 | ||
782 | <p> | |
783 | (The OpenFlow actions save and restore the OpenFlow fields used for | |
784 | the arguments, so that the OVN actions do not have to be aware of | |
785 | this temporary use.) | |
786 | </p> | |
787 | </dd> | |
788 | ||
789 | <dt><code>put_arp(<var>P</var>, <var>A</var>, <var>E</var>);</code></dt> | |
790 | <dd> | |
791 | <p> | |
792 | Implemented by storing the arguments into OpenFlow fields, then | |
793 | outputting a packet to <code>ovn-controller</code>, which updates | |
794 | the <code>MAC_Binding</code> table. | |
795 | </p> | |
796 | ||
797 | <p> | |
798 | (The OpenFlow actions save and restore the OpenFlow fields used for | |
799 | the arguments, so that the OVN actions do not have to be aware of | |
800 | this temporary use.) | |
801 | </p> | |
802 | </dd> | |
803 | </dl> | |
b705f9ea JP |
804 | </li> |
805 | ||
806 | <li> | |
807 | <p> | |
5868eb24 BP |
808 | OpenFlow tables 32 through 47 implement the <code>output</code> action |
809 | in the logical ingress pipeline. Specifically, table 32 handles | |
810 | packets to remote hypervisors, table 33 handles packets to the local | |
811 | hypervisor, and table 34 discards packets whose logical ingress and | |
812 | egress port are the same. | |
813 | </p> | |
814 | ||
0b7da177 BP |
815 | <p> |
816 | Logical patch ports are a special case. Logical patch ports do not | |
817 | have a physical location and effectively reside on every hypervisor. | |
818 | Thus, flow table 33, for output to ports on the local hypervisor, | |
819 | naturally implements output to unicast logical patch ports too. | |
820 | However, applying the same logic to a logical patch port that is part | |
821 | of a logical multicast group yields packet duplication, because each | |
822 | hypervisor that contains a logical port in the multicast group will | |
823 | also output the packet to the logical patch port. Thus, multicast | |
824 | groups implement output to logical patch ports in table 32. | |
825 | </p> | |
826 | ||
5868eb24 BP |
827 | <p> |
828 | Each flow in table 32 matches on a logical output port for unicast or | |
829 | multicast logical ports that include a logical port on a remote | |
830 | hypervisor. Each flow's actions implement sending a packet to the port | |
831 | it matches. For unicast logical output ports on remote hypervisors, | |
832 | the actions set the tunnel key to the correct value, then send the | |
833 | packet on the tunnel port to the correct hypervisor. (When the remote | |
834 | hypervisor receives the packet, table 0 there will recognize it as a | |
835 | tunneled packet and pass it along to table 33.) For multicast logical | |
836 | output ports, the actions send one copy of the packet to each remote | |
837 | hypervisor, in the same way as for unicast destinations. If a | |
838 | multicast group includes a logical port or ports on the local | |
839 | hypervisor, then its actions also resubmit to table 33. Table 32 also | |
840 | includes a fallback flow that resubmits to table 33 if there is no | |
841 | other match. | |
842 | </p> | |
843 | ||
844 | <p> | |
845 | Flows in table 33 resemble those in table 32 but for logical ports that | |
0b7da177 | 846 | reside locally rather than remotely. For unicast logical output ports |
5868eb24 BP |
847 | on the local hypervisor, the actions just resubmit to table 34. For |
848 | multicast output ports that include one or more logical ports on the | |
849 | local hypervisor, for each such logical port <var>P</var>, the actions | |
850 | change the logical output port to <var>P</var>, then resubmit to table | |
851 | 34. | |
852 | </p> | |
853 | ||
6e6c3f91 HZ |
854 | <p> |
855 | A special case arises when a localnet port exists on the datapath: a | |
856 | remote port is then reached by switching to the localnet port. In this | |
857 | case, instead of adding a flow in table 32 to reach the remote port, a | |
858 | flow is added in table 33 that switches the logical outport to the | |
859 | localnet port and resubmits to table 33, as if the packet were unicast | |
860 | to a logical port on the local hypervisor. | |
861 | </p> | |
862 | ||
5868eb24 BP |
863 | <p> |
864 | Table 34 matches and drops packets for which the logical input and | |
865 | output ports are the same. It resubmits other packets to table 48. | |
b705f9ea JP |
866 | </p> |
867 | </li> | |
5868eb24 BP |
868 | |
869 | <li> | |
870 | <p> | |
871 | OpenFlow tables 48 through 63 execute the logical egress pipeline from | |
872 | the <code>Logical_Flow</code> table in the OVN Southbound database. | |
873 | The egress pipeline can perform a final stage of validation before | |
874 | packet delivery. Eventually, it may execute an <code>output</code> | |
875 | action, which <code>ovn-controller</code> implements by resubmitting to | |
876 | table 64. A packet for which the pipeline never executes | |
877 | <code>output</code> is effectively dropped (although it may have been | |
878 | transmitted through a tunnel across a physical network). | |
879 | </p> | |
880 | ||
881 | <p> | |
882 | The egress pipeline cannot change the logical output port or cause | |
883 | further tunneling. | |
884 | </p> | |
885 | </li> | |
886 | ||
887 | <li> | |
888 | <p> | |
889 | OpenFlow table 64 performs logical-to-physical translation, the | |
890 | opposite of table 0. It matches the packet's logical egress port. Its | |
891 | actions output the packet to the port attached to the OVN integration | |
892 | bridge that represents that logical port. If the logical egress port | |
893 | is a container nested within a VM, then before sending the packet the | |
894 | actions push on a VLAN header with an appropriate VLAN ID. | |
895 | </p> | |
d387d24d BP |
896 | |
897 | <p> | |
898 | If the logical egress port is a logical patch port, then table 64 | |
899 | outputs to an OVS patch port that represents the logical patch port. | |
900 | The packet re-enters the OpenFlow flow table from the OVS patch port's | |
901 | peer in table 0, which identifies the logical datapath and logical | |
902 | input port based on the OVS patch port's OpenFlow port number. | |
903 | </p> | |
5868eb24 BP |
904 | </li> |
905 | </ol> | |
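<p>
The table layout described in the steps above can be summarized in a short
Python sketch. It is illustrative only: the function names are hypothetical
and not part of OVN, the egress offset of 48 is inferred by analogy with the
ingress mapping stated above, and only unicast output without localnet or
logical patch ports is modeled.
</p>

```python
# Hypothetical helpers that mirror the table numbers given in the text.
INGRESS_BASE = 16   # Logical_Flow ingress tables 0-15 become OpenFlow 16-31
EGRESS_BASE = 48    # assumed by analogy: egress tables 0-15 become 48-63
LOGICAL_TABLES = range(16)


def openflow_table(pipeline, logical_table):
    """Map a Logical_Flow table number to an OpenFlow table number."""
    if logical_table not in LOGICAL_TABLES:
        raise ValueError("logical table must be in 0..15")
    bases = {"ingress": INGRESS_BASE, "egress": EGRESS_BASE}
    return bases[pipeline] + logical_table


def unicast_output_tables(outport_is_remote, inport, outport):
    """List the OpenFlow tables visited after a unicast output action
    in the ingress pipeline, on the ingress hypervisor."""
    if outport_is_remote:
        # Table 32 sets the tunnel key and sends the packet on the tunnel.
        return [32]
    # No remote match: the table 32 fallback flow resubmits to table 33,
    # which resubmits to table 34.
    tables = [32, 33, 34]
    if inport != outport:
        # Table 34 passes the packet on to the logical egress pipeline.
        tables.append(48)
    # A list ending at 34 means table 34 dropped the packet.
    return tables
```

<p>
For example, <code>openflow_table("ingress", 3)</code> gives 19, and a
packet whose logical input and output ports are the same local port stops
at table 34.
</p>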
906 | ||
88058f19 AW |
907 | <h2>Life Cycle of a VTEP gateway</h2> |
908 | ||
909 | <p> | |
910 | A gateway is a chassis that forwards traffic between the OVN-managed | |
911 | part of a logical network and a physical VLAN, extending a | |
912 | tunnel-based logical network into a physical network. | |
913 | </p> | |
914 | ||
915 | <p> | |
916 | The steps below refer often to details of the OVN and VTEP database | |
917 | schemas. Please see <code>ovn-sb</code>(5), <code>ovn-nb</code>(5) | |
918 | and <code>vtep</code>(5), respectively, for the full story on these | |
919 | databases. | |
920 | </p> | |
921 | ||
922 | <ol> | |
923 | <li> | |
924 | A VTEP gateway's life cycle begins with the administrator registering | |
925 | the VTEP gateway as a <code>Physical_Switch</code> table entry in the | |
926 | <code>VTEP</code> database. The <code>ovn-controller-vtep</code> | |
927 | connected to this VTEP database will recognize the new VTEP gateway | |
928 | and create a new <code>Chassis</code> table entry for it in the | |
929 | <code>OVN_Southbound</code> database. | |
930 | </li> | |
931 | ||
932 | <li> | |
933 | The administrator can then create a new <code>Logical_Switch</code> | |
934 | table entry, and bind a particular VLAN on a VTEP gateway's port to | |
935 | any VTEP logical switch. Once a VTEP logical switch is bound to | |
936 | a VTEP gateway, the <code>ovn-controller-vtep</code> will detect | |
937 | it and add its name to the <var>vtep_logical_switches</var> | |
938 | column of the <code>Chassis</code> table in the <code> | |
939 | OVN_Southbound</code> database. Note that the <var>tunnel_key</var> | |
940 | column of the VTEP logical switch is not filled at creation. The | |
941 | <code>ovn-controller-vtep</code> will set the column when the | |
942 | corresponding VTEP logical switch is bound to an OVN logical network. | |
943 | </li> | |
944 | ||
945 | <li> | |
946 | Now, the administrator can use the CMS to add a VTEP logical switch | |
947 | to the OVN logical network. To do that, the CMS must first create a | |
80f408f4 | 948 | new <code>Logical_Switch_Port</code> table entry in the <code> |
88058f19 AW |
949 | OVN_Northbound</code> database. Then, the <var>type</var> column |
950 | of this entry must be set to "vtep". Next, the <var> | |
951 | vtep-logical-switch</var> and <var>vtep-physical-switch</var> keys | |
952 | in the <var>options</var> column must also be specified, since | |
953 | multiple VTEP gateways can attach to the same VTEP logical switch. | |
954 | </li> | |
955 | ||
956 | <li> | |
957 | The newly created logical port in the <code>OVN_Northbound</code> | |
958 | database and its configuration will be passed down to the <code> | |
959 | OVN_Southbound</code> database as a new <code>Port_Binding</code> | |
960 | table entry. The <code>ovn-controller-vtep</code> will recognize the | |
961 | change and bind the logical port to the corresponding VTEP gateway | |
962 | chassis. Binding the same VTEP logical switch to more than one | |
963 | OVN logical network is not allowed, and a warning will be | |
964 | generated in the log. | |
965 | </li> | |
966 | ||
967 | <li> | |
968 | Besides binding to the VTEP gateway chassis, the <code> | |
969 | ovn-controller-vtep</code> will update the <var>tunnel_key</var> | |
970 | column of the VTEP logical switch to the corresponding <code> | |
971 | Datapath_Binding</code> table entry's <var>tunnel_key</var> for the | |
972 | bound OVN logical network. | |
973 | </li> | |
974 | ||
975 | <li> | |
976 | Next, the <code>ovn-controller-vtep</code> will keep reacting to | |
977 | configuration changes in the <code>Port_Binding</code> table in the | |
978 | <code>OVN_Southbound</code> database, updating the | |
979 | <code>Ucast_Macs_Remote</code> table in the <code>VTEP</code> database. | |
980 | This allows the VTEP gateway to understand where to forward the unicast | |
981 | traffic coming from the extended external network. | |
982 | </li> | |
983 | ||
984 | <li> | |
985 | Eventually, the VTEP gateway's life cycle ends when the administrator | |
986 | unregisters the VTEP gateway from the <code>VTEP</code> database. | |
987 | The <code>ovn-controller-vtep</code> will recognize the event and | |
988 | remove all related configurations (<code>Chassis</code> table entry | |
989 | and port bindings) in the <code>OVN_Southbound</code> database. | |
990 | </li> | |
991 | ||
992 | <li> | |
993 | When the <code>ovn-controller-vtep</code> is terminated, all related | |
994 | configurations in the <code>OVN_Southbound</code> database and | |
995 | the <code>VTEP</code> database will be cleaned, including | |
996 | <code>Chassis</code> table entries for all registered VTEP gateways | |
997 | and their port bindings, and all <code>Ucast_Macs_Remote</code> table | |
998 | entries and the <code>Logical_Switch</code> tunnel keys. | |
999 | </li> | |
1000 | </ol> | |
1001 | ||
5868eb24 BP |
1002 | <h1>Design Decisions</h1> |
1003 | ||
1004 | <h2>Tunnel Encapsulations</h2> | |
1005 | ||
1006 | <p> | |
1007 | OVN annotates logical network packets that it sends from one hypervisor to | |
1008 | another with the following three pieces of metadata, which are encoded in | |
1009 | an encapsulation-specific fashion: | |
1010 | </p> | |
1011 | ||
1012 | <ul> | |
1013 | <li> | |
1014 | 24-bit logical datapath identifier, from the <code>tunnel_key</code> | |
1015 | column in the OVN Southbound <code>Datapath_Binding</code> table. | |
1016 | </li> | |
1017 | ||
1018 | <li> | |
1019 | 15-bit logical ingress port identifier. ID 0 is reserved for internal | |
1020 | use within OVN. IDs 1 through 32767, inclusive, may be assigned to | |
1021 | logical ports (see the <code>tunnel_key</code> column in the OVN | |
1022 | Southbound <code>Port_Binding</code> table). | |
1023 | </li> | |
1024 | ||
1025 | <li> | |
1026 | 16-bit logical egress port identifier. IDs 0 through 32767 have the same | |
1027 | meaning as for logical ingress ports. IDs 32768 through 65535, | |
1028 | inclusive, may be assigned to logical multicast groups (see the | |
1029 | <code>tunnel_key</code> column in the OVN Southbound | |
1030 | <code>Multicast_Group</code> table). | |
1031 | </li> | |
b705f9ea JP |
1032 | </ul> |
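<p>
The identifier ranges listed above can be checked mechanically. The
following Python sketch is only an illustration; the function name and the
returned labels are hypothetical and do not appear in OVN.
</p>

```python
def classify_egress_id(egress_id):
    """Classify a 16-bit logical egress port identifier by range."""
    if egress_id not in range(2**16):
        raise ValueError("egress identifier must fit in 16 bits")
    if egress_id == 0:
        return "reserved"          # ID 0 is reserved for internal use
    if egress_id in range(1, 2**15):
        return "logical_port"      # IDs 1 through 32767
    return "multicast_group"       # IDs 32768 through 65535
```

<p>
Because logical ingress port identifiers are only 15 bits wide, a multicast
group identifier can never appear as a logical ingress port.
</p>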
1033 | ||
1034 | <p> | |
5868eb24 BP |
1035 | For hypervisor-to-hypervisor traffic, OVN supports only Geneve and STT |
1036 | encapsulations, for the following reasons: | |
b705f9ea JP |
1037 | </p> |
1038 | ||
5868eb24 BP |
1039 | <ul> |
1040 | <li> | |
1041 | Only STT and Geneve support the large amounts of metadata (over 32 bits | |
1042 | per packet) that OVN uses (as described above). | |
1043 | </li> | |
1044 | ||
1045 | <li> | |
1046 | STT and Geneve use randomized UDP or TCP source ports that allow | |
1047 | efficient distribution among multiple paths in environments that use ECMP | |
1048 | in their underlay. | |
1049 | </li> | |
1050 | ||
1051 | <li> | |
1052 | NICs are available to offload STT and Geneve encapsulation and | |
1053 | decapsulation. | |
1054 | </li> | |
1055 | </ul> | |
1056 | ||
1057 | <p> | |
1058 | Due to its flexibility, the preferred encapsulation between hypervisors is | |
1059 | Geneve. For Geneve encapsulation, OVN transmits the logical datapath | |
1060 | identifier in the Geneve VNI. | |
1061 | ||
1062 | <!-- Keep the following in sync with ovn/controller/physical.h. --> | |
1063 | OVN transmits the logical ingress and logical egress ports in a TLV with | |
57d44532 | 1064 | class 0x0102, type 0, and a 32-bit value encoded as follows, from MSB to |
5868eb24 BP |
1065 | LSB: |
1066 | </p> | |
1067 | ||
1068 | <diagram> | |
1069 | <header name=""> | |
1070 | <bits name="rsv" above="1" below="0" width=".25"/> | |
1071 | <bits name="ingress port" above="15" width=".75"/> | |
1072 | <bits name="egress port" above="16" width=".75"/> | |
1073 | </header> | |
1074 | </diagram> | |
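<p>
As a rough illustration of the layout in the diagram above, the following
Python sketch packs and unpacks the 32-bit option value. The function names
are hypothetical, and the sketch handles only the integer value, not the
Geneve option header around it.
</p>

```python
INGRESS_LIMIT = 2**15   # 15-bit logical ingress port
EGRESS_LIMIT = 2**16    # 16-bit logical egress port


def pack_geneve_ports(ingress_port, egress_port):
    """Build the 32-bit value: 1 reserved bit, ingress port, egress port."""
    if ingress_port not in range(INGRESS_LIMIT):
        raise ValueError("ingress port must fit in 15 bits")
    if egress_port not in range(EGRESS_LIMIT):
        raise ValueError("egress port must fit in 16 bits")
    # Multiplying by 2**16 is the same as shifting left by 16 bits;
    # the reserved most significant bit stays 0.
    return ingress_port * EGRESS_LIMIT + egress_port


def unpack_geneve_ports(value):
    """Return (ingress_port, egress_port), ignoring the reserved bit."""
    ingress_port, egress_port = divmod(value % 2**31, EGRESS_LIMIT)
    return ingress_port, egress_port
```

<p>
For instance, ingress port 1 and egress port 2 yield the value 0x00010002.
</p>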
1075 | ||
1076 | <p> | |
1077 | Environments whose NICs lack Geneve offload may prefer STT encapsulation | |
1078 | for performance reasons. For STT encapsulation, OVN encodes all three | |
1079 | pieces of logical metadata in the STT 64-bit tunnel ID as follows, from MSB | |
1080 | to LSB: | |
1081 | </p> | |
1082 | ||
1083 | <diagram> | |
1084 | <header name=""> | |
1085 | <bits name="reserved" above="9" below="0" width=".5"/> | |
1086 | <bits name="ingress port" above="15" width=".75"/> | |
1087 | <bits name="egress port" above="16" width=".75"/> | |
1088 | <bits name="datapath" above="24" width="1.25"/> | |
1089 | </header> | |
1090 | </diagram> | |
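<p>
The STT layout above can be sketched in the same way. Again, this is a
minimal Python illustration with hypothetical function names, not code
taken from OVN.
</p>

```python
DATAPATH_LIMIT = 2**24  # 24-bit logical datapath identifier
EGRESS_LIMIT = 2**16    # 16-bit logical egress port
INGRESS_LIMIT = 2**15   # 15-bit logical ingress port


def pack_stt_tunnel_id(datapath, ingress_port, egress_port):
    """Build the 64-bit tunnel ID: reserved, ingress, egress, datapath."""
    if datapath not in range(DATAPATH_LIMIT):
        raise ValueError("datapath must fit in 24 bits")
    if ingress_port not in range(INGRESS_LIMIT):
        raise ValueError("ingress port must fit in 15 bits")
    if egress_port not in range(EGRESS_LIMIT):
        raise ValueError("egress port must fit in 16 bits")
    # The 9 reserved most significant bits stay 0.
    ports = ingress_port * EGRESS_LIMIT + egress_port
    return ports * DATAPATH_LIMIT + datapath


def unpack_stt_tunnel_id(tunnel_id):
    """Return (datapath, ingress_port, egress_port) from a tunnel ID."""
    rest, datapath = divmod(tunnel_id, DATAPATH_LIMIT)
    rest, egress_port = divmod(rest, EGRESS_LIMIT)
    ingress_port = rest % INGRESS_LIMIT   # discard the reserved bits
    return datapath, ingress_port, egress_port
```

<p>
For instance, datapath 1 with ingress port 2 and egress port 3 yields the
tunnel ID 0x20003000001.
</p>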
1091 | ||
b705f9ea | 1092 | <p> |
5868eb24 BP |
1093 | For connecting to gateways, in addition to Geneve and STT, OVN supports |
1094 | VXLAN, because only VXLAN support is common on top-of-rack (ToR) switches. | |
1095 | Currently, gateways have a feature set that matches the capabilities as | |
1096 | defined by the VTEP schema, so fewer bits of metadata are necessary. In | |
1097 | the future, gateways that do not support encapsulations with large amounts | |
1098 | of metadata may continue to have a reduced feature set. | |
b705f9ea | 1099 | </p> |
fe36184b | 1100 | </manpage> |