Commit | Line | Data |
---|---|---|
fe36184b BP |
1 | <?xml version="1.0" encoding="utf-8"?> |
2 | <manpage program="ovn-architecture" section="7" title="OVN Architecture"> | |
3 | <h1>Name</h1> | |
4 | <p>ovn-architecture -- Open Virtual Network architecture</p> | |
5 | ||
6 | <h1>Description</h1> | |
7 | ||
8 | <p> | |
9 | OVN, the Open Virtual Network, is a system to support virtual network | |
10 | abstraction. OVN complements the existing capabilities of OVS to add | |
11 | native support for virtual network abstractions, such as virtual L2 and L3 | |
12 | overlays and security groups. Services such as DHCP are also desirable | |
13 | features. Just like OVS, OVN's design goal is to have a production-quality | |
14 | implementation that can operate at significant scale. | |
15 | </p> | |
16 | ||
17 | <p> | |
18 | An OVN deployment consists of several components: | |
19 | </p> | |
20 | ||
21 | <ul> | |
22 | <li> | |
23 | <p> | |
24 | A <dfn>Cloud Management System</dfn> (<dfn>CMS</dfn>), which is | |
25 | OVN's ultimate client (via its users and administrators). OVN | |
26 | integration requires installing a CMS-specific plugin and | |
27 | related software (see below). OVN initially targets OpenStack | |
28 | as the CMS. | |
29 | </p> | |
30 | ||
31 | <p> | |
32 | We generally speak of ``the'' CMS, but one can imagine scenarios in | |
33 | which multiple CMSes manage different parts of an OVN deployment. | |
34 | </p> | |
35 | </li> | |
36 | ||
37 | <li> | |
38 | An OVN Database physical or virtual node (or, eventually, cluster) | |
39 | installed in a central location. | |
40 | </li> | |
41 | ||
42 | <li> | |
43 | One or more (usually many) <dfn>hypervisors</dfn>. Hypervisors must run | |
44 | Open vSwitch and implement the interface described in | |
45 | <code>IntegrationGuide.md</code> in the OVS source tree. Any hypervisor | |
46 | platform supported by Open vSwitch is acceptable. | |
47 | </li> | |
48 | ||
49 | <li> | |
50 | <p> | |
fa6aeaeb RB |
51 | Zero or more <dfn>gateways</dfn>. A gateway extends a tunnel-based |
52 | logical network into a physical network by bidirectionally forwarding | |
53 | packets between tunnels and a physical Ethernet port. This allows | |
54 | non-virtualized machines to participate in logical networks. A gateway | |
55 | may be a physical host, a virtual machine, or an ASIC-based hardware | |
56 | switch that supports the <code>vtep</code>(5) schema. (Support for the | |
57 | latter will come later in the OVN implementation.) | |
fe36184b BP |
58 | </p> |
59 | ||
60 | <p> | |
fa6aeaeb RB |
61 | Hypervisors and gateways are together called <dfn>transport nodes</dfn> |
62 | or <dfn>chassis</dfn>. | |
fe36184b BP |
63 | </p> |
64 | </li> | |
65 | </ul> | |
66 | ||
67 | <p> | |
68 | The diagram below shows how the major components of OVN and related | |
69 | software interact. Starting at the top of the diagram, we have: | |
70 | </p> | |
71 | ||
72 | <ul> | |
73 | <li> | |
74 | The Cloud Management System, as defined above. | |
75 | </li> | |
76 | ||
77 | <li> | |
78 | <p> | |
fa6aeaeb RB |
79 | The <dfn>OVN/CMS Plugin</dfn> is the component of the CMS that |
80 | interfaces to OVN. In OpenStack, this is a Neutron plugin. | |
81 | The plugin's main purpose is to translate the CMS's notion of logical | |
82 | network configuration, stored in the CMS's configuration database in a | |
83 | CMS-specific format, into an intermediate representation understood by | |
84 | OVN. | |
fe36184b BP |
85 | </p> |
86 | ||
87 | <p> | |
fa6aeaeb RB |
88 | This component is necessarily CMS-specific, so a new plugin needs to be |
89 | developed for each CMS that is integrated with OVN. All of the | |
90 | components below this one in the diagram are CMS-independent. | |
fe36184b BP |
91 | </p> |
92 | </li> | |
93 | ||
94 | <li> | |
95 | <p> | |
fa6aeaeb RB |
96 | The <dfn>OVN Northbound Database</dfn> receives the intermediate |
97 | representation of logical network configuration passed down by the | |
98 | OVN/CMS Plugin. The database schema is meant to be ``impedance | |
99 | matched'' with the concepts used in a CMS, so that it directly supports | |
100 | notions of logical switches, routers, ACLs, and so on. See | |
5868eb24 | 101 | <code>ovn-nb</code>(5) for details. |
fe36184b BP |
102 | </p> |
103 | ||
104 | <p> | |
fa6aeaeb RB |
105 | The OVN Northbound Database has only two clients: the OVN/CMS Plugin |
106 | above it and <code>ovn-northd</code> below it. | |
fe36184b BP |
107 | </p> |
108 | </li> | |
109 | ||
110 | <li> | |
91ae2065 RB |
111 | <code>ovn-northd</code>(8) connects to the OVN Northbound Database |
112 | above it and the OVN Southbound Database below it. It translates the | |
ec78987f JP |
113 | logical network configuration in terms of conventional network |
114 | concepts, taken from the OVN Northbound Database, into logical | |
115 | datapath flows in the OVN Southbound Database below it. | |
fe36184b BP |
116 | </li> |
117 | ||
118 | <li> | |
119 | <p> | |
ec78987f | 120 | The <dfn>OVN Southbound Database</dfn> is the center of the system. |
91ae2065 | 121 | Its clients are <code>ovn-northd</code>(8) above it and |
ec78987f | 122 | <code>ovn-controller</code>(8) on every transport node below it. |
fe36184b BP |
123 | </p> |
124 | ||
125 | <p> | |
fa6aeaeb RB |
126 | The OVN Southbound Database contains three kinds of data: <dfn>Physical |
127 | Network</dfn> (PN) tables that specify how to reach hypervisor and | |
128 | other nodes, <dfn>Logical Network</dfn> (LN) tables that describe the | |
129 | logical network in terms of ``logical datapath flows,'' and | |
130 | <dfn>Binding</dfn> tables that link logical network components' | |
131 | locations to the physical network. The hypervisors populate the PN and | |
dcda6e0d BP |
132 | Port_Binding tables, whereas <code>ovn-northd</code>(8) populates the |
133 | LN tables. | |
fe36184b BP |
134 | </p> |
135 | ||
136 | <p> | |
ec78987f JP |
137 | OVN Southbound Database performance must scale with the number of |
138 | transport nodes. This will likely require some work on | |
139 | <code>ovsdb-server</code>(1) as we encounter bottlenecks. | |
140 | Clustering for availability may be needed. | |
fe36184b BP |
141 | </p> |
142 | </li> | |
143 | </ul> | |
144 | ||
145 | <p> | |
146 | The remaining components are replicated onto each hypervisor: | |
147 | </p> | |
148 | ||
149 | <ul> | |
150 | <li> | |
151 | <code>ovn-controller</code>(8) is OVN's agent on each hypervisor and | |
ec78987f JP |
152 | software gateway. Northbound, it connects to the OVN Southbound |
153 | Database to learn about OVN configuration and status and to | |
154 | populate the PN table and the <code>Chassis</code> column in the | |
e387e3e8 | 155 | <code>Binding</code> table with the hypervisor's status. |
ec78987f JP |
156 | Southbound, it connects to <code>ovs-vswitchd</code>(8) as an |
157 | OpenFlow controller, for control over network traffic, and to the | |
158 | local <code>ovsdb-server</code>(1) to allow it to monitor and | |
159 | control Open vSwitch configuration. | |
fe36184b BP |
160 | </li> |
161 | ||
162 | <li> | |
163 | <code>ovs-vswitchd</code>(8) and <code>ovsdb-server</code>(1) are | |
164 | conventional components of Open vSwitch. | |
165 | </li> | |
166 | </ul> | |
167 | ||
168 | <pre fixed="yes"> | |
169 | CMS | |
170 | | | |
171 | | | |
172 | +-----------|-----------+ | |
173 | | | | | |
174 | | OVN/CMS Plugin | | |
175 | | | | | |
176 | | | | | |
177 | | OVN Northbound DB | | |
178 | | | | | |
179 | | | | | |
91ae2065 | 180 | | ovn-northd | |
fe36184b BP |
181 | | | | |
182 | +-----------|-----------+ | |
183 | | | |
184 | | | |
ec78987f JP |
185 | +-------------------+ |
186 | | OVN Southbound DB | | |
187 | +-------------------+ | |
fe36184b BP |
188 | | |
189 | | | |
190 | +------------------+------------------+ | |
191 | | | | | |
ec78987f | 192 | HV 1 | | HV n | |
fe36184b BP |
193 | +---------------|---------------+ . +---------------|---------------+ |
194 | | | | . | | | | |
195 | | ovn-controller | . | ovn-controller | | |
196 | | | | | . | | | | | |
197 | | | | | | | | | | |
198 | | ovs-vswitchd ovsdb-server | | ovs-vswitchd ovsdb-server | | |
199 | | | | | | |
200 | +-------------------------------+ +-------------------------------+ | |
201 | </pre> | |
202 | ||
fa183acc BP |
203 | <h2>Information Flow in OVN</h2> |
204 | ||
205 | <p> | |
206 | Configuration data in OVN flows from north to south. The CMS, through its | |
207 | OVN/CMS plugin, passes the logical network configuration to | |
208 | <code>ovn-northd</code> via the northbound database. In turn, | |
209 | <code>ovn-northd</code> compiles the configuration into a lower-level form | |
210 | and passes it to all of the chassis via the southbound database. | |
211 | </p> | |
212 | ||
213 | <p> | |
214 | Status information in OVN flows from south to north. OVN currently | |
215 | provides only a few forms of status information. First, | |
216 | <code>ovn-northd</code> populates the <code>up</code> column in the | |
217 | northbound <code>Logical_Switch_Port</code> table: if a logical port's | |
218 | <code>chassis</code> column in the southbound <code>Port_Binding</code> | |
219 | table is nonempty, it sets <code>up</code> to <code>true</code>, otherwise | |
220 | to <code>false</code>. This allows the CMS to detect when a VM's | |
221 | networking has come up. | |
222 | </p> | |
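<p>
The rule for the <code>up</code> column can be written as a one-line
predicate. The following is a minimal sketch, not the actual
<code>ovn-northd</code> code; it assumes a <code>Port_Binding</code> row is
represented as a plain dictionary, and the function name
<code>port_is_up</code> is hypothetical.
</p>

```python
def port_is_up(port_binding):
    """Return True when the row's chassis column is nonempty.

    port_binding is a hypothetical dict standing in for a southbound
    Port_Binding row; a missing, None, or empty "chassis" value means
    that no chassis has claimed the logical port yet.
    """
    return bool(port_binding.get("chassis"))


# A port claimed by chassis "hv1" is reported as up; an unclaimed one is not.
claimed = {"logical_port": "vif1", "chassis": "hv1"}
unclaimed = {"logical_port": "vif2", "chassis": None}
print(port_is_up(claimed), port_is_up(unclaimed))
```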
223 | ||
224 | <p> | |
225 | Second, OVN provides feedback to the CMS on the realization of its | |
226 | configuration, that is, whether the configuration provided by the CMS has | |
227 | taken effect. This feature requires the CMS to participate in a sequence | |
228 | number protocol, which works the following way: | |
229 | </p> | |
230 | ||
231 | <ol> | |
232 | <li> | |
233 | When the CMS updates the configuration in the northbound database, as | |
234 | part of the same transaction, it increments the value of the | |
235 | <code>nb_cfg</code> column in the <code>NB_Global</code> table. (This is | |
236 | only necessary if the CMS wants to know when the configuration has been | |
237 | realized.) | |
238 | </li> | |
239 | ||
240 | <li> | |
241 | When <code>ovn-northd</code> updates the southbound database based on a | |
242 | given snapshot of the northbound database, it copies <code>nb_cfg</code> | |
243 | from northbound <code>NB_Global</code> into the southbound database | |
244 | <code>SB_Global</code> table, as part of the same transaction. (Thus, an | |
245 | observer monitoring both databases can determine when the southbound | |
246 | database is caught up with the northbound.) | |
247 | </li> | |
248 | ||
249 | <li> | |
250 | After <code>ovn-northd</code> receives confirmation from the southbound | |
251 | database server that its changes have committed, it updates | |
252 | <code>sb_cfg</code> in the northbound <code>NB_Global</code> table to the | |
253 | <code>nb_cfg</code> version that was pushed down. (Thus, the CMS or | |
254 | another observer can determine when the southbound database is caught up | |
255 | without a connection to the southbound database.) | |
256 | </li> | |
257 | ||
258 | <li> | |
259 | The <code>ovn-controller</code> process on each chassis receives the | |
260 | updated southbound database, with the updated <code>nb_cfg</code>. This | |
261 | process in turn updates the physical flows installed in the chassis's | |
262 | Open vSwitch instances. When it receives confirmation from Open vSwitch | |
263 | that the physical flows have been updated, it updates <code>nb_cfg</code> | |
264 | in its own <code>Chassis</code> record in the southbound database. | |
265 | </li> | |
266 | ||
267 | <li> | |
268 | <code>ovn-northd</code> monitors the <code>nb_cfg</code> column in all of | |
269 | the <code>Chassis</code> records in the southbound database. It keeps | |
270 | track of the minimum value among all the records and copies it into the | |
271 | <code>hv_cfg</code> column in the northbound <code>NB_Global</code> | |
272 | table. (Thus, the CMS or another observer can determine when all of the | |
273 | hypervisors have caught up to the northbound configuration.) | |
274 | </li> | |
275 | </ol> | |
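<p>
The five steps above can be modeled with a few counters. The sketch below is
a toy simulation under stated assumptions, not OVN code: the
<code>Deployment</code> class and all function names are hypothetical, and
each database transaction is reduced to a plain assignment. It shows in
particular why <code>hv_cfg</code> is the minimum over all chassis.
</p>

```python
from dataclasses import dataclass, field


@dataclass
class Deployment:
    """Toy stand-in for the places where the sequence number is stored."""
    nb_cfg: int = 0       # NB_Global.nb_cfg, incremented by the CMS
    sb_nb_cfg: int = 0    # SB_Global.nb_cfg, copied down by ovn-northd
    sb_cfg: int = 0       # NB_Global.sb_cfg, set after the SB commit
    hv_cfg: int = 0       # NB_Global.hv_cfg, minimum over all chassis
    chassis_nb_cfg: dict = field(default_factory=dict)  # Chassis.nb_cfg


def cms_commit(d):
    """Step 1: the CMS changes configuration and bumps nb_cfg."""
    d.nb_cfg += 1


def northd_push(d):
    """Steps 2 and 3: copy nb_cfg southbound, then report it as sb_cfg."""
    d.sb_nb_cfg = d.nb_cfg
    d.sb_cfg = d.sb_nb_cfg


def controller_ack(d, chassis):
    """Step 4: one chassis has installed flows for the current nb_cfg."""
    d.chassis_nb_cfg[chassis] = d.sb_nb_cfg


def northd_collect(d):
    """Step 5: hv_cfg is the smallest nb_cfg acknowledged by any chassis."""
    if d.chassis_nb_cfg:
        d.hv_cfg = min(d.chassis_nb_cfg.values())


d = Deployment(chassis_nb_cfg={"hv1": 0, "hv2": 0})
cms_commit(d)
northd_push(d)
controller_ack(d, "hv1")
northd_collect(d)   # hv2 has not caught up, so hv_cfg still lags nb_cfg
controller_ack(d, "hv2")
northd_collect(d)   # now every chassis has caught up
```

<p>
A CMS that only needs to know whether the southbound database has caught up
would compare <code>sb_cfg</code> with the <code>nb_cfg</code> it wrote; one
that needs every hypervisor to have caught up would compare
<code>hv_cfg</code> instead.
</p>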
276 | ||
ca1564ec BP |
277 | <h2>Chassis Setup</h2> |
278 | ||
279 | <p> | |
280 | Each chassis in an OVN deployment must be configured with an Open vSwitch | |
281 | bridge dedicated for OVN's use, called the <dfn>integration bridge</dfn>. | |
e43fc07c RB |
282 | System startup scripts may create this bridge prior to starting |
283 | <code>ovn-controller</code> if desired. If this bridge does not exist when | |
284 | ovn-controller starts, it will be created automatically with the default | |
285 | configuration suggested below. The ports on the integration bridge include: | |
ca1564ec BP |
286 | </p> |
287 | ||
288 | <ul> | |
289 | <li> | |
290 | On any chassis, tunnel ports that OVN uses to maintain logical network | |
291 | connectivity. <code>ovn-controller</code> adds, updates, and removes | |
292 | these tunnel ports. | |
293 | </li> | |
294 | ||
295 | <li> | |
296 | On a hypervisor, any VIFs that are to be attached to logical networks. | |
297 | The hypervisor itself, or the integration between Open vSwitch and the | |
298 | hypervisor (described in <code>IntegrationGuide.md</code>), takes care of | |
299 | this. (This is not part of OVN or new to OVN; this is pre-existing | |
300 | integration work that has already been done on hypervisors that support | |
301 | OVS.) | |
302 | </li> | |
303 | ||
304 | <li> | |
305 | On a gateway, the physical port used for logical network connectivity. | |
306 | System startup scripts add this port to the bridge prior to starting | |
307 | <code>ovn-controller</code>. This can be a patch port to another bridge, | |
308 | instead of a physical port, in more sophisticated setups. | |
309 | </li> | |
310 | </ul> | |
311 | ||
312 | <p> | |
313 | Other ports should not be attached to the integration bridge. In | |
314 | particular, physical ports attached to the underlay network (as opposed to | |
315 | gateway ports, which are physical ports attached to logical networks) must | |
316 | not be attached to the integration bridge. Underlay physical ports should | |
317 | instead be attached to a separate Open vSwitch bridge (they need not be | |
318 | attached to any bridge at all, in fact). | |
319 | </p> | |
320 | ||
321 | <p> | |
a42226f0 BP |
322 | The integration bridge should be configured as described below. |
323 | The effect of each of these settings is documented in | |
324 | <code>ovs-vswitchd.conf.db</code>(5): | |
ca1564ec BP |
325 | </p> |
326 | ||
e43fc07c RB |
327 | <!-- Keep the following in sync with create_br_int() in |
328 | ovn/controller/ovn-controller.c. --> | |
a42226f0 BP |
329 | <dl> |
330 | <dt><code>fail-mode=secure</code></dt> | |
331 | <dd> | |
332 | Avoids switching packets between isolated logical networks before | |
333 | <code>ovn-controller</code> starts up. See <code>Controller Failure | |
334 | Settings</code> in <code>ovs-vsctl</code>(8) for more information. | |
335 | </dd> | |
336 | ||
337 | <dt><code>other-config:disable-in-band=true</code></dt> | |
338 | <dd> | |
339 | Suppresses in-band control flows for the integration bridge. It would be | |
340 | unusual for such flows to show up anyway, because OVN uses a local | |
341 | controller (over a Unix domain socket) instead of a remote controller. | |
342 | It's possible, however, for some other bridge in the same system to have | |
343 | an in-band remote controller, and in that case this suppresses the flows | |
344 | that in-band control would ordinarily set up. See <code>In-Band | |
345 | Control</code> in <code>DESIGN.md</code> for more information. | |
346 | </dd> | |
347 | </dl> | |
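<p>
For a startup script that creates the bridge itself, the two settings above
can be applied in a single <code>ovs-vsctl</code> invocation. The sketch
below only builds and prints the command line; it is an illustration, not the
code of <code>create_br_int()</code>, and the helper name
<code>br_int_command</code> is hypothetical.
</p>

```python
import shlex


def br_int_command(bridge="br-int"):
    """Build an ovs-vsctl command that creates and configures the bridge.

    --may-exist makes add-br a no-op if the bridge is already present;
    the "--" argument separates the two ovs-vsctl commands so that both
    run in one transaction.
    """
    return [
        "ovs-vsctl", "--may-exist", "add-br", bridge,
        "--", "set", "Bridge", bridge,
        "fail-mode=secure",
        "other-config:disable-in-band=true",
    ]


# Print the command instead of running it, so the sketch has no side effects.
print(shlex.join(br_int_command()))
```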
348 | ||
ca1564ec BP |
349 | <p> |
350 | The customary name for the integration bridge is <code>br-int</code>, but | |
351 | another name may be used. | |
352 | </p> | |
353 | ||
747b2a45 BP |
354 | <h2>Logical Networks</h2> |
355 | ||
356 | <p> | |
357 | <dfn>Logical networks</dfn> implement the same concepts as physical | |
358 | networks, but they are insulated from the physical network with tunnels or | |
359 | other encapsulations. This allows logical networks to have separate IP and | |
360 | other address spaces that overlap, without conflicting, with those used for | |
361 | physical networks. Logical network topologies can be arranged without | |
362 | regard for the topologies of the physical networks on which they run. | |
363 | </p> | |
364 | ||
365 | <p> | |
366 | Logical network concepts in OVN include: | |
367 | </p> | |
368 | ||
369 | <ul> | |
370 | <li> | |
371 | <dfn>Logical switches</dfn>, the logical version of Ethernet switches. | |
372 | </li> | |
373 | ||
374 | <li> | |
375 | <dfn>Logical routers</dfn>, the logical version of IP routers. Logical | |
376 | switches and routers can be connected into sophisticated topologies. | |
377 | </li> | |
378 | ||
379 | <li> | |
380 | <dfn>Logical datapaths</dfn> are the logical version of an OpenFlow | |
381 | switch. Logical switches and routers are both implemented as logical | |
382 | datapaths. | |
383 | </li> | |
384 | </ul> | |
385 | ||
ca1564ec | 386 | <h2>Life Cycle of a VIF</h2> |
fe36184b BP |
387 | |
388 | <p> | |
389 | Tables and their schemas presented in isolation are difficult to | |
390 | understand. Here's an example. | |
391 | </p> | |
392 | ||
9fb4636f GS |
393 | <p> |
394 | A VIF on a hypervisor is a virtual network interface attached either | |
395 | to a VM or to a container running directly on that hypervisor. (This is | |
396 | different from the interface of a container running inside a VM.) | |
397 | </p> | |
398 | ||
fe36184b BP |
399 | <p> |
400 | The steps in this example refer often to details of the OVN Southbound and OVN | |
ec78987f | 401 | Northbound database schemas. Please see <code>ovn-sb</code>(5) and |
fe36184b BP |
402 | <code>ovn-nb</code>(5), respectively, for the full story on these |
403 | databases. | |
404 | </p> | |
405 | ||
406 | <ol> | |
407 | <li> | |
408 | A VIF's life cycle begins when a CMS administrator creates a new VIF | |
409 | using the CMS user interface or API and adds it to a switch (one | |
410 | implemented by OVN as a logical switch). The CMS updates its own | |
411 | configuration. This includes associating a unique, persistent identifier | |
412 | <var>vif-id</var> and an Ethernet address <var>mac</var> with the VIF. | |
413 | </li> | |
414 | ||
415 | <li> | |
416 | The CMS plugin updates the OVN Northbound database to include the new | |
80f408f4 JP |
417 | VIF, by adding a row to the <code>Logical_Switch_Port</code> |
418 | table. In the new row, <code>name</code> is <var>vif-id</var>, | |
419 | <code>mac</code> is <var>mac</var>, <code>switch</code> points to | |
420 | the OVN logical switch's Logical_Switch record, and other columns | |
421 | are initialized appropriately. | |
fe36184b BP |
422 | </li> |
423 | ||
424 | <li> | |
5868eb24 BP |
425 | <code>ovn-northd</code> receives the OVN Northbound database update. In |
426 | turn, it makes the corresponding updates to the OVN Southbound database, | |
427 | by adding rows to the OVN Southbound database <code>Logical_Flow</code> | |
428 | table to reflect the new port, e.g. add a flow to recognize that packets | |
429 | destined to the new port's MAC address should be delivered to it, and | |
430 | update the flow that delivers broadcast and multicast packets to include | |
431 | the new port. It also creates a record in the <code>Binding</code> table | |
432 | and populates all its columns except the column that identifies the | |
9fb4636f | 433 | <code>chassis</code>. |
fe36184b BP |
434 | </li> |
435 | ||
436 | <li> | |
437 | On every hypervisor, <code>ovn-controller</code> receives the | |
48605550 | 438 | <code>Logical_Flow</code> table updates that <code>ovn-northd</code> made |
5868eb24 BP |
439 | in the previous step. As long as the VM that owns the VIF is powered |
440 | off, <code>ovn-controller</code> cannot do much; it cannot, for example, | |
fe36184b BP |
441 | arrange to send packets to or receive packets from the VIF, because the |
442 | VIF does not actually exist anywhere. | |
443 | </li> | |
444 | ||
445 | <li> | |
446 | Eventually, a user powers on the VM that owns the VIF. On the hypervisor | |
447 | where the VM is powered on, the integration between the hypervisor and | |
448 | Open vSwitch (described in <code>IntegrationGuide.md</code>) adds the VIF | |
449 | to the OVN integration bridge and stores <var>vif-id</var> in | |
450 | <code>external-ids</code>:<code>iface-id</code> to indicate that the | |
451 | interface is an instantiation of the new VIF. (None of this code is new | |
452 | in OVN; this is pre-existing integration work that has already been done | |
453 | on hypervisors that support OVS.) | |
454 | </li> | |
455 | ||
456 | <li> | |
457 | On the hypervisor where the VM is powered on, <code>ovn-controller</code> | |
458 | notices <code>external-ids</code>:<code>iface-id</code> in the new | |
968353c2 | 459 | Interface. In response, in the OVN Southbound DB, it updates the |
e387e3e8 | 460 | <code>Binding</code> table's <code>chassis</code> column for the |
968353c2 HK |
461 | row that links the logical port from <code>external-ids</code>:<code> |
462 | iface-id</code> to the hypervisor. Afterward, <code>ovn-controller</code> | |
463 | updates the local hypervisor's OpenFlow tables so that packets to and from | |
464 | the VIF are properly handled. | |
fe36184b BP |
465 | </li> |
466 | ||
467 | <li> | |
468 | Some CMS systems, including OpenStack, fully start a VM only when its | |
91ae2065 RB |
469 | networking is ready. To support this, <code>ovn-northd</code> notices |
470 | the <code>chassis</code> column updated for the row in the | |
e387e3e8 | 471 | <code>Binding</code> table and pushes this upward by updating the |
80f408f4 JP |
472 | <ref column="up" table="Logical_Switch_Port" db="OVN_NB"/> column |
473 | in the OVN Northbound database's <ref table="Logical_Switch_Port" | |
474 | db="OVN_NB"/> table to indicate that the VIF is now up. The CMS, | |
475 | if it uses this feature, can then react by allowing the VM's | |
476 | execution to proceed. | |
fe36184b BP |
477 | </li> |
478 | ||
479 | <li> | |
480 | On every hypervisor but the one where the VIF resides, | |
9fb4636f | 481 | <code>ovn-controller</code> notices the completely populated row in the |
e387e3e8 | 482 | <code>Binding</code> table. This provides <code>ovn-controller</code> |
fe36184b BP |
483 | the physical location of the logical port, so each instance updates the |
484 | OpenFlow tables of its switch (based on logical datapath flows in the OVN | |
5868eb24 BP |
485 | DB <code>Logical_Flow</code> table) so that packets to and from the VIF |
486 | can be properly handled via tunnels. | |
fe36184b BP |
487 | </li> |
488 | ||
489 | <li> | |
490 | Eventually, a user powers off the VM that owns the VIF. On the | |
6eceebf5 | 491 | hypervisor where the VM was powered off, the VIF is deleted from the OVN |
fe36184b BP |
492 | integration bridge. |
493 | </li> | |
494 | ||
495 | <li> | |
6eceebf5 | 496 | On the hypervisor where the VM was powered off, |
fe36184b | 497 | <code>ovn-controller</code> notices that the VIF was deleted. In |
9fb4636f | 498 | response, it removes the <code>Chassis</code> column content in the |
e387e3e8 | 499 | <code>Binding</code> table for the logical port. |
fe36184b BP |
500 | </li> |
501 | ||
502 | <li> | |
9fb4636f | 503 | On every hypervisor, <code>ovn-controller</code> notices the empty |
e387e3e8 | 504 | <code>Chassis</code> column in the <code>Binding</code> table's row |
9fb4636f GS |
505 | for the logical port. This means that <code>ovn-controller</code> no |
506 | longer knows the physical location of the logical port, so each instance | |
507 | updates its OpenFlow table to reflect that. | |
fe36184b BP |
508 | </li> |
509 | ||
510 | <li> | |
511 | Eventually, when the VIF (or its entire VM) is no longer needed by | |
512 | anyone, an administrator deletes the VIF using the CMS user interface or | |
513 | API. The CMS updates its own configuration. | |
514 | </li> | |
515 | ||
516 | <li> | |
517 | The CMS plugin removes the VIF from the OVN Northbound database, | |
80f408f4 | 518 | by deleting its row in the <code>Logical_Switch_Port</code> table. |
fe36184b BP |
519 | </li> |
520 | ||
521 | <li> | |
91ae2065 | 522 | <code>ovn-northd</code> receives the OVN Northbound update and in turn |
5868eb24 BP |
523 | updates the OVN Southbound database accordingly, by removing or updating |
524 | the rows from the OVN Southbound database <code>Logical_Flow</code> table | |
525 | and <code>Binding</code> table that were related to the now-destroyed | |
526 | VIF. | |
fe36184b BP |
527 | </li> |
528 | ||
529 | <li> | |
530 | On every hypervisor, <code>ovn-controller</code> receives the | |
48605550 | 531 | <code>Logical_Flow</code> table updates that <code>ovn-northd</code> made |
5868eb24 BP |
532 | in the previous step. <code>ovn-controller</code> updates OpenFlow |
533 | tables to reflect the update, although there may not be much to do, since | |
534 | the VIF had already become unreachable when it was removed from the | |
e387e3e8 | 535 | <code>Binding</code> table in a previous step. |
fe36184b BP |
536 | </li> |
537 | </ol> | |
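<p>
The claim and release steps in the life cycle above (a VIF appearing on, and
later disappearing from, the integration bridge) can be sketched as follows.
This is a simplified model, not the <code>ovn-controller</code>
implementation: the <code>Binding</code> table is reduced to a dictionary
from logical port name to chassis name, and <code>sync_bindings</code> is a
hypothetical helper.
</p>

```python
def sync_bindings(bindings, chassis, local_iface_ids):
    """Claim or release Binding rows on behalf of one chassis.

    bindings: dict mapping logical port name to a chassis name or None.
    chassis: name of the chassis running this (hypothetical) controller.
    local_iface_ids: set of external-ids:iface-id values currently
        present on this chassis's integration bridge.
    """
    for port, owner in bindings.items():
        if port in local_iface_ids:
            # The VIF exists locally, so record where it lives.
            bindings[port] = chassis
        elif owner == chassis:
            # The VIF was here but has been removed: clear the column.
            bindings[port] = None
    return bindings


bindings = {"vif1": None, "vif2": "hv2"}
sync_bindings(bindings, "hv1", {"vif1"})   # VM powered on at hv1
sync_bindings(bindings, "hv1", set())      # VM powered off again
```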
538 | ||
a30b56d4 | 539 | <h2>Life Cycle of a Container Interface Inside a VM</h2> |
9fb4636f GS |
540 | |
541 | <p> | |
542 | OVN provides virtual network abstractions by converting information | |
543 | written in the OVN_NB database to OpenFlow flows in each hypervisor. Secure | |
544 | virtual networking for multiple tenants can only be provided if the OVN controller | |
545 | is the only entity that can modify flows in Open vSwitch. When the | |
546 | Open vSwitch integration bridge resides in the hypervisor, it is fair to | |
547 | assume that tenant workloads running inside VMs cannot | |
548 | make any changes to Open vSwitch flows. | |
549 | </p> | |
550 | ||
551 | <p> | |
552 | If the infrastructure provider trusts the applications inside the | |
553 | containers not to break out and modify the Open vSwitch flows, then | |
554 | containers can be run in hypervisors. This is also the case when | |
555 | containers are run inside VMs and the Open vSwitch integration bridge | |
556 | with flows added by the OVN controller resides in the same VM. In both | |
557 | of these cases, the workflow is the same as in the example | |
558 | in the previous section ("Life Cycle of a VIF"). | |
559 | </p> | |
560 | ||
561 | <p> | |
562 | This section describes the life cycle of a container interface (CIF) | |
563 | when containers are created in the VMs and the Open vSwitch integration | |
564 | bridge resides inside the hypervisor. In this case, even if a container | |
565 | application breaks out, other tenants are not affected because the | |
566 | containers running inside the VMs cannot modify the flows in the | |
567 | Open vSwitch integration bridge. | |
568 | </p> | |
569 | ||
570 | <p> | |
571 | When multiple containers are created inside a VM, there are multiple | |
572 | CIFs associated with them. The network traffic associated with these | |
573 | CIFs needs to reach the Open vSwitch integration bridge running in the | |
574 | hypervisor for OVN to support virtual network abstractions. OVN should | |
575 | also be able to distinguish network traffic coming from different CIFs. | |
576 | There are two ways to distinguish network traffic of CIFs. | |
577 | </p> | |
578 | ||
579 | <p> | |
580 | One way is to provide one VIF for every CIF (1:1 model). This means that | |
581 | there could be a lot of network devices in the hypervisor. This would slow | |
582 | down OVS because of all the additional CPU cycles needed for the management | |
583 | of all the VIFs. It would also mean that the entity creating the | |
584 | containers in a VM should also be able to create the corresponding VIFs in | |
585 | the hypervisor. | |
586 | </p> | |
587 | ||
588 | <p> | |
589 | The second way is to provide a single VIF for all the CIFs (1:many model). | |
590 | OVN could then distinguish network traffic coming from different CIFs via | |
591 | a tag written in every packet. OVN uses this mechanism and uses VLAN as | |
592 | the tagging mechanism. | |
593 | </p> | |
594 | ||
595 | <ol> | |
596 | <li> | |
597 | A CIF's life cycle begins when a container is spawned inside a VM by | |
598 | the same CMS that created the VM, by a tenant that owns that VM, | |
599 | or even by a container orchestration system that is different from the CMS | |
600 | that initially created the VM. Whoever the entity is, it will need to | |
601 | know the <var>vif-id</var> that is associated with the network interface | |
602 | of the VM through which the container interface's network traffic is | |
603 | expected to go. The entity that creates the container interface | |
604 | will also need to choose an unused VLAN inside that VM. | |
605 | </li> | |
606 | ||
607 | <li> | |
608 | The container spawning entity (either directly or through the CMS that | |
609 | manages the underlying infrastructure) updates the OVN Northbound | |
610 | database to include the new CIF, by adding a row to the | |
80f408f4 JP |
611 | <code>Logical_Switch_Port</code> table. In the new row, |
612 | <code>name</code> is any unique identifier, | |
613 | <code>parent_name</code> is the <var>vif-id</var> of the VM | |
614 | through which the CIF's network traffic is expected to go, | |
615 | and <code>tag</code> is the VLAN tag that identifies the | |
9fb4636f GS |
616 | network traffic of that CIF. |
617 | </li> | |
618 | ||
619 | <li> | |
5868eb24 BP |
620 | <code>ovn-northd</code> receives the OVN Northbound database update. In |
621 | turn, it makes the corresponding updates to the OVN Southbound database, | |
622 | by adding rows to the OVN Southbound database's <code>Logical_Flow</code> | |
623 | table to reflect the new port and also by creating a new row in the | |
624 | <code>Binding</code> table and populating all its columns except the | |
625 | column that identifies the <code>chassis</code>. | |
9fb4636f GS |
626 | </li> |
627 | ||
628 | <li> | |
629 | On every hypervisor, <code>ovn-controller</code> subscribes to the | |
e387e3e8 | 630 | changes in the <code>Binding</code> table. When a new row is created |
91ae2065 | 631 | by <code>ovn-northd</code> that includes a value in the |
e387e3e8 | 632 | <code>parent_port</code> column of the <code>Binding</code> table, the |
91ae2065 RB |
633 | <code>ovn-controller</code> in the hypervisor whose OVN integration bridge |
634 | has that same value in <var>vif-id</var> in | |
635 | <code>external-ids</code>:<code>iface-id</code> | |
9fb4636f GS |
636 | updates the local hypervisor's OpenFlow tables so that packets to and |
637 | from the VIF with the particular VLAN <code>tag</code> are properly | |
638 | handled. Afterward it updates the <code>chassis</code> column of | |
e387e3e8 | 639 | the <code>Binding</code> table to reflect the physical location. |
9fb4636f GS |
640 | </li> |
641 | ||
642 | <li> | |
643 | One can only start the application inside the container after the | |
91ae2065 | 644 | underlying network is ready. To support this, <code>ovn-northd</code> |
e387e3e8 | 645 | notices the updated <code>chassis</code> column in the <code>Binding</code> |
80f408f4 | 646 | table and updates the <ref column="up" table="Logical_Switch_Port" |
9fb4636f | 647 | db="OVN_NB"/> column in the OVN Northbound database's |
80f408f4 | 648 | <ref table="Logical_Switch_Port" db="OVN_NB"/> table to indicate that the |
9fb4636f GS |
649 | CIF is now up. The entity responsible for starting the container application |
650 | queries this value and starts the application. | |
651 | </li> | |
652 | ||
653 | <li> | |
654 | Eventually, the entity that created and started the container stops it. | |
655 | The entity, through the CMS or directly, deletes its row in the | |
80f408f4 | 656 | <code>Logical_Switch_Port</code> table. |
9fb4636f GS |
657 | </li> |
658 | ||
659 | <li> | |
91ae2065 | 660 | <code>ovn-northd</code> receives the OVN Northbound update and in turn |
5868eb24 BP |
661 | updates the OVN Southbound database accordingly, by removing or updating |
662 | the rows from the OVN Southbound database <code>Logical_Flow</code> table | |
663 | that were related to the now-destroyed CIF. It also deletes the row in | |
664 | the <code>Binding</code> table for that CIF. | |
9fb4636f GS |
665 | </li> |
666 | ||
667 | <li> | |
668 | On every hypervisor, <code>ovn-controller</code> receives the | |
48605550 BP |
669 | <code>Logical_Flow</code> table updates that <code>ovn-northd</code> made |
670 | in the previous step. <code>ovn-controller</code> updates OpenFlow | |
671 | tables to reflect the update. | |
9fb4636f GS |
672 | </li> |
673 | </ol> | |
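<p>
The hand-off between <code>ovn-controller</code> and <code>ovn-northd</code> in the steps above can be illustrated with a toy model. This is only a sketch: the classes and function names are hypothetical, the rows are plain Python objects rather than OVSDB rows, and the real daemons also install and remove OpenFlow flows, which is omitted here.
</p>

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class Binding:
    """Toy stand-in for a Southbound Binding row of a container interface."""
    logical_port: str
    parent_port: str               # VIF of the VM that hosts the container
    tag: int                       # VLAN tag that identifies the CIF on that VIF
    chassis: Optional[str] = None  # left empty by ovn-northd at creation


@dataclass
class LogicalSwitchPort:
    """Toy stand-in for a Northbound Logical_Switch_Port row."""
    name: str
    up: bool = False


def controller_claim(binding: Binding, chassis: str,
                     local_iface_ids: Set[str]) -> bool:
    """ovn-controller side: claim the binding if the parent VIF is local.

    local_iface_ids models the external-ids:iface-id values found on this
    hypervisor's integration bridge.
    """
    if binding.chassis is None and binding.parent_port in local_iface_ids:
        # The real ovn-controller would first update OpenFlow tables here.
        binding.chassis = chassis
        return True
    return False


def northd_sync_up(binding: Binding, port: LogicalSwitchPort) -> None:
    """ovn-northd side: reflect a populated chassis column as up=true."""
    port.up = binding.chassis is not None
```

<p>
In this model, only the hypervisor whose <code>local_iface_ids</code> contains the parent VIF claims the binding, and the port is reported as up only after that claim.
</p>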
b705f9ea | 674 | |
69a832cf | 675 | <h2>Architectural Physical Life Cycle of a Packet</h2> |
b705f9ea | 676 | |
b705f9ea | 677 | <p> |
5868eb24 BP |
678 | This section describes how a packet travels from one virtual machine or |
679 | container to another through OVN. This description focuses on the physical | |
680 | treatment of a packet; for a description of the logical life cycle of a | |
681 | packet, please refer to the <code>Logical_Flow</code> table in | |
682 | <code>ovn-sb</code>(5). | |
b705f9ea JP |
683 | </p> |
684 | ||
5868eb24 BP |
685 | <p> |
686 | This section mentions several data and metadata fields, for clarity | |
687 | summarized here: | |
688 | </p> | |
689 | ||
690 | <dl> | |
691 | <dt>tunnel key</dt> | |
692 | <dd> | |
693 | When OVN encapsulates a packet in Geneve or another tunnel, it attaches | |
694 | extra data to it to allow the receiving OVN instance to process it | |
695 | correctly. This takes different forms depending on the particular | |
696 | encapsulation, but in each case we refer to it here as the ``tunnel | |
697 | key.'' See <code>Tunnel Encapsulations</code>, below, for details. | |
698 | </dd> | |
699 | ||
700 | <dt>logical datapath field</dt> | |
701 | <dd> | |
702 | A field that denotes the logical datapath through which a packet is being | |
4103f6d2 BP |
703 | processed. |
704 | <!-- Keep the following in sync with MFF_LOG_DATAPATH in | |
667e2b0b | 705 | ovn/lib/logical-fields.h. --> |
4103f6d2 BP |
706 | OVN uses the field that OpenFlow 1.1+ simply (and confusingly) calls |
707 | ``metadata'' to store the logical datapath. (This field is passed across | |
708 | tunnels as part of the tunnel key.) | |
5868eb24 BP |
709 | </dd> |
710 | ||
711 | <dt>logical input port field</dt> | |
712 | <dd> | |
37910994 JP |
713 | <p> |
714 | A field that denotes the logical port from which the packet | |
715 | entered the logical datapath. | |
716 | <!-- Keep the following in sync with MFF_LOG_INPORT in | |
667e2b0b | 717 | ovn/lib/logical-fields.h. --> |
cc5e28d8 | 718 | OVN stores this in Nicira extension register number 14. |
37910994 JP |
719 | </p> |
720 | ||
721 | <p> | |
722 | Geneve and STT tunnels pass this field as part of the tunnel key. | |
723 | Although VXLAN tunnels do not explicitly carry a logical input port, | |
724 | OVN only uses VXLAN to communicate with gateways that from OVN's | |
725 | perspective consist of only a single logical port, so that OVN can set | |
726 | the logical input port field to this one on ingress to the OVN logical | |
727 | pipeline. | |
728 | </p> | |
5868eb24 BP |
729 | </dd> |
730 | ||
731 | <dt>logical output port field</dt> | |
732 | <dd> | |
37910994 JP |
733 | <p> |
734 | A field that denotes the logical port from which the packet will | |
735 | leave the logical datapath. This is initialized to 0 at the | |
736 | beginning of the logical ingress pipeline. | |
737 | <!-- Keep the following in sync with MFF_LOG_OUTPORT in | |
667e2b0b | 738 | ovn/lib/logical-fields.h. --> |
cc5e28d8 | 739 | OVN stores this in Nicira extension register number 15. |
37910994 JP |
740 | </p> |
741 | ||
742 | <p> | |
743 | Geneve and STT tunnels pass this field as part of the tunnel key. | |
744 | VXLAN tunnels do not transmit the logical output port field. | |
745 | </p> | |
5868eb24 BP |
746 | </dd> |
747 | ||
3bd4ae23 | 748 | <dt>conntrack zone field for logical ports</dt> |
78aab811 | 749 | <dd> |
3bd4ae23 GS |
750 | A field that denotes the connection tracking zone for logical ports. |
751 | The value only has local significance and is not meaningful between | |
752 | chassis. This is initialized to 0 at the beginning of the logical | |
cc5e28d8 JP |
753 | <!-- Keep the following in sync with MFF_LOG_CT_ZONE in |
754 | ovn/lib/logical-fields.h. --> | |
755 | ingress pipeline. OVN stores this in Nicira extension register | |
756 | number 13. | |
3bd4ae23 GS |
757 | </dd> |
758 | ||
759 | <dt>conntrack zone fields for Gateway router</dt> | |
760 | <dd> | |
761 | Fields that denote the connection tracking zones for Gateway routers. | |
762 | These values only have local significance (only on chassis that have | |
763 | Gateway routers instantiated) and are not meaningful between | |
764 | chassis. OVN stores the zone information for DNATing in Nicira | |
cc5e28d8 JP |
765 | <!-- Keep the following in sync with MFF_LOG_DNAT_ZONE and |
766 | MFF_LOG_SNAT_ZONE in ovn/lib/logical-fields.h. --> | |
767 | extension register number 11 and zone information for SNATing in Nicira | |
768 | extension register number 12. | |
78aab811 JP |
769 | </dd> |
770 | ||
5868eb24 BP |
771 | <dt>VLAN ID</dt> |
772 | <dd> | |
773 | The VLAN ID is used as an interface between OVN and containers nested | |
774 | inside a VM (see <code>Life Cycle of a container interface inside a | |
775 | VM</code>, above, for more information). | |
776 | </dd> | |
777 | </dl> | |
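<p>
For quick reference, the field summary above can be written down as a small lookup table. This is a sketch, not code from OVN: the dictionary keys and the helper are hypothetical, and the names <code>reg11</code> through <code>reg15</code> simply follow the usual Open vSwitch naming of the Nicira extension registers. The VXLAN entry assumes that only the logical datapath is carried, as the text above implies.
</p>

```python
# Where each logical field lives while a packet is inside the OpenFlow pipeline.
LOGICAL_FIELD_STORAGE = {
    "logical datapath": "metadata",
    "logical input port": "reg14",
    "logical output port": "reg15",
    "conntrack zone (logical ports)": "reg13",
    "conntrack zone (gateway router DNAT)": "reg11",
    "conntrack zone (gateway router SNAT)": "reg12",
}

# Which logical fields each encapsulation carries in its tunnel key.
# Conntrack zones are chassis-local, so no encapsulation carries them.
CARRIED_IN_TUNNEL_KEY = {
    "geneve": {"logical datapath", "logical input port", "logical output port"},
    "stt": {"logical datapath", "logical input port", "logical output port"},
    "vxlan": {"logical datapath"},  # assumption drawn from the text above
}


def fields_not_carried(encapsulation: str) -> set:
    """Return the logical fields that do not survive the given tunnel type."""
    carried = CARRIED_IN_TUNNEL_KEY[encapsulation]
    return set(LOGICAL_FIELD_STORAGE) - carried
```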
778 | ||
779 | <p> | |
780 | Initially, a VM or container on the ingress hypervisor sends a packet on a | |
781 | port attached to the OVN integration bridge. Then: | |
782 | </p> | |
783 | ||
784 | <ol> | |
b705f9ea JP |
785 | <li> |
786 | <p> | |
5868eb24 BP |
787 | OpenFlow table 0 performs physical-to-logical translation. It matches |
788 | the packet's ingress port. Its actions annotate the packet with | |
789 | logical metadata, by setting the logical datapath field to identify the | |
790 | logical datapath that the packet is traversing and the logical input | |
791 | port field to identify the ingress port. Then it resubmits to table 16 | |
792 | to enter the logical ingress pipeline. | |
793 | </p> | |
794 | ||
795 | <p> | |
796 | Packets that originate from a container nested within a VM are treated | |
797 | in a slightly different way. The originating container can be | |
798 | distinguished based on the VIF-specific VLAN ID, so the | |
799 | physical-to-logical translation flows additionally match on VLAN ID and | |
800 | the actions strip the VLAN header. Following this step, OVN treats | |
801 | packets from containers just like any other packets. | |
802 | </p> | |
803 | ||
804 | <p> | |
805 | Table 0 also processes packets that arrive from other chassis. It | |
806 | distinguishes them from other packets by ingress port, which is a | |
807 | tunnel. As with packets just entering the OVN pipeline, the actions | |
808 | annotate these packets with logical datapath and logical ingress port | |
809 | metadata. In addition, the actions set the logical output port field, | |
810 | which is available because in OVN tunneling occurs after the logical | |
811 | output port is known. These three pieces of information are obtained | |
812 | from the tunnel encapsulation metadata (see <code>Tunnel | |
813 | Encapsulations</code> for encoding details). Then the actions resubmit | |
814 | to table 33 to enter the logical egress pipeline. | |
b705f9ea JP |
815 | </p> |
816 | </li> | |
817 | ||
818 | <li> | |
819 | <p> | |
5868eb24 BP |
820 | OpenFlow tables 16 through 31 execute the logical ingress pipeline from |
821 | the <code>Logical_Flow</code> table in the OVN Southbound database. | |
822 | These tables are expressed entirely in terms of logical concepts like | |
823 | logical ports and logical datapaths. A big part of | |
824 | <code>ovn-controller</code>'s job is to translate them into equivalent | |
825 | OpenFlow (in particular it translates the table numbers: | |
826 | <code>Logical_Flow</code> tables 0 through 15 become OpenFlow tables 16 | |
0bac7164 | 827 | through 31). |
b705f9ea | 828 | </p> |
5868eb24 | 829 | |
0bac7164 BP |
830 | <p> |
831 | Most OVN actions have fairly obvious implementations in OpenFlow (with | |
832 | OVS extensions), e.g. <code>next;</code> is implemented as | |
833 | <code>resubmit</code>, <code><var>field</var> = | |
834 | <var>constant</var>;</code> as <code>set_field</code>. A few are worth | |
835 | describing in more detail: | |
836 | </p> | |
837 | ||
838 | <dl> | |
839 | <dt><code>output;</code></dt> | |
840 | <dd> | |
841 | Implemented by resubmitting the packet to table 32. If the pipeline | |
842 | executes more than one <code>output</code> action, then each one is | |
843 | separately resubmitted to table 32. This can be used to send | |
844 | multiple copies of the packet to multiple ports. (If the packet was | |
845 | not modified between the <code>output</code> actions, and some of the | |
846 | copies are destined to the same hypervisor, then using a logical | |
847 | multicast output port would save bandwidth between hypervisors.) | |
848 | </dd> | |
849 | ||
850 | <dt><code>get_arp(<var>P</var>, <var>A</var>);</code></dt> | |
851 | <dd> | |
852 | <p> | |
853 | Implemented by storing arguments into OpenFlow fields, then | |
854 | resubmitting to table 65, which <code>ovn-controller</code> | |
855 | populates with flows generated from the <code>MAC_Binding</code> | |
856 | table in the OVN Southbound database. If there is a match in table | |
857 | 65, then its actions store the bound MAC in the Ethernet | |
858 | destination address field. | |
859 | </p> | |
860 | ||
861 | <p> | |
862 | (The OpenFlow actions save and restore the OpenFlow fields used for | |
863 | the arguments, so that the OVN actions do not have to be aware of | |
864 | this temporary use.) | |
865 | </p> | |
866 | </dd> | |
867 | ||
868 | <dt><code>put_arp(<var>P</var>, <var>A</var>, <var>E</var>);</code></dt> | |
869 | <dd> | |
870 | <p> | |
871 | Implemented by storing the arguments into OpenFlow fields, then | |
872 | outputting a packet to <code>ovn-controller</code>, which updates | |
873 | the <code>MAC_Binding</code> table. | |
874 | </p> | |
875 | ||
876 | <p> | |
877 | (The OpenFlow actions save and restore the OpenFlow fields used for | |
878 | the arguments, so that the OVN actions do not have to be aware of | |
879 | this temporary use.) | |
880 | </p> | |
881 | </dd> | |
882 | </dl> | |
b705f9ea JP |
883 | </li> |
884 | ||
885 | <li> | |
886 | <p> | |
5868eb24 BP |
887 | OpenFlow tables 32 through 47 implement the <code>output</code> action |
888 | in the logical ingress pipeline. Specifically, table 32 handles | |
889 | packets to remote hypervisors, table 33 handles packets to the local | |
890 | hypervisor, and table 34 discards packets whose logical ingress and | |
891 | egress port are the same. | |
892 | </p> | |
893 | ||
0b7da177 BP |
894 | <p> |
895 | Logical patch ports are a special case. Logical patch ports do not | |
896 | have a physical location and effectively reside on every hypervisor. | |
897 | Thus, flow table 33, for output to ports on the local hypervisor, | |
898 | naturally implements output to unicast logical patch ports too. | |
899 | However, applying the same logic to a logical patch port that is part | |
900 | of a logical multicast group yields packet duplication, because each | |
901 | hypervisor that contains a logical port in the multicast group will | |
902 | also output the packet to the logical patch port. Thus, multicast | |
903 | groups implement output to logical patch ports in table 32. | |
904 | </p> | |
905 | ||
5868eb24 BP |
906 | <p> |
907 | Each flow in table 32 matches on a logical output port for unicast or | |
908 | multicast logical ports that include a logical port on a remote | |
909 | hypervisor. Each flow's actions implement sending a packet to the port | |
910 | it matches. For unicast logical output ports on remote hypervisors, | |
911 | the actions set the tunnel key to the correct value, then send the | |
912 | packet on the tunnel port to the correct hypervisor. (When the remote | |
913 | hypervisor receives the packet, table 0 there will recognize it as a | |
914 | tunneled packet and pass it along to table 33.) For multicast logical | |
915 | output ports, the actions send one copy of the packet to each remote | |
916 | hypervisor, in the same way as for unicast destinations. If a | |
917 | multicast group includes a logical port or ports on the local | |
918 | hypervisor, then its actions also resubmit to table 33. Table 32 also | |
919 | includes a fallback flow that resubmits to table 33 if there is no | |
920 | other match. | |
921 | </p> | |
922 | ||
923 | <p> | |
924 | Flows in table 33 resemble those in table 32 but for logical ports that | |
0b7da177 | 925 | reside locally rather than remotely. For unicast logical output ports |
5868eb24 BP |
926 | on the local hypervisor, the actions just resubmit to table 34. For |
927 | multicast output ports that include one or more logical ports on the | |
928 | local hypervisor, for each such logical port <var>P</var>, the actions | |
929 | change the logical output port to <var>P</var>, then resubmit to table | |
930 | 34. | |
931 | </p> | |
932 | ||
6e6c3f91 HZ |
933 | <p> |
934 | As a special case, when a localnet port exists on the datapath, a | |
935 | remote port is reached by switching to the localnet port. In this | |
936 | case, instead of adding a flow in table 32 to reach the remote port, a | |
937 | flow is added in table 33 to switch the logical outport to the localnet | |
938 | port, and resubmit to table 33 as if it were unicasted to a logical | |
939 | port on the local hypervisor. | |
940 | </p> | |
941 | ||
5868eb24 BP |
942 | <p> |
943 | Table 34 matches and drops packets for which the logical input and | |
944 | output ports are the same. It resubmits other packets to table 48. | |
b705f9ea JP |
945 | </p> |
946 | </li> | |
5868eb24 BP |
947 | |
948 | <li> | |
949 | <p> | |
950 | OpenFlow tables 48 through 63 execute the logical egress pipeline from | |
951 | the <code>Logical_Flow</code> table in the OVN Southbound database. | |
952 | The egress pipeline can perform a final stage of validation before | |
953 | packet delivery. Eventually, it may execute an <code>output</code> | |
954 | action, which <code>ovn-controller</code> implements by resubmitting to | |
955 | table 64. A packet for which the pipeline never executes | |
956 | <code>output</code> is effectively dropped (although it may have been | |
957 | transmitted through a tunnel across a physical network). | |
958 | </p> | |
959 | ||
960 | <p> | |
961 | The egress pipeline cannot change the logical output port or cause | |
962 | further tunneling. | |
963 | </p> | |
964 | </li> | |
965 | ||
966 | <li> | |
967 | <p> | |
968 | OpenFlow table 64 performs logical-to-physical translation, the | |
969 | opposite of table 0. It matches the packet's logical egress port. Its | |
970 | actions output the packet to the port attached to the OVN integration | |
971 | bridge that represents that logical port. If the logical egress port | |
972 | is a container nested within a VM, then before sending the packet the | |
973 | actions push on a VLAN header with an appropriate VLAN ID. | |
974 | </p> | |
d387d24d BP |
975 | |
976 | <p> | |
977 | If the logical egress port is a logical patch port, then table 64 | |
978 | outputs to an OVS patch port that represents the logical patch port. | |
979 | The packet re-enters the OpenFlow flow table from the OVS patch port's | |
980 | peer in table 0, which identifies the logical datapath and logical | |
981 | input port based on the OVS patch port's OpenFlow port number. | |
982 | </p> | |
5868eb24 BP |
983 | </li> |
984 | </ol> | |
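<p>
The table numbering used in the steps above can be summarized in a few lines of Python. This is a sketch under stated assumptions, not <code>ovn-controller</code> code: the function names are hypothetical, the egress offset of 48 is inferred by analogy with the ingress offset of 16, and only unicast output without localnet or patch ports is modeled.
</p>

```python
INGRESS_BASE = 16    # Logical_Flow ingress tables 0..15 -> OpenFlow 16..31
EGRESS_BASE = 48     # assumed by analogy: egress tables 0..15 -> OpenFlow 48..63
LOGICAL_TABLES = 16


def openflow_table(pipeline: str, logical_table: int) -> int:
    """Translate a Logical_Flow table number into an OpenFlow table number."""
    if not 0 <= logical_table < LOGICAL_TABLES:
        raise ValueError("logical table out of range")
    base = {"ingress": INGRESS_BASE, "egress": EGRESS_BASE}[pipeline]
    return base + logical_table


def unicast_output_path(outport_chassis: str, local_chassis: str,
                        inport: str, outport: str) -> list:
    """List the stages a packet visits after an ingress 'output' action."""
    path = [32]                       # output always starts at table 32
    if outport_chassis != local_chassis:
        return path + ["tunnel"]      # table 32 sends it to the remote chassis
    path += [33, 34]                  # fallback to 33, then loopback check in 34
    if inport == outport:
        return path + ["drop"]        # table 34 drops in == out
    return path + [EGRESS_BASE]       # otherwise enter the egress pipeline
```

<p>
For example, a packet whose logical output port resides on the local hypervisor visits tables 32, 33, and 34 before entering the egress pipeline at table 48, while a packet bound for a remote hypervisor leaves through a tunnel from table 32.
</p>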
985 | ||
88058f19 AW |
986 | <h2>Life Cycle of a VTEP gateway</h2> |
987 | ||
988 | <p> | |
989 | A gateway is a chassis that forwards traffic between the OVN-managed | |
990 | part of a logical network and a physical VLAN, extending a | |
991 | tunnel-based logical network into a physical network. | |
992 | </p> | |
993 | ||
994 | <p> | |
995 | The steps below refer often to details of the OVN and VTEP database | |
996 | schemas. Please see <code>ovn-sb</code>(5), <code>ovn-nb</code>(5) | |
997 | and <code>vtep</code>(5), respectively, for the full story on these | |
998 | databases. | |
999 | </p> | |
1000 | ||
1001 | <ol> | |
1002 | <li> | |
1003 | A VTEP gateway's life cycle begins with the administrator registering | |
1004 | the VTEP gateway as a <code>Physical_Switch</code> table entry in the | |
1005 | <code>VTEP</code> database. The <code>ovn-controller-vtep</code> | |
1006 | connected to this VTEP database will recognize the new VTEP gateway | |
1007 | and create a new <code>Chassis</code> table entry for it in the | |
1008 | <code>OVN_Southbound</code> database. | |
1009 | </li> | |
1010 | ||
1011 | <li> | |
1012 | The administrator can then create a new <code>Logical_Switch</code> | |
1013 | table entry, and bind a particular VLAN on a VTEP gateway's port to | |
1014 | any VTEP logical switch. Once a VTEP logical switch is bound to | |
1015 | a VTEP gateway, the <code>ovn-controller-vtep</code> will detect | |
1016 | it and add its name to the <var>vtep_logical_switches</var> | |
1017 | column of the <code>Chassis</code> table in the <code> | |
1018 | OVN_Southbound</code> database. Note that the <var>tunnel_key</var> | |
1019 | column of the VTEP logical switch is not filled at creation. The | |
1020 | <code>ovn-controller-vtep</code> will set the column when the | |
1021 | corresponding VTEP logical switch is bound to an OVN logical network. | |
1022 | </li> | |
1023 | ||
1024 | <li> | |
1025 | Now, the administrator can use the CMS to add a VTEP logical switch | |
1026 | to the OVN logical network. To do that, the CMS must first create a | |
80f408f4 | 1027 | new <code>Logical_Switch_Port</code> table entry in the <code> |
88058f19 AW |
1028 | OVN_Northbound</code> database. Then, the <var>type</var> column |
1029 | of this entry must be set to "vtep". Next, the <var> | |
1030 | vtep-logical-switch</var> and <var>vtep-physical-switch</var> keys | |
1031 | in the <var>options</var> column must also be specified, since | |
1032 | multiple VTEP gateways can attach to the same VTEP logical switch. | |
1033 | </li> | |
1034 | ||
1035 | <li> | |
1036 | The newly created logical port in the <code>OVN_Northbound</code> | |
1037 | database and its configuration will be passed down to the <code> | |
1038 | OVN_Southbound</code> database as a new <code>Port_Binding</code> | |
1039 | table entry. The <code>ovn-controller-vtep</code> will recognize the | |
1040 | change and bind the logical port to the corresponding VTEP gateway | |
1041 | chassis. Binding the same VTEP logical switch to | |
1042 | different OVN logical networks is not allowed, and a warning will be | |
1043 | generated in the log. | |
1044 | </li> | |
1045 | ||
1046 | <li> | |
1047 | Besides binding to the VTEP gateway chassis, the <code> | |
1048 | ovn-controller-vtep</code> will update the <var>tunnel_key</var> | |
1049 | column of the VTEP logical switch to the corresponding <code> | |
1050 | Datapath_Binding</code> table entry's <var>tunnel_key</var> for the | |
1051 | bound OVN logical network. | |
1052 | </li> | |
1053 | ||
1054 | <li> | |
1055 | Next, the <code>ovn-controller-vtep</code> will keep reacting to | |
1056 | configuration changes in the <code>Port_Binding</code> table in the | |
1057 | <code>OVN_Southbound</code> database, and updating the | |
1058 | <code>Ucast_Macs_Remote</code> table in the <code>VTEP</code> database. | |
1059 | This allows the VTEP gateway to understand where to forward the unicast | |
1060 | traffic coming from the extended external network. | |
1061 | </li> | |
1062 | ||
1063 | <li> | |
1064 | Eventually, the VTEP gateway's life cycle ends when the administrator | |
1065 | unregisters the VTEP gateway from the <code>VTEP</code> database. | |
1066 | The <code>ovn-controller-vtep</code> will recognize the event and | |
1067 | remove all related configurations (<code>Chassis</code> table entry | |
1068 | and port bindings) in the <code>OVN_Southbound</code> database. | |
1069 | </li> | |
1070 | ||
1071 | <li> | |
1072 | When the <code>ovn-controller-vtep</code> is terminated, all related | |
1073 | configurations in the <code>OVN_Southbound</code> database and | |
1074 | the <code>VTEP</code> database will be cleaned, including | |
1075 | <code>Chassis</code> table entries for all registered VTEP gateways | |
1076 | and their port bindings, and all <code>Ucast_Macs_Remote</code> table | |
1077 | entries and the <code>Logical_Switch</code> tunnel keys. | |
1078 | </li> | |
1079 | </ol> | |
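<p>
As an illustration of the Northbound configuration described in the steps above, the following sketch builds the column values for a <code>vtep</code> logical port and detects the disallowed case of one VTEP logical switch bound to several OVN logical networks. The helper names and the plain-dictionary representation are hypothetical; a real CMS would write these values through an OVSDB client.
</p>

```python
from collections import defaultdict


def vtep_port_row(name: str, vtep_physical_switch: str,
                  vtep_logical_switch: str) -> dict:
    """Column values for a Logical_Switch_Port that attaches a VTEP gateway."""
    return {
        "name": name,
        "type": "vtep",
        "options": {
            "vtep-physical-switch": vtep_physical_switch,
            "vtep-logical-switch": vtep_logical_switch,
        },
    }


def conflicting_bindings(ports: list) -> set:
    """Find VTEP logical switches bound to more than one OVN logical network.

    ports is a list of (ovn_logical_switch_name, row) pairs, where row comes
    from vtep_port_row(). The result holds (physical switch, logical switch)
    pairs that appear under different OVN logical switches.
    """
    seen = defaultdict(set)
    for ovn_switch, row in ports:
        opts = row["options"]
        key = (opts["vtep-physical-switch"], opts["vtep-logical-switch"])
        seen[key].add(ovn_switch)
    return {key for key, switches in seen.items() if len(switches) > 1}
```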
1080 | ||
5868eb24 BP |
1081 | <h1>Design Decisions</h1> |
1082 | ||
1083 | <h2>Tunnel Encapsulations</h2> | |
1084 | ||
1085 | <p> | |
1086 | OVN annotates logical network packets that it sends from one hypervisor to | |
1087 | another with the following three pieces of metadata, which are encoded in | |
1088 | an encapsulation-specific fashion: | |
1089 | </p> | |
1090 | ||
1091 | <ul> | |
1092 | <li> | |
1093 | 24-bit logical datapath identifier, from the <code>tunnel_key</code> | |
1094 | column in the OVN Southbound <code>Datapath_Binding</code> table. | |
1095 | </li> | |
1096 | ||
1097 | <li> | |
1098 | 15-bit logical ingress port identifier. ID 0 is reserved for internal | |
1099 | use within OVN. IDs 1 through 32767, inclusive, may be assigned to | |
1100 | logical ports (see the <code>tunnel_key</code> column in the OVN | |
1101 | Southbound <code>Port_Binding</code> table). | |
1102 | </li> | |
1103 | ||
1104 | <li> | |
1105 | 16-bit logical egress port identifier. IDs 0 through 32767 have the same | |
1106 | meaning as for logical ingress ports. IDs 32768 through 65535, | |
1107 | inclusive, may be assigned to logical multicast groups (see the | |
1108 | <code>tunnel_key</code> column in the OVN Southbound | |
1109 | <code>Multicast_Group</code> table). | |
1110 | </li> | |
b705f9ea JP |
1111 | </ul> |
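<p>
The identifier ranges listed above translate directly into range checks. The sketch below uses hypothetical helper names and only encodes the ranges stated in the list; it makes no claim about how OVN itself allocates keys.
</p>

```python
DATAPATH_BITS = 24
MAX_PORT_ID = 32767         # largest ID assignable to a logical port
MAX_EGRESS_ID = 65535       # largest ID assignable to a multicast group


def is_valid_datapath_key(key: int) -> bool:
    """A logical datapath identifier must fit in 24 bits."""
    return 0 <= key < (1 << DATAPATH_BITS)


def is_valid_ingress_port(key: int) -> bool:
    """Ingress port IDs 1..32767 may be assigned; 0 is reserved."""
    return 1 <= key <= MAX_PORT_ID


def classify_egress_id(key: int) -> str:
    """Classify a 16-bit logical egress port identifier."""
    if not 0 <= key <= MAX_EGRESS_ID:
        raise ValueError("egress identifier does not fit in 16 bits")
    if key == 0:
        return "reserved"
    if key <= MAX_PORT_ID:
        return "port"
    return "multicast"
```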
1112 | ||
1113 | <p> | |
5868eb24 BP |
1114 | For hypervisor-to-hypervisor traffic, OVN supports only Geneve and STT |
1115 | encapsulations, for the following reasons: | |
b705f9ea JP |
1116 | </p> |
1117 | ||
5868eb24 BP |
1118 | <ul> |
1119 | <li> | |
1120 | Only STT and Geneve support the large amounts of metadata (over 32 bits | |
1121 | per packet) that OVN uses (as described above). | |
1122 | </li> | |
1123 | ||
1124 | <li> | |
1125 | STT and Geneve use randomized UDP or TCP source ports that allow | |
1126 | efficient distribution among multiple paths in environments that use ECMP | |
1127 | in their underlay. | |
1128 | </li> | |
1129 | ||
1130 | <li> | |
1131 | NICs are available to offload STT and Geneve encapsulation and | |
1132 | decapsulation. | |
1133 | </li> | |
1134 | </ul> | |
1135 | ||
1136 | <p> | |
1137 | Due to its flexibility, the preferred encapsulation between hypervisors is | |
1138 | Geneve. For Geneve encapsulation, OVN transmits the logical datapath | |
1139 | identifier in the Geneve VNI. | |
1140 | ||
1141 | <!-- Keep the following in sync with ovn/controller/physical.h. --> | |
1142 | OVN transmits the logical ingress and logical egress ports in a TLV with | |
57d44532 | 1143 | class 0x0102, type 0, and a 32-bit value encoded as follows, from MSB to |
5868eb24 BP |
1144 | LSB: |
1145 | </p> | |
1146 | ||
1147 | <diagram> | |
1148 | <header name=""> | |
1149 | <bits name="rsv" above="1" below="0" width=".25"/> | |
1150 | <bits name="ingress port" above="15" width=".75"/> | |
1151 | <bits name="egress port" above="16" width=".75"/> | |
1152 | </header> | |
1153 | </diagram> | |
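<p>
The 32-bit Geneve option value in the diagram above can be packed and unpacked with plain shifts and masks. This is a minimal sketch with hypothetical function names; it covers only the option value, not the surrounding TLV header or the VNI.
</p>

```python
def pack_geneve_ports(ingress: int, egress: int) -> int:
    """Pack ports as: 1 reserved bit (0), 15-bit ingress, 16-bit egress."""
    if not 0 <= ingress <= 0x7FFF:
        raise ValueError("ingress port does not fit in 15 bits")
    if not 0 <= egress <= 0xFFFF:
        raise ValueError("egress port does not fit in 16 bits")
    return (ingress << 16) | egress


def unpack_geneve_ports(value: int) -> tuple:
    """Return (ingress, egress) from a 32-bit option value."""
    ingress = (value >> 16) & 0x7FFF   # mask off the reserved top bit
    egress = value & 0xFFFF
    return ingress, egress
```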
1154 | ||
1155 | <p> | |
1156 | Environments whose NICs lack Geneve offload may prefer STT encapsulation | |
1157 | for performance reasons. For STT encapsulation, OVN encodes all three | |
1158 | pieces of logical metadata in the STT 64-bit tunnel ID as follows, from MSB | |
1159 | to LSB: | |
1160 | </p> | |
1161 | ||
1162 | <diagram> | |
1163 | <header name=""> | |
1164 | <bits name="reserved" above="9" below="0" width=".5"/> | |
1165 | <bits name="ingress port" above="15" width=".75"/> | |
1166 | <bits name="egress port" above="16" width=".75"/> | |
1167 | <bits name="datapath" above="24" width="1.25"/> | |
1168 | </header> | |
1169 | </diagram> | |
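<p>
The STT layout in the diagram above can be sketched the same way. The function names are hypothetical; the bit positions follow from reading the diagram from MSB to LSB, which places the datapath in bits 0 through 23, the egress port in bits 24 through 39, and the ingress port in bits 40 through 54.
</p>

```python
def pack_stt_tunnel_id(datapath: int, ingress: int, egress: int) -> int:
    """Pack: 9 reserved bits (0), 15-bit ingress, 16-bit egress, 24-bit datapath."""
    if not 0 <= datapath <= 0xFFFFFF:
        raise ValueError("datapath does not fit in 24 bits")
    if not 0 <= ingress <= 0x7FFF:
        raise ValueError("ingress port does not fit in 15 bits")
    if not 0 <= egress <= 0xFFFF:
        raise ValueError("egress port does not fit in 16 bits")
    return (ingress << 40) | (egress << 24) | datapath


def unpack_stt_tunnel_id(tunnel_id: int) -> tuple:
    """Return (datapath, ingress, egress) from a 64-bit STT tunnel ID."""
    datapath = tunnel_id & 0xFFFFFF
    egress = (tunnel_id >> 24) & 0xFFFF
    ingress = (tunnel_id >> 40) & 0x7FFF
    return datapath, ingress, egress
```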
1170 | ||
b705f9ea | 1171 | <p> |
5868eb24 BP |
1172 | For connecting to gateways, in addition to Geneve and STT, OVN supports |
1173 | VXLAN, because only VXLAN support is common on top-of-rack (ToR) switches. | |
1174 | Currently, gateways have a feature set that matches the capabilities as | |
1175 | defined by the VTEP schema, so fewer bits of metadata are necessary. In | |
1176 | the future, gateways that do not support encapsulations with large amounts | |
1177 | of metadata may continue to have a reduced feature set. | |
b705f9ea | 1178 | </p> |
fe36184b | 1179 | </manpage> |