<?xml version="1.0" encoding="utf-8"?>
<manpage program="ovn-architecture" section="7" title="OVN Architecture">
  <h1>Name</h1>
  <p>ovn-architecture -- Open Virtual Network architecture</p>

  <h1>Description</h1>

  <p>
    OVN, the Open Virtual Network, is a system to support virtual network
    abstraction.  OVN complements the existing capabilities of OVS to add
    native support for virtual network abstractions, such as virtual L2 and
    L3 overlays and security groups.  Services such as DHCP are also
    desirable features.  Just like OVS, OVN's design goal is to have a
    production-quality implementation that can operate at significant scale.
  </p>

  <p>
    An OVN deployment consists of several components:
  </p>

  <ul>
    <li>
      <p>
        A <dfn>Cloud Management System</dfn> (<dfn>CMS</dfn>), which is
        OVN's ultimate client (via its users and administrators).  OVN
        integration requires installing a CMS-specific plugin and
        related software (see below).  OVN initially targets OpenStack
        as its CMS.
      </p>

      <p>
        We generally speak of ``the'' CMS, but one can imagine scenarios in
        which multiple CMSes manage different parts of an OVN deployment.
      </p>
    </li>

    <li>
      An OVN Database physical or virtual node (or, eventually, cluster)
      installed in a central location.
    </li>

    <li>
      One or more (usually many) <dfn>hypervisors</dfn>.  Hypervisors must
      run Open vSwitch and implement the interface described in
      <code>IntegrationGuide.rst</code> in the OVS source tree.  Any
      hypervisor platform supported by Open vSwitch is acceptable.
    </li>

    <li>
      <p>
        Zero or more <dfn>gateways</dfn>.  A gateway extends a tunnel-based
        logical network into a physical network by bidirectionally
        forwarding packets between tunnels and a physical Ethernet port.
        This allows non-virtualized machines to participate in logical
        networks.  A gateway may be a physical host, a virtual machine, or
        an ASIC-based hardware switch that supports the
        <code>vtep</code>(5) schema.  (Support for the latter will come
        later in the OVN implementation.)
      </p>

      <p>
        Hypervisors and gateways are together called <dfn>transport
        nodes</dfn> or <dfn>chassis</dfn>.
      </p>
    </li>
  </ul>

  <p>
    The diagram below shows how the major components of OVN and related
    software interact.  Starting at the top of the diagram, we have:
  </p>

  <ul>
    <li>
      The Cloud Management System, as defined above.
    </li>

    <li>
      <p>
        The <dfn>OVN/CMS Plugin</dfn> is the component of the CMS that
        interfaces to OVN.  In OpenStack, this is a Neutron plugin.
        The plugin's main purpose is to translate the CMS's notion of
        logical network configuration, stored in the CMS's configuration
        database in a CMS-specific format, into an intermediate
        representation understood by OVN.
      </p>

      <p>
        This component is necessarily CMS-specific, so a new plugin needs
        to be developed for each CMS that is integrated with OVN.  All of
        the components below this one in the diagram are CMS-independent.
      </p>
    </li>

    <li>
      <p>
        The <dfn>OVN Northbound Database</dfn> receives the intermediate
        representation of logical network configuration passed down by the
        OVN/CMS Plugin.  The database schema is meant to be ``impedance
        matched'' with the concepts used in a CMS, so that it directly
        supports notions of logical switches, routers, ACLs, and so on.
        See <code>ovn-nb</code>(5) for details.
      </p>

      <p>
        The OVN Northbound Database has only two clients: the OVN/CMS
        Plugin above it and <code>ovn-northd</code> below it.
      </p>
    </li>

    <li>
      <code>ovn-northd</code>(8) connects to the OVN Northbound Database
      above it and the OVN Southbound Database below it.  It translates the
      logical network configuration in terms of conventional network
      concepts, taken from the OVN Northbound Database, into logical
      datapath flows in the OVN Southbound Database below it.
    </li>

    <li>
      <p>
        The <dfn>OVN Southbound Database</dfn> is the center of the system.
        Its clients are <code>ovn-northd</code>(8) above it and
        <code>ovn-controller</code>(8) on every transport node below it.
      </p>

      <p>
        The OVN Southbound Database contains three kinds of data:
        <dfn>Physical Network</dfn> (PN) tables that specify how to reach
        hypervisor and other nodes, <dfn>Logical Network</dfn> (LN) tables
        that describe the logical network in terms of ``logical datapath
        flows,'' and <dfn>Binding</dfn> tables that link logical network
        components' locations to the physical network.  The hypervisors
        populate the PN and Port_Binding tables, whereas
        <code>ovn-northd</code>(8) populates the LN tables.
      </p>

      <p>
        OVN Southbound Database performance must scale with the number of
        transport nodes.  This will likely require some work on
        <code>ovsdb-server</code>(1) as we encounter bottlenecks.
        Clustering for availability may be needed.
      </p>
    </li>
  </ul>

  <p>
    The remaining components are replicated onto each hypervisor:
  </p>

  <ul>
    <li>
      <code>ovn-controller</code>(8) is OVN's agent on each hypervisor and
      software gateway.  Northbound, it connects to the OVN Southbound
      Database to learn about OVN configuration and status and to
      populate the PN table and the <code>Chassis</code> column in the
      <code>Binding</code> table with the hypervisor's status.
      Southbound, it connects to <code>ovs-vswitchd</code>(8) as an
      OpenFlow controller, for control over network traffic, and to the
      local <code>ovsdb-server</code>(1) to allow it to monitor and
      control Open vSwitch configuration.
    </li>

    <li>
      <code>ovs-vswitchd</code>(8) and <code>ovsdb-server</code>(1) are
      conventional components of Open vSwitch.
    </li>
  </ul>

  <pre fixed="yes">
                                  CMS
                                   |
                                   |
                       +-----------|-----------+
                       |           |           |
                       |     OVN/CMS Plugin    |
                       |           |           |
                       |           |           |
                       |   OVN Northbound DB   |
                       |           |           |
                       |           |           |
                       |       ovn-northd      |
                       |           |           |
                       +-----------|-----------+
                                   |
                                   |
                         +-------------------+
                         | OVN Southbound DB |
                         +-------------------+
                                   |
                                   |
                +------------------+------------------+
                |                  |                  |
 HV 1           |                  |                  |    HV n
+---------------|---------------+  .  +---------------|---------------+
|               |               |  .  |               |               |
|        ovn-controller         |  .  |        ovn-controller         |
|         |          |          |  .  |         |          |          |
|         |          |          |     |         |          |          |
|  ovs-vswitchd   ovsdb-server  |     |  ovs-vswitchd   ovsdb-server  |
|                               |     |                               |
+-------------------------------+     +-------------------------------+
  </pre>

  <h2>Information Flow in OVN</h2>

  <p>
    Configuration data in OVN flows from north to south.  The CMS, through
    its OVN/CMS plugin, passes the logical network configuration to
    <code>ovn-northd</code> via the northbound database.  In turn,
    <code>ovn-northd</code> compiles the configuration into a lower-level
    form and passes it to all of the chassis via the southbound database.
  </p>

  <p>
    Status information in OVN flows from south to north.  OVN currently
    provides only a few forms of status information.  First,
    <code>ovn-northd</code> populates the <code>up</code> column in the
    northbound <code>Logical_Switch_Port</code> table: if a logical port's
    <code>chassis</code> column in the southbound <code>Port_Binding</code>
    table is nonempty, it sets <code>up</code> to <code>true</code>,
    otherwise to <code>false</code>.  This allows the CMS to detect when a
    VM's networking has come up.
  </p>

  <p>
    Second, OVN provides feedback to the CMS on the realization of its
    configuration, that is, whether the configuration provided by the CMS
    has taken effect.  This feature requires the CMS to participate in a
    sequence number protocol, which works the following way:
  </p>

  <ol>
    <li>
      When the CMS updates the configuration in the northbound database, as
      part of the same transaction, it increments the value of the
      <code>nb_cfg</code> column in the <code>NB_Global</code> table.
      (This is only necessary if the CMS wants to know when the
      configuration has been realized.)
    </li>

    <li>
      When <code>ovn-northd</code> updates the southbound database based on
      a given snapshot of the northbound database, it copies
      <code>nb_cfg</code> from the northbound <code>NB_Global</code> table
      into the southbound <code>SB_Global</code> table, as part of the same
      transaction.  (Thus, an observer monitoring both databases can
      determine when the southbound database is caught up with the
      northbound.)
    </li>

    <li>
      After <code>ovn-northd</code> receives confirmation from the
      southbound database server that its changes have committed, it
      updates <code>sb_cfg</code> in the northbound
      <code>NB_Global</code> table to the <code>nb_cfg</code> version that
      was pushed down.  (Thus, the CMS or another observer can determine
      when the southbound database is caught up without a connection to the
      southbound database.)
    </li>

    <li>
      The <code>ovn-controller</code> process on each chassis receives the
      updated southbound database, with the updated <code>nb_cfg</code>.
      This process in turn updates the physical flows installed in the
      chassis's Open vSwitch instances.  When it receives confirmation from
      Open vSwitch that the physical flows have been updated, it updates
      <code>nb_cfg</code> in its own <code>Chassis</code> record in the
      southbound database.
    </li>

    <li>
      <code>ovn-northd</code> monitors the <code>nb_cfg</code> column in
      all of the <code>Chassis</code> records in the southbound database.
      It keeps track of the minimum value among all the records and copies
      it into the <code>hv_cfg</code> column in the northbound
      <code>NB_Global</code> table.  (Thus, the CMS or another observer can
      determine when all of the hypervisors have caught up to the
      northbound configuration.)
    </li>
  </ol>

  <h2>Chassis Setup</h2>

  <p>
    Each chassis in an OVN deployment must be configured with an Open
    vSwitch bridge dedicated for OVN's use, called the <dfn>integration
    bridge</dfn>.  System startup scripts may create this bridge prior to
    starting <code>ovn-controller</code> if desired.  If this bridge does
    not exist when <code>ovn-controller</code> starts, it will be created
    automatically with the default configuration suggested below.  The
    ports on the integration bridge include:
  </p>

  <ul>
    <li>
      On any chassis, tunnel ports that OVN uses to maintain logical
      network connectivity.  <code>ovn-controller</code> adds, updates, and
      removes these tunnel ports.
    </li>

    <li>
      On a hypervisor, any VIFs that are to be attached to logical
      networks.  The hypervisor itself, or the integration between Open
      vSwitch and the hypervisor (described in
      <code>IntegrationGuide.rst</code>), takes care of this.  (This is not
      part of OVN or new to OVN; this is pre-existing integration work that
      has already been done on hypervisors that support OVS.)
    </li>

    <li>
      On a gateway, the physical port used for logical network
      connectivity.  System startup scripts add this port to the bridge
      prior to starting <code>ovn-controller</code>.  This can be a patch
      port to another bridge, instead of a physical port, in more
      sophisticated setups.
    </li>
  </ul>

  <p>
    Other ports should not be attached to the integration bridge.  In
    particular, physical ports attached to the underlay network (as opposed
    to gateway ports, which are physical ports attached to logical
    networks) must not be attached to the integration bridge.  Underlay
    physical ports should instead be attached to a separate Open vSwitch
    bridge (in fact, they need not be attached to any bridge at all).
  </p>

  <p>
    The integration bridge should be configured as described below.
    The effect of each of these settings is documented in
    <code>ovs-vswitchd.conf.db</code>(5):
  </p>

  <!-- Keep the following in sync with create_br_int() in
       ovn/controller/ovn-controller.c. -->
  <dl>
    <dt><code>fail-mode=secure</code></dt>
    <dd>
      Avoids switching packets between isolated logical networks before
      <code>ovn-controller</code> starts up.  See <code>Controller Failure
      Settings</code> in <code>ovs-vsctl</code>(8) for more information.
    </dd>

    <dt><code>other-config:disable-in-band=true</code></dt>
    <dd>
      Suppresses in-band control flows for the integration bridge.  It
      would be unusual for such flows to show up anyway, because OVN uses a
      local controller (over a Unix domain socket) instead of a remote
      controller.  It's possible, however, for some other bridge in the
      same system to have an in-band remote controller, and in that case
      this suppresses the flows that in-band control would ordinarily set
      up.  Refer to the documentation for more information.
    </dd>
  </dl>

  <p>
    The customary name for the integration bridge is <code>br-int</code>,
    but another name may be used.
  </p>

  <h2>Logical Networks</h2>

  <p>
    A <dfn>logical network</dfn> implements the same concepts as a physical
    network, but it is insulated from the physical network by tunnels or
    other encapsulations.  This allows logical networks to have separate IP
    and other address spaces that overlap, without conflicting, with those
    used for physical networks.  Logical network topologies can be arranged
    without regard for the topologies of the physical networks on which
    they run.
  </p>

  <p>
    Logical network concepts in OVN include:
  </p>

  <ul>
    <li>
      <dfn>Logical switches</dfn>, the logical version of Ethernet
      switches.
    </li>

    <li>
      <dfn>Logical routers</dfn>, the logical version of IP routers.
      Logical switches and routers can be connected into sophisticated
      topologies.
    </li>

    <li>
      <dfn>Logical datapaths</dfn>, the logical version of an OpenFlow
      switch.  Logical switches and routers are both implemented as
      logical datapaths.
    </li>

    <li>
      <p>
        <dfn>Logical ports</dfn> represent the points of connectivity in
        and out of logical switches and logical routers.  Some common types
        of logical ports are:
      </p>

      <ul>
        <li>
          Logical ports representing VIFs.
        </li>

        <li>
          <dfn>Localnet ports</dfn> represent the points of connectivity
          between logical switches and the physical network.  They are
          implemented as OVS patch ports between the integration bridge
          and the separate Open vSwitch bridge that underlay physical
          ports attach to.
        </li>

        <li>
          <dfn>Logical patch ports</dfn> represent the points of
          connectivity between logical switches and logical routers, and
          in some cases between peer logical routers.  There is a pair of
          logical patch ports at each such point of connectivity, one on
          each side.
        </li>

        <li>
          <dfn>Localport ports</dfn> represent the points of local
          connectivity between logical switches and VIFs.  These ports are
          present in every chassis (not bound to any particular one) and
          traffic from them will never go through a tunnel.  A
          <code>localport</code> is expected to only generate traffic
          destined for a local destination, typically in response to a
          request it received.  One use case is how OpenStack Neutron uses
          a <code>localport</code> port for serving metadata to VMs
          residing on every hypervisor.  A metadata proxy process is
          attached to this port on every host and all VMs within the same
          network will reach it at the same IP/MAC address without any
          traffic being sent over a tunnel.  Further details can be seen
          at
          https://docs.openstack.org/developer/networking-ovn/design/metadata_api.html.
        </li>
      </ul>
    </li>
  </ul>

  <h2>Life Cycle of a VIF</h2>

  <p>
    Tables and their schemas presented in isolation are difficult to
    understand.  Here's an example.
  </p>

  <p>
    A VIF on a hypervisor is a virtual network interface attached either
    to a VM or to a container running directly on that hypervisor (this is
    different from the interface of a container running inside a VM).
  </p>

  <p>
    The steps in this example refer often to details of the OVN Southbound
    and OVN Northbound database schemas.  Please see <code>ovn-sb</code>(5)
    and <code>ovn-nb</code>(5), respectively, for the full story on these
    databases.
  </p>

  <ol>
    <li>
      A VIF's life cycle begins when a CMS administrator creates a new VIF
      using the CMS user interface or API and adds it to a switch (one
      implemented by OVN as a logical switch).  The CMS updates its own
      configuration.  This includes associating a unique, persistent
      identifier <var>vif-id</var> and Ethernet address <var>mac</var> with
      the VIF.
    </li>

    <li>
      The CMS plugin updates the OVN Northbound database to include the new
      VIF, by adding a row to the <code>Logical_Switch_Port</code>
      table.  In the new row, <code>name</code> is <var>vif-id</var>,
      <code>mac</code> is <var>mac</var>, <code>switch</code> points to
      the OVN logical switch's Logical_Switch record, and other columns
      are initialized appropriately.
    </li>

    <li>
      <code>ovn-northd</code> receives the OVN Northbound database update.
      In turn, it makes the corresponding updates to the OVN Southbound
      database, by adding rows to the OVN Southbound database
      <code>Logical_Flow</code> table to reflect the new port, e.g. add a
      flow to recognize that packets destined to the new port's MAC address
      should be delivered to it, and update the flow that delivers
      broadcast and multicast packets to include the new port.  It also
      creates a record in the <code>Binding</code> table and populates all
      its columns except the column that identifies the
      <code>chassis</code>.
    </li>

    <li>
      On every hypervisor, <code>ovn-controller</code> receives the
      <code>Logical_Flow</code> table updates that <code>ovn-northd</code>
      made in the previous step.  As long as the VM that owns the VIF is
      powered off, <code>ovn-controller</code> cannot do much; it cannot,
      for example, arrange to send packets to or receive packets from the
      VIF, because the VIF does not actually exist anywhere.
    </li>

    <li>
      Eventually, a user powers on the VM that owns the VIF.  On the
      hypervisor where the VM is powered on, the integration between the
      hypervisor and Open vSwitch (described in
      <code>IntegrationGuide.rst</code>) adds the VIF to the OVN
      integration bridge and stores <var>vif-id</var> in
      <code>external_ids</code>:<code>iface-id</code> to indicate that the
      interface is an instantiation of the new VIF.  (None of this code is
      new in OVN; this is pre-existing integration work that has already
      been done on hypervisors that support OVS.)
    </li>

    <li>
      On the hypervisor where the VM is powered on,
      <code>ovn-controller</code> notices
      <code>external_ids</code>:<code>iface-id</code> in the new Interface.
      In response, in the OVN Southbound DB, it updates the
      <code>Binding</code> table's <code>chassis</code> column for the row
      that links the logical port from
      <code>external_ids</code>:<code>iface-id</code> to the hypervisor.
      Afterward, <code>ovn-controller</code> updates the local hypervisor's
      OpenFlow tables so that packets to and from the VIF are properly
      handled.
    </li>

    <li>
      Some CMS systems, including OpenStack, fully start a VM only when its
      networking is ready.  To support this, <code>ovn-northd</code>
      notices the <code>chassis</code> column updated for the row in the
      <code>Binding</code> table and pushes this upward by updating the
      <ref column="up" table="Logical_Switch_Port" db="OVN_NB"/> column
      in the OVN Northbound database's <ref table="Logical_Switch_Port"
      db="OVN_NB"/> table to indicate that the VIF is now up.  The CMS,
      if it uses this feature, can then react by allowing the VM's
      execution to proceed.
    </li>

    <li>
      On every hypervisor but the one where the VIF resides,
      <code>ovn-controller</code> notices the completely populated row in
      the <code>Binding</code> table.  This provides
      <code>ovn-controller</code> the physical location of the logical
      port, so each instance updates the OpenFlow tables of its switch
      (based on logical datapath flows in the OVN DB
      <code>Logical_Flow</code> table) so that packets to and from the VIF
      can be properly handled via tunnels.
    </li>

    <li>
      Eventually, a user powers off the VM that owns the VIF.  On the
      hypervisor where the VM was powered off, the VIF is deleted from the
      OVN integration bridge.
    </li>

    <li>
      On the hypervisor where the VM was powered off,
      <code>ovn-controller</code> notices that the VIF was deleted.  In
      response, it removes the <code>Chassis</code> column content in the
      <code>Binding</code> table for the logical port.
    </li>

    <li>
      On every hypervisor, <code>ovn-controller</code> notices the empty
      <code>Chassis</code> column in the <code>Binding</code> table's row
      for the logical port.  This means that <code>ovn-controller</code>
      no longer knows the physical location of the logical port, so each
      instance updates its OpenFlow tables to reflect that.
    </li>

    <li>
      Eventually, when the VIF (or its entire VM) is no longer needed by
      anyone, an administrator deletes the VIF using the CMS user interface
      or API.  The CMS updates its own configuration.
    </li>

    <li>
      The CMS plugin removes the VIF from the OVN Northbound database,
      by deleting its row in the <code>Logical_Switch_Port</code> table.
    </li>

    <li>
      <code>ovn-northd</code> receives the OVN Northbound update and in
      turn updates the OVN Southbound database accordingly, by removing or
      updating the rows from the OVN Southbound database
      <code>Logical_Flow</code> table and <code>Binding</code> table that
      were related to the now-destroyed VIF.
    </li>

    <li>
      On every hypervisor, <code>ovn-controller</code> receives the
      <code>Logical_Flow</code> table updates that <code>ovn-northd</code>
      made in the previous step.  <code>ovn-controller</code> updates
      OpenFlow tables to reflect the update, although there may not be much
      to do, since the VIF had already become unreachable when it was
      removed from the <code>Binding</code> table in a previous step.
    </li>
  </ol>

  <h2>Life Cycle of a Container Interface Inside a VM</h2>

  <p>
    OVN provides virtual network abstractions by converting information
    written in the OVN_NB database to OpenFlow flows in each hypervisor.
    Secure virtual networking for multiple tenants can only be provided if
    <code>ovn-controller</code> is the only entity that can modify flows in
    Open vSwitch.  When the Open vSwitch integration bridge resides in the
    hypervisor, it is a fair assumption to make that tenant workloads
    running inside VMs cannot make any changes to Open vSwitch flows.
  </p>

  <p>
    If the infrastructure provider trusts the applications inside the
    containers not to break out and modify the Open vSwitch flows, then
    containers can be run in hypervisors.  This is also the case when
    containers are run inside the VMs and the Open vSwitch integration
    bridge with flows added by <code>ovn-controller</code> resides in the
    same VM.  For both of the above cases, the workflow is the same as
    explained with an example in the previous section ("Life Cycle of a
    VIF").
  </p>

  <p>
    This section talks about the life cycle of a container interface (CIF)
    when containers are created in the VMs and the Open vSwitch integration
    bridge resides inside the hypervisor.  In this case, even if a
    container application breaks out, other tenants are not affected
    because the containers running inside the VMs cannot modify the flows
    in the Open vSwitch integration bridge.
  </p>

  <p>
    When multiple containers are created inside a VM, there are multiple
    CIFs associated with them.  The network traffic associated with these
    CIFs needs to reach the Open vSwitch integration bridge running in the
    hypervisor for OVN to support virtual network abstractions.  OVN should
    also be able to distinguish network traffic coming from different CIFs.
    There are two ways to distinguish network traffic of CIFs.
  </p>

  <p>
    One way is to provide one VIF for every CIF (1:1 model).  This means
    that there could be a lot of network devices in the hypervisor.  This
    would slow down OVS because of all the additional CPU cycles needed for
    the management of all the VIFs.  It would also mean that the entity
    creating the containers in a VM should also be able to create the
    corresponding VIFs in the hypervisor.
  </p>

  <p>
    The second way is to provide a single VIF for all the CIFs (1:many
    model).  OVN could then distinguish network traffic coming from
    different CIFs via a tag written in every packet.  OVN uses this
    mechanism and uses VLAN as the tagging mechanism.
  </p>

640 | <ol> | |
641 | <li> | |
642 | A CIF's life cycle begins when a container is spawned inside a VM by | |
643 | the either the same CMS that created the VM or a tenant that owns that VM | |
644 | or even a container Orchestration System that is different than the CMS | |
645 | that initially created the VM. Whoever the entity is, it will need to | |
646 | know the <var>vif-id</var> that is associated with the network interface | |
647 | of the VM through which the container interface's network traffic is | |
648 | expected to go through. The entity that creates the container interface | |
649 | will also need to choose an unused VLAN inside that VM. | |
650 | </li> | |
651 | ||
652 | <li> | |
653 | The container spawning entity (either directly or through the CMS that | |
654 | manages the underlying infrastructure) updates the OVN Northbound | |
655 | database to include the new CIF, by adding a row to the | |
80f408f4 JP |
656 | <code>Logical_Switch_Port</code> table. In the new row, |
657 | <code>name</code> is any unique identifier, | |
658 | <code>parent_name</code> is the <var>vif-id</var> of the VM | |
659 | through which the CIF's network traffic is expected to go through | |
660 | and the <code>tag</code> is the VLAN tag that identifies the | |
9fb4636f GS |
661 | network traffic of that CIF. |
662 | </li> | |

    <li>
      <code>ovn-northd</code> receives the OVN Northbound database update. In
      turn, it makes the corresponding updates to the OVN Southbound database,
      by adding rows to the OVN Southbound database's
      <code>Logical_Flow</code> table to reflect the new port and also by
      creating a new row in the <code>Binding</code> table and populating all
      its columns except the column that identifies the <code>chassis</code>.
    </li>

    <li>
      On every hypervisor, <code>ovn-controller</code> subscribes to the
      changes in the <code>Binding</code> table. When a new row is created by
      <code>ovn-northd</code> that includes a value in the
      <code>parent_port</code> column of the <code>Binding</code> table, the
      <code>ovn-controller</code> in the hypervisor whose OVN integration
      bridge has that same value in <var>vif-id</var> in
      <code>external_ids</code>:<code>iface-id</code> updates the local
      hypervisor's OpenFlow tables so that packets to and from the VIF with
      the particular VLAN <code>tag</code> are properly handled. Afterward it
      updates the <code>chassis</code> column of the <code>Binding</code> to
      reflect the physical location.
    </li>

    <li>
      One can only start the application inside the container after the
      underlying network is ready. To support this, <code>ovn-northd</code>
      notices the updated <code>chassis</code> column in the
      <code>Binding</code> table and updates the <ref column="up"
      table="Logical_Switch_Port" db="OVN_NB"/> column in the OVN Northbound
      database's <ref table="Logical_Switch_Port" db="OVN_NB"/> table to
      indicate that the CIF is now up. The entity responsible for starting
      the container application queries this value and starts the
      application.
    </li>

    <li>
      Eventually the entity that created and started the container stops it.
      The entity, through the CMS (or directly), deletes its row in the
      <code>Logical_Switch_Port</code> table.
    </li>

    <li>
      <code>ovn-northd</code> receives the OVN Northbound update and in turn
      updates the OVN Southbound database accordingly, by removing or updating
      the rows from the OVN Southbound database's <code>Logical_Flow</code>
      table that were related to the now-destroyed CIF. It also deletes the
      row in the <code>Binding</code> table for that CIF.
    </li>

    <li>
      On every hypervisor, <code>ovn-controller</code> receives the
      <code>Logical_Flow</code> table updates that <code>ovn-northd</code>
      made in the previous step. <code>ovn-controller</code> updates OpenFlow
      tables to reflect the update.
    </li>
  </ol>
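The database interactions in steps 2 through 5 above can be modeled compactly. This is an illustrative Python sketch, not OVN code: the dictionaries are simplified stand-ins for the OVSDB tables, and the function names are invented for illustration.

```python
# Illustrative sketch (not OVN code) of CIF life cycle steps 2-5 above:
# a Logical_Switch_Port row with parent_name/tag leads to a Binding row,
# and setting the binding's chassis marks the port "up".

northbound = {"Logical_Switch_Port": {}}
southbound = {"Binding": {}}

def cms_add_cif(name, parent_vif, tag):
    """Step 2: the container-spawning entity adds the CIF to the NB DB."""
    northbound["Logical_Switch_Port"][name] = {
        "parent_name": parent_vif, "tag": tag, "up": False}

def northd_sync(name):
    """Step 3: ovn-northd creates the Binding row, chassis left unset."""
    lsp = northbound["Logical_Switch_Port"][name]
    southbound["Binding"][name] = {
        "parent_port": lsp["parent_name"], "tag": lsp["tag"], "chassis": None}

def controller_bind(name, chassis):
    """Step 4: ovn-controller on the matching chassis claims the port."""
    southbound["Binding"][name]["chassis"] = chassis

def northd_notice_up(name):
    """Step 5: ovn-northd reflects the binding back as up=True."""
    if southbound["Binding"][name]["chassis"] is not None:
        northbound["Logical_Switch_Port"][name]["up"] = True

cms_add_cif("cif1", parent_vif="vm1-vif", tag=100)
northd_sync("cif1")
controller_bind("cif1", "hv1")
northd_notice_up("cif1")
```

Only after the final step does the entity that spawned the container see the port as up and start the application inside it.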

  <h2>Architectural Physical Life Cycle of a Packet</h2>

  <p>
    This section describes how a packet travels from one virtual machine or
    container to another through OVN. This description focuses on the
    physical treatment of a packet; for a description of the logical life
    cycle of a packet, please refer to the <code>Logical_Flow</code> table in
    <code>ovn-sb</code>(5).
  </p>

  <p>
    This section mentions several data and metadata fields, for clarity
    summarized here:
  </p>

  <dl>
    <dt>tunnel key</dt>
    <dd>
      When OVN encapsulates a packet in Geneve or another tunnel, it attaches
      extra data to it to allow the receiving OVN instance to process it
      correctly. This takes different forms depending on the particular
      encapsulation, but in each case we refer to it here as the ``tunnel
      key.'' See <code>Tunnel Encapsulations</code>, below, for details.
    </dd>

    <dt>logical datapath field</dt>
    <dd>
      A field that denotes the logical datapath through which a packet is
      being processed.
      <!-- Keep the following in sync with MFF_LOG_DATAPATH in
           ovn/lib/logical-fields.h. -->
      OVN uses the field that OpenFlow 1.1+ simply (and confusingly) calls
      ``metadata'' to store the logical datapath. (This field is passed
      across tunnels as part of the tunnel key.)
    </dd>

    <dt>logical input port field</dt>
    <dd>
      <p>
        A field that denotes the logical port from which the packet entered
        the logical datapath.
        <!-- Keep the following in sync with MFF_LOG_INPORT in
             ovn/lib/logical-fields.h. -->
        OVN stores this in Open vSwitch extension register number 14.
      </p>

      <p>
        Geneve and STT tunnels pass this field as part of the tunnel key.
        Although VXLAN tunnels do not explicitly carry a logical input port,
        OVN only uses VXLAN to communicate with gateways that from OVN's
        perspective consist of only a single logical port, so that OVN can
        set the logical input port field to this one on ingress to the OVN
        logical pipeline.
      </p>
    </dd>

    <dt>logical output port field</dt>
    <dd>
      <p>
        A field that denotes the logical port from which the packet will
        leave the logical datapath. This is initialized to 0 at the
        beginning of the logical ingress pipeline.
        <!-- Keep the following in sync with MFF_LOG_OUTPORT in
             ovn/lib/logical-fields.h. -->
        OVN stores this in Open vSwitch extension register number 15.
      </p>

      <p>
        Geneve and STT tunnels pass this field as part of the tunnel key.
        VXLAN tunnels do not transmit the logical output port field. Since
        VXLAN tunnels do not carry a logical output port field in the tunnel
        key, when a packet is received from a VXLAN tunnel by an OVN
        hypervisor, the packet is resubmitted to table 8 to determine the
        output port(s); when the packet reaches table 32, these packets are
        resubmitted to table 33 for local delivery by checking the
        MLF_RCV_FROM_VXLAN flag, which is set when the packet arrives from a
        VXLAN tunnel.
      </p>
    </dd>

    <dt>conntrack zone field for logical ports</dt>
    <dd>
      A field that denotes the connection tracking zone for logical ports.
      The value only has local significance and is not meaningful between
      chassis. This is initialized to 0 at the beginning of the logical
      <!-- Keep the following in sync with MFF_LOG_CT_ZONE in
           ovn/lib/logical-fields.h. -->
      ingress pipeline. OVN stores this in Open vSwitch extension register
      number 13.
    </dd>

    <dt>conntrack zone fields for routers</dt>
    <dd>
      Fields that denote the connection tracking zones for routers. These
      values only have local significance and are not meaningful between
      chassis. OVN stores the zone information for DNATting in Open vSwitch
      <!-- Keep the following in sync with MFF_LOG_DNAT_ZONE and
           MFF_LOG_SNAT_ZONE in ovn/lib/logical-fields.h. -->
      extension register number 11 and zone information for SNATing in Open
      vSwitch extension register number 12.
    </dd>

    <dt>logical flow flags</dt>
    <dd>
      The logical flags are intended to handle keeping context between
      tables in order to decide which rules in subsequent tables are
      matched. These values only have local significance and are not
      meaningful between chassis. OVN stores the logical flags in
      <!-- Keep the following in sync with MFF_LOG_FLAGS in
           ovn/lib/logical-fields.h. -->
      Open vSwitch extension register number 10.
    </dd>

    <dt>VLAN ID</dt>
    <dd>
      The VLAN ID is used as an interface between OVN and containers nested
      inside a VM (see <code>Life Cycle of a container interface inside a
      VM</code>, above, for more information).
    </dd>
  </dl>
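The field assignments listed above can be collected into a small lookup table. This Python snippet is illustrative only and simply restates the register numbers given above; the authoritative definitions live in <code>ovn/lib/logical-fields.h</code> in the OVS tree.

```python
# Summary of the metadata field assignments described above.
LOGICAL_FIELDS = {
    "logical datapath":        "metadata",  # OpenFlow 1.1+ ``metadata''
    "logical input port":      "reg14",     # OVS extension register 14
    "logical output port":     "reg15",     # OVS extension register 15
    "ct zone (logical ports)": "reg13",     # OVS extension register 13
    "ct zone (DNAT)":          "reg11",     # OVS extension register 11
    "ct zone (SNAT)":          "reg12",     # OVS extension register 12
    "logical flow flags":      "reg10",     # OVS extension register 10
}

def ovs_field(name):
    """Map a logical metadata field name to the OVS field that carries it."""
    return LOGICAL_FIELDS[name]
```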

  <p>
    Initially, a VM or container on the ingress hypervisor sends a packet on
    a port attached to the OVN integration bridge. Then:
  </p>

  <ol>
    <li>
      <p>
        OpenFlow table 0 performs physical-to-logical translation. It
        matches the packet's ingress port. Its actions annotate the packet
        with logical metadata, by setting the logical datapath field to
        identify the logical datapath that the packet is traversing and the
        logical input port field to identify the ingress port. Then it
        resubmits to table 8 to enter the logical ingress pipeline.
      </p>

      <p>
        Packets that originate from a container nested within a VM are
        treated in a slightly different way. The originating container can
        be distinguished based on the VIF-specific VLAN ID, so the
        physical-to-logical translation flows additionally match on VLAN ID
        and the actions strip the VLAN header. Following this step, OVN
        treats packets from containers just like any other packets.
      </p>

      <p>
        Table 0 also processes packets that arrive from other chassis. It
        distinguishes them from other packets by ingress port, which is a
        tunnel. As with packets just entering the OVN pipeline, the actions
        annotate these packets with logical datapath and logical ingress
        port metadata. In addition, the actions set the logical output port
        field, which is available because in OVN tunneling occurs after the
        logical output port is known. These three pieces of information are
        obtained from the tunnel encapsulation metadata (see <code>Tunnel
        Encapsulations</code> for encoding details). Then the actions
        resubmit to table 33 to enter the logical egress pipeline.
      </p>
    </li>

    <li>
      <p>
        OpenFlow tables 8 through 31 execute the logical ingress pipeline
        from the <code>Logical_Flow</code> table in the OVN Southbound
        database. These tables are expressed entirely in terms of logical
        concepts like logical ports and logical datapaths. A big part of
        <code>ovn-controller</code>'s job is to translate them into
        equivalent OpenFlow (in particular it translates the table numbers:
        <code>Logical_Flow</code> tables 0 through 23 become OpenFlow tables
        8 through 31).
      </p>
5868eb24 | 891 | |
c80eac1f BP |
892 | <p> |
893 | Each logical flow maps to one or more OpenFlow flows. An actual packet | |
894 | ordinarily matches only one of these, although in some cases it can | |
895 | match more than one of these flows (which is not a problem because all | |
896 | of them have the same actions). <code>ovn-controller</code> uses the | |
897 | first 32 bits of the logical flow's UUID as the cookie for its OpenFlow | |
898 | flow or flows. (This is not necessarily unique, since the first 32 | |
899 | bits of a logical flow's UUID is not necessarily unique.) | |
900 | </p> | |
901 | ||
902 | <p> | |
903 | Some logical flows can map to the Open vSwitch ``conjunctive match'' | |
96fee5e0 | 904 | extension (see <code>ovs-fields</code>(7)). Flows with a |
c80eac1f BP |
905 | <code>conjunction</code> action use an OpenFlow cookie of 0, because |
906 | they can correspond to multiple logical flows. The OpenFlow flow for a | |
907 | conjunctive match includes a match on <code>conj_id</code>. | |
908 | </p> | |
909 | ||
910 | <p> | |
911 | Some logical flows may not be represented in the OpenFlow tables on a | |
912 | given hypervisor, if they could not be used on that hypervisor. For | |
913 | example, if no VIF in a logical switch resides on a given hypervisor, | |
914 | and the logical switch is not otherwise reachable on that hypervisor | |
915 | (e.g. over a series of hops through logical switches and routers | |
916 | starting from a VIF on the hypervisor), then the logical flow may not | |
917 | be represented there. | |
918 | </p> | |
919 | ||

      <p>
        Most OVN actions have fairly obvious implementations in OpenFlow
        (with OVS extensions), e.g. <code>next;</code> is implemented as
        <code>resubmit</code>, <code><var>field</var> =
        <var>constant</var>;</code> as <code>set_field</code>. A few are
        worth describing in more detail:
      </p>

      <dl>
        <dt><code>output:</code></dt>
        <dd>
          Implemented by resubmitting the packet to table 32. If the
          pipeline executes more than one <code>output</code> action, then
          each one is separately resubmitted to table 32. This can be used
          to send multiple copies of the packet to multiple ports. (If the
          packet was not modified between the <code>output</code> actions,
          and some of the copies are destined to the same hypervisor, then
          using a logical multicast output port would save bandwidth between
          hypervisors.)
        </dd>

        <dt><code>get_arp(<var>P</var>, <var>A</var>);</code></dt>
        <dt><code>get_nd(<var>P</var>, <var>A</var>);</code></dt>
        <dd>
          <p>
            Implemented by storing arguments into OpenFlow fields, then
            resubmitting to table 66, which <code>ovn-controller</code>
            populates with flows generated from the <code>MAC_Binding</code>
            table in the OVN Southbound database. If there is a match in
            table 66, then its actions store the bound MAC in the Ethernet
            destination address field.
          </p>

          <p>
            (The OpenFlow actions save and restore the OpenFlow fields used
            for the arguments, so that the OVN actions do not have to be
            aware of this temporary use.)
          </p>
        </dd>

        <dt><code>put_arp(<var>P</var>, <var>A</var>, <var>E</var>);</code></dt>
        <dt><code>put_nd(<var>P</var>, <var>A</var>, <var>E</var>);</code></dt>
        <dd>
          <p>
            Implemented by storing the arguments into OpenFlow fields, then
            outputting a packet to <code>ovn-controller</code>, which
            updates the <code>MAC_Binding</code> table.
          </p>

          <p>
            (The OpenFlow actions save and restore the OpenFlow fields used
            for the arguments, so that the OVN actions do not have to be
            aware of this temporary use.)
          </p>
        </dd>
      </dl>
    </li>

    <li>
      <p>
        OpenFlow tables 32 through 47 implement the <code>output</code>
        action in the logical ingress pipeline. Specifically, table 32
        handles packets to remote hypervisors, table 33 handles packets to
        the local hypervisor, and table 34 checks whether packets whose
        logical ingress and egress port are the same should be discarded.
      </p>

      <p>
        Logical patch ports are a special case. Logical patch ports do not
        have a physical location and effectively reside on every hypervisor.
        Thus, flow table 33, for output to ports on the local hypervisor,
        naturally implements output to unicast logical patch ports too.
        However, applying the same logic to a logical patch port that is
        part of a logical multicast group yields packet duplication, because
        each hypervisor that contains a logical port in the multicast group
        will also output the packet to the logical patch port. Thus,
        multicast groups implement output to logical patch ports in table
        32.
      </p>

      <p>
        Each flow in table 32 matches on a logical output port for unicast
        or multicast logical ports that include a logical port on a remote
        hypervisor. Each flow's actions implement sending a packet to the
        port it matches. For unicast logical output ports on remote
        hypervisors, the actions set the tunnel key to the correct value,
        then send the packet on the tunnel port to the correct hypervisor.
        (When the remote hypervisor receives the packet, table 0 there will
        recognize it as a tunneled packet and pass it along to table 33.)
        For multicast logical output ports, the actions send one copy of the
        packet to each remote hypervisor, in the same way as for unicast
        destinations. If a multicast group includes a logical port or ports
        on the local hypervisor, then its actions also resubmit to table 33.
        Table 32 also includes:
      </p>

      <ul>
        <li>
          A higher-priority rule to match packets received from VXLAN
          tunnels, based on the MLF_RCV_FROM_VXLAN flag, and resubmit these
          packets to table 33 for local delivery. Packets received from
          VXLAN tunnels reach here because of the lack of a logical output
          port field in the tunnel key, and thus these packets were
          resubmitted to table 8 to determine the output port.
        </li>
        <li>
          A higher-priority rule to match packets received from ports of
          type <code>localport</code>, based on the logical input port, and
          resubmit these packets to table 33 for local delivery. Ports of
          type <code>localport</code> exist on every hypervisor and by
          definition their traffic should never go out through a tunnel.
        </li>
        <li>
          A fallback flow that resubmits to table 33 if there is no other
          match.
        </li>
      </ul>

      <p>
        Flows in table 33 resemble those in table 32 but for logical ports
        that reside locally rather than remotely. For unicast logical
        output ports on the local hypervisor, the actions just resubmit to
        table 34. For multicast output ports that include one or more
        logical ports on the local hypervisor, for each such logical port
        <var>P</var>, the actions change the logical output port to
        <var>P</var>, then resubmit to table 34.
      </p>

      <p>
        A special case is that when a localnet port exists on the datapath,
        a remote port is connected by switching via the localnet port. In
        this case, instead of adding a flow in table 32 to reach the remote
        port, a flow is added in table 33 to switch the logical output port
        to the localnet port, and resubmit to table 33 as if the packet were
        unicast to a logical port on the local hypervisor.
      </p>

      <p>
        Table 34 matches and drops packets for which the logical input and
        output ports are the same and the MLF_ALLOW_LOOPBACK flag is not
        set. It resubmits other packets to table 40.
      </p>
    </li>

    <li>
      <p>
        OpenFlow tables 40 through 63 execute the logical egress pipeline
        from the <code>Logical_Flow</code> table in the OVN Southbound
        database. The egress pipeline can perform a final stage of
        validation before packet delivery. Eventually, it may execute an
        <code>output</code> action, which <code>ovn-controller</code>
        implements by resubmitting to table 64. A packet for which the
        pipeline never executes <code>output</code> is effectively dropped
        (although it may have been transmitted through a tunnel across a
        physical network).
      </p>

      <p>
        The egress pipeline cannot change the logical output port or cause
        further tunneling.
      </p>
    </li>

    <li>
      <p>
        Table 64 bypasses OpenFlow loopback when MLF_ALLOW_LOOPBACK is set.
        Logical loopback was handled in table 34, but OpenFlow by default
        also prevents loopback to the OpenFlow ingress port. Thus, when
        MLF_ALLOW_LOOPBACK is set, OpenFlow table 64 saves the OpenFlow
        ingress port, sets it to zero, resubmits to table 65 for
        logical-to-physical transformation, and then restores the OpenFlow
        ingress port, effectively disabling OpenFlow loopback prevention.
        When MLF_ALLOW_LOOPBACK is unset, the table 64 flow simply resubmits
        to table 65.
      </p>
    </li>

    <li>
      <p>
        OpenFlow table 65 performs logical-to-physical translation, the
        opposite of table 0. It matches the packet's logical egress port.
        Its actions output the packet to the port attached to the OVN
        integration bridge that represents that logical port. If the
        logical egress port is a container nested within a VM, then before
        sending the packet the actions push on a VLAN header with an
        appropriate VLAN ID.
      </p>
    </li>
  </ol>
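The table 0 classification in step 1 above can be sketched as follows. This is an illustrative Python model, not OVN code; the packet fields and port names are hypothetical simplifications of the OpenFlow state.

```python
# Illustrative sketch (not OVN code) of the table 0 dispatch described in
# step 1 above: a frame from a local VIF enters the ingress pipeline
# (table 8); a frame from a tunnel already carries its logical output port
# and goes straight to the egress path (table 33); a VLAN-tagged frame
# from a nested container has its tag stripped first.

def table0(packet):
    """Return (logical_metadata, next_table) for an incoming frame."""
    if packet["in_port_type"] == "tunnel":
        # Datapath, inport, and outport all come from the tunnel key.
        key = packet["tunnel_key"]
        meta = {"datapath": key["datapath"],
                "inport": key["inport"],
                "outport": key["outport"]}
        return meta, 33
    meta = {"datapath": packet["datapath_of_port"],
            "inport": packet["logical_port"],
            "outport": 0}
    if packet.get("vlan") is not None:
        packet["vlan"] = None  # container traffic: strip the VLAN header
    return meta, 8
```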
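The table-number translation and cookie scheme in step 2 above amount to a small calculation. This Python sketch only restates the numbering given above (ingress logical tables 0 through 23 map to OpenFlow tables 8 through 31, the egress pipeline to tables 40 through 63); the function names are invented for illustration.

```python
# Illustrative restatement of ovn-controller's table-number translation.
INGRESS_BASE = 8   # OpenFlow table for ingress Logical_Flow table 0
EGRESS_BASE = 40   # OpenFlow table for egress Logical_Flow table 0

def openflow_table(pipeline, logical_table):
    """Translate a Logical_Flow table number to an OpenFlow table number."""
    base = INGRESS_BASE if pipeline == "ingress" else EGRESS_BASE
    return base + logical_table

def openflow_cookie(logical_flow_uuid):
    """First 32 bits of the logical flow's UUID, used as the flow cookie
    (not necessarily unique, as noted above)."""
    return int(logical_flow_uuid.split("-")[0], 16)
```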
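The table 32 dispatch rules in step 3 above can be modeled as a decision order: VXLAN flag first, then <code>localport</code>, then per-chassis tunneling, then local delivery or fallback. This is a simplified, hypothetical Python sketch, not the actual flow table; all argument names are invented.

```python
# Illustrative sketch (not OVN code) of the table 32 logic described above.
def table32(outport, flags, port_types, local_ports, remote_chassis):
    """Return the actions table 32 would apply to a packet."""
    if "MLF_RCV_FROM_VXLAN" in flags:
        return [("resubmit", 33)]         # output port was resolved locally
    if port_types.get(outport) == "localport":
        return [("resubmit", 33)]         # localport traffic never tunnels
    # One copy per remote chassis hosting a port in the (possibly
    # multicast) logical output port.
    actions = [("tunnel", ch) for ch in remote_chassis.get(outport, [])]
    if outport in local_ports or not actions:
        actions.append(("resubmit", 33))  # local delivery, or the fallback
    return actions
```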

  <h2>Logical Routers and Logical Patch Ports</h2>

  <p>
    Typically logical routers and logical patch ports do not have a physical
    location and effectively reside on every hypervisor. This is the case
    for logical patch ports between logical routers and logical switches
    behind those logical routers, to which VMs (and VIFs) attach.
  </p>

  <p>
    Consider a packet sent from one virtual machine or container to another
    VM or container that resides on a different subnet. The packet will
    traverse tables 0 to 65 as described in the previous section
    <code>Architectural Physical Life Cycle of a Packet</code>, using the
    logical datapath representing the logical switch that the sender is
    attached to. At table 32, the packet will use the fallback flow that
    resubmits locally to table 33 on the same hypervisor. In this case, all
    of the processing from table 0 to table 65 occurs on the hypervisor
    where the sender resides.
  </p>

  <p>
    When the packet reaches table 65, the logical egress port is a logical
    patch port. The implementation in table 65 differs depending on the OVS
    version, although the observed behavior is meant to be the same:
  </p>

  <ul>
    <li>
      In OVS versions 2.6 and earlier, table 65 outputs to an OVS patch
      port that represents the logical patch port. The packet re-enters the
      OpenFlow flow table from the OVS patch port's peer in table 0, which
      identifies the logical datapath and logical input port based on the
      OVS patch port's OpenFlow port number.
    </li>

    <li>
      In OVS versions 2.7 and later, the packet is cloned and resubmitted
      directly to the first OpenFlow flow table in the ingress pipeline,
      setting the logical ingress port to the peer logical patch port, and
      using the peer logical patch port's logical datapath (that represents
      the logical router).
    </li>
  </ul>

  <p>
    The packet re-enters the ingress pipeline in order to traverse tables 8
    to 65 again, this time using the logical datapath representing the
    logical router. The processing continues as described in the previous
    section <code>Architectural Physical Life Cycle of a Packet</code>.
    When the packet reaches table 65, the logical egress port will once
    again be a logical patch port. In the same manner as described above,
    this logical patch port will cause the packet to be resubmitted to
    OpenFlow tables 8 to 65, this time using the logical datapath
    representing the logical switch that the destination VM or container is
    attached to.
  </p>
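The routed path just described makes three passes through tables 8 to 65, re-entering the pipeline whenever the egress port is a logical patch port. The following Python sketch is illustrative only; the topology names (<code>ls1</code>, <code>lr0</code>, <code>ls2</code> and the port names) are hypothetical.

```python
# Illustrative sketch of the three-pass traversal described above, for a
# hypothetical topology ls1 -- lr0 -- ls2 joined by patch-port peers.
PATCH_PEER = {
    ("ls1", "ls1-to-lr0"): ("lr0", "lr0-to-ls1"),
    ("lr0", "lr0-to-ls2"): ("ls2", "ls2-to-lr0"),
}

def traverse(datapath, inport, egress_of):
    """Run pipeline passes until the egress port is not a patch port."""
    passes = []
    while True:
        outport = egress_of[datapath]      # result of the ingress pipeline
        passes.append((datapath, inport, outport))
        peer = PATCH_PEER.get((datapath, outport))
        if peer is None:
            return passes                  # delivered by table 65
        datapath, inport = peer            # re-enter the ingress pipeline

# Hypothetical egress decisions per logical datapath.
egress = {"ls1": "ls1-to-lr0", "lr0": "lr0-to-ls2", "ls2": "vm2"}
```

Running <code>traverse("ls1", "vm1", egress)</code> on this topology yields one pass per logical datapath: sender's switch, router, destination switch.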

  <p>
    The packet traverses tables 8 to 65 a third and final time. If the
    destination VM or container resides on a remote hypervisor, then table
    32 will send the packet on a tunnel port from the sender's hypervisor to
    the remote hypervisor. Finally table 65 will output the packet directly
    to the destination VM or container.
  </p>

  <p>
    The following sections describe two exceptions, where logical routers
    and/or logical patch ports are associated with a physical location.
  </p>

  <h3>Gateway Routers</h3>

  <p>
    A <dfn>gateway router</dfn> is a logical router that is bound to a
    physical location. This includes all of the logical patch ports of the
    logical router, as well as all of the peer logical patch ports on
    logical switches. In the OVN Southbound database, the
    <code>Port_Binding</code> entries for these logical patch ports use the
    type <code>l3gateway</code> rather than <code>patch</code>, in order to
    distinguish that these logical patch ports are bound to a chassis.
  </p>

  <p>
    When a hypervisor processes a packet on a logical datapath representing
    a logical switch, and the logical egress port is an
    <code>l3gateway</code> port representing connectivity to a gateway
    router, the packet will match a flow in table 32 that sends the packet
    on a tunnel port to the chassis where the gateway router resides. This
    processing in table 32 is done in the same manner as for VIFs.
  </p>

  <p>
    Gateway routers are typically used in between distributed logical
    routers and physical networks. The distributed logical router and the
    logical switches behind it, to which VMs and containers attach,
    effectively reside on each hypervisor. The distributed router and the
    gateway router are connected by another logical switch, sometimes
    referred to as a <code>join</code> logical switch. On the other side,
    the gateway router connects to another logical switch that has a
    localnet port connecting to the physical network.
  </p>

  <p>
    When using gateway routers, DNAT and SNAT rules are associated with the
    gateway router, which provides a central location that can handle
    one-to-many SNAT (aka IP masquerading).
  </p>

  <h3>Distributed Gateway Ports</h3>

  <p>
    <dfn>Distributed gateway ports</dfn> are logical router patch ports that
    directly connect distributed logical routers to logical switches with
    localnet ports.
  </p>

  <p>
    The primary design goal of distributed gateway ports is to allow as much
    traffic as possible to be handled locally on the hypervisor where a VM
    or container resides. Whenever possible, packets from the VM or
    container to the outside world should be processed completely on that
    VM's or container's hypervisor, eventually traversing a localnet port
    instance on that hypervisor to the physical network. Whenever possible,
    packets from the outside world to a VM or container should be directed
    through the physical network directly to the VM's or container's
    hypervisor, where the packet will enter the integration bridge through a
    localnet port.
  </p>

  <p>
    In order to allow for the distributed processing of packets described in
    the paragraph above, distributed gateway ports need to be logical patch
    ports that effectively reside on every hypervisor, rather than
    <code>l3gateway</code> ports that are bound to a particular chassis.
    However, the flows associated with distributed gateway ports often need
    to be associated with physical locations, for the following reasons:
  </p>
1247 | ||
1248 | <ul> | |
1249 | <li> | |
1250 | <p> | |
1251 | The physical network that the localnet port is attached to | |
1252 | typically uses L2 learning. Any Ethernet address used over the | |
1253 | distributed gateway port must be restricted to a single physical | |
1254 | location so that upstream L2 learning is not confused. Traffic | |
1255 | sent out the distributed gateway port towards the localnet port | |
1256 | with a specific Ethernet address must be sent out one specific | |
1257 | instance of the distributed gateway port on one specific | |
1258 | chassis. Traffic received from the localnet port (or from a VIF | |
1259 | on the same logical switch as the localnet port) with a specific | |
1260 | Ethernet address must be directed to the logical switch's patch | |
1261 | port instance on that specific chassis. | |
1262 | </p> | |
1263 | ||
1264 | <p> | |
1265 | Due to the implications of L2 learning, the Ethernet address and | |
1266 | IP address of the distributed gateway port need to be restricted | |
1267 | to a single physical location. For this reason, the user must | |
1268 | specify one chassis associated with the distributed gateway | |
1269 | port. Note that traffic traversing the distributed gateway port | |
1270 | using other Ethernet addresses and IP addresses (e.g. one-to-one | |
1271 | NAT) is not restricted to this chassis. | |
1272 | </p> | |
1273 | ||
1274 | <p> | |
1275 | Replies to ARP and ND requests must be restricted to a single | |
1276 | physical location, where the Ethernet address in the reply | |
1277 | resides. This includes ARP and ND replies for the IP address | |
1278 | of the distributed gateway port, which are restricted to the | |
1279 | chassis that the user associated with the distributed gateway | |
1280 | port. | |
1281 | </p> | |
1282 | </li> | |
1283 | ||
1284 | <li> | |
1285 | In order to support one-to-many SNAT (aka IP masquerading), where | |
1286 | multiple logical IP addresses spread across multiple chassis are | |
1287 | mapped to a single external IP address, it will be necessary to | |
1288 | handle some of the logical router processing on a specific chassis | |
1289 | in a centralized manner. Since the SNAT external IP address is | |
1290 | typically the distributed gateway port's IP address, for |
1291 | simplicity the same chassis that is associated with the distributed |
1292 | gateway port is used. |
1293 | </li> | |
1294 | </ul> | |
1295 | ||
1296 | <p> | |
1297 | The details of flow restrictions to specific chassis are described | |
1298 | in the <code>ovn-northd</code> documentation. | |
1299 | </p> | |
1300 | ||
1301 | <p> | |
1302 | While most of the physical location dependent aspects of distributed | |
1303 | gateway ports can be handled by restricting some flows to specific | |
1304 | chassis, one additional mechanism is required. When a packet | |
1305 | leaves the ingress pipeline and the logical egress port is the | |
1306 | distributed gateway port, one of two different sets of actions is | |
1307 | required at table 32: | |
1308 | </p> | |
1309 | ||
1310 | <ul> | |
1311 | <li> | |
1312 | If the packet can be handled locally on the sender's hypervisor | |
1313 | (e.g. one-to-one NAT traffic), then the packet should just be | |
1314 | resubmitted locally to table 33, in the normal manner for | |
1315 | distributed logical patch ports. | |
1316 | </li> | |
1317 | ||
1318 | <li> | |
1319 | However, if the packet needs to be handled on the chassis | |
1320 | associated with the distributed gateway port (e.g. one-to-many | |
1321 | SNAT traffic or non-NAT traffic), then table 32 must send the | |
1322 | packet on a tunnel port to that chassis. | |
1323 | </li> | |
1324 | </ul> | |
1325 | ||
1326 | <p> | |
1327 | In order to trigger the second set of actions, the | |
1328 | <code>chassisredirect</code> type of southbound | |
1329 | <code>Port_Binding</code> has been added. Setting the logical | |
1330 | egress port to a logical port of type <code>chassisredirect</code> is |
1331 | simply a way to indicate that although the packet is destined for | |
1332 | the distributed gateway port, it needs to be redirected to a | |
1333 | different chassis. At table 32, packets with this logical egress | |
1334 | port are sent to a specific chassis, in the same way that table 32 | |
1335 | directs packets whose logical egress port is a VIF or a type | |
1336 | <code>l3gateway</code> port to different chassis. Once the packet | |
1337 | arrives at that chassis, table 33 resets the logical egress port to | |
1338 | the value representing the distributed gateway port. For each | |
1339 | distributed gateway port, there is one type | |
1340 | <code>chassisredirect</code> port, in addition to the distributed | |
1341 | logical patch port representing the distributed gateway port. | |
1342 | </p> | |
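The table 32/33 handling described above can be sketched in a few lines. This is an illustrative model only, not OVN code; the dictionary layout and function names are invented for the example:

```python
# Illustrative model (not OVN code) of the table 32 decision described
# above: packets whose logical egress port is a chassisredirect port are
# tunneled to that port's chassis; on arrival, table 33 rewrites the
# egress port back to the distributed gateway port it represents.

def table_32(packet, local_chassis):
    port = packet["egress_port"]
    if port["type"] == "chassisredirect" and port["chassis"] != local_chassis:
        return ("tunnel", port["chassis"])   # redirect to the gateway chassis
    return ("resubmit", "table_33")          # handle locally

def table_33(packet):
    port = packet["egress_port"]
    if port["type"] == "chassisredirect":
        # Reset the egress port to the distributed gateway port.
        packet["egress_port"] = port["distributed_port"]
    return packet
```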
1343 | ||
1344 | <h2>Life Cycle of a VTEP gateway</h2> |
1345 | ||
1346 | <p> | |
1347 | A gateway is a chassis that forwards traffic between the OVN-managed | |
1348 | part of a logical network and a physical VLAN, extending a | |
1349 | tunnel-based logical network into a physical network. | |
1350 | </p> | |
1351 | ||
1352 | <p> | |
1353 | The steps below refer often to details of the OVN and VTEP database | |
1354 | schemas. Please see <code>ovn-sb</code>(5), <code>ovn-nb</code>(5) | |
1355 | and <code>vtep</code>(5), respectively, for the full story on these | |
1356 | databases. | |
1357 | </p> | |
1358 | ||
1359 | <ol> | |
1360 | <li> | |
1361 | A VTEP gateway's life cycle begins with the administrator registering | |
1362 | the VTEP gateway as a <code>Physical_Switch</code> table entry in the | |
1363 | <code>VTEP</code> database. The <code>ovn-controller-vtep</code> | |
1364 | connected to this VTEP database will recognize the new VTEP gateway |
1365 | and create a new <code>Chassis</code> table entry for it in the | |
1366 | <code>OVN_Southbound</code> database. | |
1367 | </li> | |
1368 | ||
1369 | <li> | |
1370 | The administrator can then create a new <code>Logical_Switch</code> | |
1371 | table entry, and bind a particular VLAN on a VTEP gateway's port to |
1372 | any VTEP logical switch. Once a VTEP logical switch is bound to | |
1373 | a VTEP gateway, the <code>ovn-controller-vtep</code> will detect | |
1374 | it and add its name to the <var>vtep_logical_switches</var> | |
1375 | column of the <code>Chassis</code> table in the <code> | |
1376 | OVN_Southbound</code> database. Note that the <var>tunnel_key</var> |
1377 | column of the VTEP logical switch is not filled at creation. The |
1378 | <code>ovn-controller-vtep</code> will set the column when the |
1379 | corresponding VTEP logical switch is bound to an OVN logical network. |
1380 | </li> | |
1381 | ||
1382 | <li> | |
1383 | Now, the administrator can use the CMS to add a VTEP logical switch | |
1384 | to the OVN logical network. To do that, the CMS must first create a | |
1385 | new <code>Logical_Switch_Port</code> table entry in the <code> |
1386 | OVN_Northbound</code> database. Then, the <var>type</var> column |
1387 | of this entry must be set to "vtep". Next, the <var> | |
1388 | vtep-logical-switch</var> and <var>vtep-physical-switch</var> keys | |
1389 | in the <var>options</var> column must also be specified, since | |
1390 | multiple VTEP gateways can attach to the same VTEP logical switch. | |
1391 | </li> | |
1392 | ||
1393 | <li> | |
1394 | The newly created logical port in the <code>OVN_Northbound</code> | |
1395 | database and its configuration will be passed down to the <code> | |
1396 | OVN_Southbound</code> database as a new <code>Port_Binding</code> | |
1397 | table entry. The <code>ovn-controller-vtep</code> will recognize the | |
1398 | change and bind the logical port to the corresponding VTEP gateway | |
1399 | chassis. Binding the same VTEP logical switch to different OVN |
1400 | logical networks is not allowed, and a warning will be generated |
1401 | in the log. |
1402 | </li> | |
1403 | ||
1404 | <li> | |
1405 | Besides binding to the VTEP gateway chassis, the <code> |
1406 | ovn-controller-vtep</code> will update the <var>tunnel_key</var> | |
1407 | column of the VTEP logical switch to the corresponding <code> | |
1408 | Datapath_Binding</code> table entry's <var>tunnel_key</var> for the | |
1409 | bound OVN logical network. | |
1410 | </li> | |
1411 | ||
1412 | <li> | |
1413 | Next, the <code>ovn-controller-vtep</code> will keep reacting to the | |
1414 | configuration change in the <code>Port_Binding</code> in the | |
1415 | <code>OVN_Southbound</code> database, and updating the |
1416 | <code>Ucast_Macs_Remote</code> table in the <code>VTEP</code> database. | |
1417 | This allows the VTEP gateway to understand where to forward the unicast | |
1418 | traffic coming from the extended external network. | |
1419 | </li> | |
1420 | ||
1421 | <li> | |
1422 | Eventually, the VTEP gateway's life cycle ends when the administrator | |
1423 | unregisters the VTEP gateway from the <code>VTEP</code> database. | |
1424 | The <code>ovn-controller-vtep</code> will recognize the event and | |
1425 | remove all related configurations (<code>Chassis</code> table entry | |
1426 | and port bindings) in the <code>OVN_Southbound</code> database. | |
1427 | </li> | |
1428 | ||
1429 | <li> | |
1430 | When the <code>ovn-controller-vtep</code> is terminated, all related | |
1431 | configurations in the <code>OVN_Southbound</code> database and | |
1432 | the <code>VTEP</code> database will be cleaned up, including |
1433 | <code>Chassis</code> table entries for all registered VTEP gateways | |
1434 | and their port bindings, and all <code>Ucast_Macs_Remote</code> table | |
1435 | entries and the <code>Logical_Switch</code> tunnel keys. | |
1436 | </li> | |
1437 | </ol> | |
1438 | ||
1439 | <h1>Security</h1> |
1440 | ||
1441 | <h2>Role-Based Access Controls for the Southbound DB</h2> |
1442 | <p> | |
1443 | In order to provide additional security against the possibility of an OVN | |
1444 | chassis becoming compromised in such a way as to allow rogue software to | |
1445 | make arbitrary modifications to the southbound database state and thus | |
1446 | disrupt the OVN network, role-based access controls (see | |
1447 | <code>ovsdb-server(1)</code> for additional details) are provided for the | |
1448 | southbound database. | |
1449 | </p> | |
1450 | ||
1451 | <p> | |
1452 | The implementation of role-based access controls (RBAC) requires the | |
1453 | addition of two tables to an OVSDB schema: the <code>RBAC_Role</code> | |
1454 | table, which is indexed by role name and maps the names of the various |
1455 | tables that may be modifiable for a given role to individual rows in a |
1456 | permissions table containing detailed permission information for that role, |
1457 | and the permission table itself, which consists of rows containing the |
1458 | following information: | |
1459 | </p> | |
1460 | <dl> | |
1461 | <dt><code>Table Name</code></dt> | |
1462 | <dd> | |
1463 | The name of the associated table. This column exists primarily as an | |
1464 | aid for humans reading the contents of this table. | |
1465 | </dd> | |
1466 | ||
1467 | <dt><code>Auth Criteria</code></dt> | |
1468 | <dd> | |
1469 | A set of strings containing the names of columns (or column:key pairs | |
1470 | for columns containing string:string maps). The contents of at least | |
1471 | one of the columns or column:key values in a row to be modified, | |
1472 | inserted, or deleted must be equal to the ID of the client attempting | |
1473 | to act on the row in order for the authorization check to pass. If the | |
1474 | authorization criteria are empty, authorization checking is disabled and |
1475 | all clients for the role will be treated as authorized. | |
1476 | </dd> | |
1477 | ||
1478 | <dt><code>Insert/Delete</code></dt> | |
1479 | <dd> | |
1480 | Row insertion/deletion permission; boolean value indicating whether | |
1481 | insertion and deletion of rows is allowed for the associated table. | |
1482 | If true, insertion and deletion of rows is allowed for authorized | |
1483 | clients. | |
1484 | </dd> | |
1485 | ||
1486 | <dt><code>Updatable Columns</code></dt> | |
1487 | <dd> | |
1488 | A set of strings containing the names of columns or column:key pairs | |
1489 | that may be updated or mutated by authorized clients. Modifications to | |
1490 | columns within a row are only permitted when the authorization check | |
1491 | for the client passes and all columns to be modified are included in | |
1492 | this set of modifiable columns. | |
1493 | </dd> | |
1494 | </dl> | |
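The check semantics described by these fields can be sketched as follows. This is a hypothetical model for illustration; the real enforcement lives inside <code>ovsdb-server</code>, and the function and field names here are not part of any OVN or OVSDB API:

```python
# Hypothetical sketch of the RBAC semantics described above.
# A row modification passes if (a) the auth criteria match the client ID
# (or the criteria set is empty), and (b) every modified column appears
# in the role's set of updatable columns.

def authorized(row, auth_criteria, client_id):
    """True if any listed column (or column:key pair) equals the client ID."""
    if not auth_criteria:
        return True                      # empty criteria disables the check
    for crit in auth_criteria:
        if ":" in crit:
            column, key = crit.split(":", 1)
            if row.get(column, {}).get(key) == client_id:
                return True
        elif row.get(crit) == client_id:
            return True
    return False

def may_update(row, perm, client_id, modified_columns):
    """Auth check must pass and all modified columns must be updatable."""
    return (authorized(row, perm["auth"], client_id)
            and set(modified_columns) <= set(perm["updatable"]))
```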
1495 | ||
1496 | <p> | |
1497 | RBAC configuration for the OVN southbound database is maintained by | |
1498 | ovn-northd. With RBAC enabled, modifications are only permitted for the | |
1499 | <code>Chassis</code>, <code>Encap</code>, <code>Port_Binding</code>, and | |
1500 | <code>MAC_Binding</code> tables, and are restricted as follows: |
1501 | </p> | |
1502 | <dl> | |
1503 | <dt><code>Chassis</code></dt> | |
1504 | <dd> | |
1505 | <p> | |
1506 | <code>Authorization</code>: client ID must match the chassis name. | |
1507 | </p> | |
1508 | <p> | |
1509 | <code>Insert/Delete</code>: authorized row insertion and deletion | |
1510 | are permitted. | |
1511 | </p> | |
1512 | <p> | |
1513 | <code>Update</code>: The columns <code>nb_cfg</code>, | |
1514 | <code>external_ids</code>, <code>encaps</code>, and | |
1515 | <code>vtep_logical_switches</code> may be modified when authorized. | |
1516 | </p> | |
1517 | </dd> | |
1518 | ||
1519 | <dt><code>Encap</code></dt> | |
1520 | <dd> | |
1521 | <p> | |
1522 | <code>Authorization</code>: disabled (all clients are considered | |
1523 | to be authorized). Future: add a "creating chassis name" column to |
1524 | this table and use it for authorization checking. | |
1525 | </p> | |
1526 | <p> | |
1527 | <code>Insert/Delete</code>: row insertion and row deletion | |
1528 | are permitted. | |
1529 | </p> | |
1530 | <p> | |
1531 | <code>Update</code>: The columns <code>type</code>, | |
1532 | <code>options</code>, and <code>ip</code> can be modified. | |
1533 | </p> | |
1534 | </dd> | |
1535 | ||
1536 | <dt><code>Port_Binding</code></dt> | |
1537 | <dd> | |
1538 | <p> | |
1539 | <code>Authorization</code>: disabled (all clients are considered | |
1540 | authorized). A future enhancement may add columns (or keys to |
1541 | <code>external_ids</code>) in order to control which chassis are | |
1542 | allowed to bind each port. | |
1543 | </p> | |
1544 | <p> | |
1545 | <code>Insert/Delete</code>: row insertion/deletion are not permitted | |
1546 | (ovn-northd maintains rows in this table). |
1547 | </p> | |
1548 | <p> | |
1549 | <code>Update</code>: Only modifications to the <code>chassis</code> | |
1550 | column are permitted. | |
1551 | </p> | |
1552 | </dd> | |
1553 | ||
1554 | <dt><code>MAC_Binding</code></dt> | |
1555 | <dd> | |
1556 | <p> | |
1557 | <code>Authorization</code>: disabled (all clients are considered | |
1558 | to be authorized). | |
1559 | </p> | |
1560 | <p> | |
1561 | <code>Insert/Delete</code>: row insertion/deletion are permitted. | |
1562 | </p> | |
1563 | <p> | |
1564 | <code>Update</code>: The columns <code>logical_port</code>, | |
1565 | <code>ip</code>, <code>mac</code>, and <code>datapath</code> may be | |
1566 | modified by ovn-controller. | |
1567 | </p> | |
1568 | </dd> | |
1569 | </dl> | |
1570 | ||
1571 | <p> | |
1572 | Enabling RBAC for ovn-controller connections to the southbound database | |
1573 | requires the following steps: | |
1574 | </p> | |
1575 | ||
1576 | <ol> | |
1577 | <li> | |
1578 | Creating SSL certificates for each chassis with the certificate CN field | |
1579 | set to the chassis name (e.g. for a chassis with | |
1580 | <code>external-ids:system-id=chassis-1</code>, via the command | |
1581 | "<code>ovs-pki -B 1024 -u req+sign chassis-1 switch</code>"). | |
1582 | </li> | |
1583 | <li> | |
1584 | Configuring each ovn-controller to use SSL when connecting to the | |
1585 | southbound database (e.g. via "<code>ovs-vsctl set open . | |
1586 | external-ids:ovn-remote=ssl:x.x.x.x:6642</code>"). | |
1587 | </li> | |
1588 | <li> | |
1589 | Configuring a southbound database SSL remote with "ovn-controller" role | |
1590 | (e.g. via "<code>ovn-sbctl set-connection role=ovn-controller | |
1591 | pssl:6642</code>"). | |
1592 | </li> | |
1593 | </ol> | |
1594 | ||
1595 | <h1>Design Decisions</h1> |
1596 | ||
1597 | <h2>Tunnel Encapsulations</h2> | |
1598 | ||
1599 | <p> | |
1600 | OVN annotates logical network packets that it sends from one hypervisor to | |
1601 | another with the following three pieces of metadata, which are encoded in | |
1602 | an encapsulation-specific fashion: | |
1603 | </p> | |
1604 | ||
1605 | <ul> | |
1606 | <li> | |
1607 | 24-bit logical datapath identifier, from the <code>tunnel_key</code> | |
1608 | column in the OVN Southbound <code>Datapath_Binding</code> table. | |
1609 | </li> | |
1610 | ||
1611 | <li> | |
1612 | 15-bit logical ingress port identifier. ID 0 is reserved for internal | |
1613 | use within OVN. IDs 1 through 32767, inclusive, may be assigned to | |
1614 | logical ports (see the <code>tunnel_key</code> column in the OVN | |
1615 | Southbound <code>Port_Binding</code> table). | |
1616 | </li> | |
1617 | ||
1618 | <li> | |
1619 | 16-bit logical egress port identifier. IDs 0 through 32767 have the same | |
1620 | meaning as for logical ingress ports. IDs 32768 through 65535, | |
1621 | inclusive, may be assigned to logical multicast groups (see the | |
1622 | <code>tunnel_key</code> column in the OVN Southbound | |
1623 | <code>Multicast_Group</code> table). | |
1624 | </li> | |
1625 | </ul> |
1626 | ||
1627 | <p> | |
1628 | For hypervisor-to-hypervisor traffic, OVN supports only Geneve and STT |
1629 | encapsulations, for the following reasons: | |
1630 | </p> |
1631 | ||
1632 | <ul> |
1633 | <li> | |
1634 | Only STT and Geneve support the large amounts of metadata (over 32 bits | |
1635 | per packet) that OVN uses (as described above). | |
1636 | </li> | |
1637 | ||
1638 | <li> | |
1639 | STT and Geneve use randomized UDP or TCP source ports, allowing |
1640 | efficient distribution among multiple paths in environments that use ECMP | |
1641 | in their underlay. | |
1642 | </li> | |
1643 | ||
1644 | <li> | |
1645 | NICs are available to offload STT and Geneve encapsulation and | |
1646 | decapsulation. | |
1647 | </li> | |
1648 | </ul> | |
1649 | ||
1650 | <p> | |
1651 | Due to its flexibility, the preferred encapsulation between hypervisors is | |
1652 | Geneve. For Geneve encapsulation, OVN transmits the logical datapath | |
1653 | identifier in the Geneve VNI. | |
1654 | ||
1655 | <!-- Keep the following in sync with ovn/controller/physical.h. --> | |
1656 | OVN transmits the logical ingress and logical egress ports in a TLV with | |
1657 | class 0x0102, type 0x80, and a 32-bit value encoded as follows, from MSB to |
1658 | LSB: |
1659 | </p> | |
1660 | ||
1661 | <diagram> | |
1662 | <header name=""> | |
1663 | <bits name="rsv" above="1" below="0" width=".25"/> | |
1664 | <bits name="ingress port" above="15" width=".75"/> | |
1665 | <bits name="egress port" above="16" width=".75"/> | |
1666 | </header> | |
1667 | </diagram> | |
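The 32-bit value above (one reserved bit, 15-bit ingress port, 16-bit egress port, MSB to LSB) can be packed and unpacked as in this small sketch; the helper names are illustrative and not part of OVN:

```python
# Sketch of the 32-bit Geneve option value shown above:
# bit 31 reserved (0), bits 30-16 ingress port, bits 15-0 egress port.

def encode_geneve_ports(ingress, egress):
    assert 0 <= ingress <= 0x7fff          # 15-bit logical ingress port
    assert 0 <= egress <= 0xffff           # 16-bit logical egress port
    return (ingress << 16) | egress

def decode_geneve_ports(value):
    return (value >> 16) & 0x7fff, value & 0xffff
```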
1668 | ||
1669 | <p> | |
1670 | Environments whose NICs lack Geneve offload may prefer STT encapsulation | |
1671 | for performance reasons. For STT encapsulation, OVN encodes all three | |
1672 | pieces of logical metadata in the STT 64-bit tunnel ID as follows, from MSB | |
1673 | to LSB: | |
1674 | </p> | |
1675 | ||
1676 | <diagram> | |
1677 | <header name=""> | |
1678 | <bits name="reserved" above="9" below="0" width=".5"/> | |
1679 | <bits name="ingress port" above="15" width=".75"/> | |
1680 | <bits name="egress port" above="16" width=".75"/> | |
1681 | <bits name="datapath" above="24" width="1.25"/> | |
1682 | </header> | |
1683 | </diagram> | |
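Similarly, the 64-bit STT tunnel ID layout above (9 reserved bits, 15-bit ingress port, 16-bit egress port, 24-bit datapath, MSB to LSB) can be sketched as follows; again, these helpers are illustrative only:

```python
# Sketch of the 64-bit STT tunnel ID shown above:
# bits 63-55 reserved, 54-40 ingress port, 39-24 egress port, 23-0 datapath.

def encode_stt_key(ingress, egress, datapath):
    assert 0 <= ingress <= 0x7fff          # 15-bit logical ingress port
    assert 0 <= egress <= 0xffff           # 16-bit logical egress port
    assert 0 <= datapath <= 0xffffff       # 24-bit logical datapath
    return (ingress << 40) | (egress << 24) | datapath

def decode_stt_key(key):
    return (key >> 40) & 0x7fff, (key >> 24) & 0xffff, key & 0xffffff
```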
1684 | ||
1685 | <p> |
1686 | For connecting to gateways, in addition to Geneve and STT, OVN supports |
1687 | VXLAN, because only VXLAN support is common on top-of-rack (ToR) switches. | |
1688 | Currently, gateways have a feature set that matches the capabilities as | |
1689 | defined by the VTEP schema, so fewer bits of metadata are necessary. In | |
1690 | the future, gateways that do not support encapsulations with large amounts | |
1691 | of metadata may continue to have a reduced feature set. | |
1692 | </p> |
1693 | </manpage> |