<?xml version="1.0" encoding="utf-8"?>
<manpage program="ovn-architecture" section="7" title="OVN Architecture">
  <h1>Name</h1>
  <p>ovn-architecture -- Open Virtual Network architecture</p>

  <h1>Description</h1>

  <p>
    OVN, the Open Virtual Network, is a system to support virtual network
    abstraction.  OVN complements the existing capabilities of OVS to add
    native support for virtual network abstractions, such as virtual L2 and
    L3 overlays and security groups.  Services such as DHCP are also
    desirable features.  Just like OVS, OVN's design goal is to have a
    production-quality implementation that can operate at significant scale.
  </p>

  <p>
    An OVN deployment consists of several components:
  </p>

  <ul>
    <li>
      <p>
        A <dfn>Cloud Management System</dfn> (<dfn>CMS</dfn>), which is
        OVN's ultimate client (via its users and administrators).  OVN
        integration requires installing a CMS-specific plugin and
        related software (see below).  OVN initially targets OpenStack
        as its CMS.
      </p>

      <p>
        We generally speak of ``the'' CMS, but one can imagine scenarios in
        which multiple CMSes manage different parts of an OVN deployment.
      </p>
    </li>

    <li>
      An OVN Database physical or virtual node (or, eventually, cluster)
      installed in a central location.
    </li>

    <li>
      One or more (usually many) <dfn>hypervisors</dfn>.  Hypervisors must
      run Open vSwitch and implement the interface described in
      <code>IntegrationGuide.rst</code> in the OVS source tree.  Any
      hypervisor platform supported by Open vSwitch is acceptable.
    </li>

    <li>
      <p>
        Zero or more <dfn>gateways</dfn>.  A gateway extends a tunnel-based
        logical network into a physical network by bidirectionally forwarding
        packets between tunnels and a physical Ethernet port.  This allows
        non-virtualized machines to participate in logical networks.  A
        gateway may be a physical host, a virtual machine, or an ASIC-based
        hardware switch that supports the <code>vtep</code>(5) schema.
      </p>

      <p>
        Hypervisors and gateways are together called <dfn>transport
        nodes</dfn> or <dfn>chassis</dfn>.
      </p>
    </li>
  </ul>

  <p>
    The diagram below shows how the major components of OVN and related
    software interact.  Starting at the top of the diagram, we have:
  </p>

  <ul>
    <li>
      The Cloud Management System, as defined above.
    </li>

    <li>
      <p>
        The <dfn>OVN/CMS Plugin</dfn> is the component of the CMS that
        interfaces to OVN.  In OpenStack, this is a Neutron plugin.
        The plugin's main purpose is to translate the CMS's notion of logical
        network configuration, stored in the CMS's configuration database in
        a CMS-specific format, into an intermediate representation understood
        by OVN.
      </p>

      <p>
        This component is necessarily CMS-specific, so a new plugin needs to
        be developed for each CMS that is integrated with OVN.  All of the
        components below this one in the diagram are CMS-independent.
      </p>
    </li>

    <li>
      <p>
        The <dfn>OVN Northbound Database</dfn> receives the intermediate
        representation of logical network configuration passed down by the
        OVN/CMS Plugin.  The database schema is meant to be ``impedance
        matched'' with the concepts used in a CMS, so that it directly
        supports notions of logical switches, routers, ACLs, and so on.  See
        <code>ovn-nb</code>(5) for details.
      </p>

      <p>
        The OVN Northbound Database has only two clients: the OVN/CMS Plugin
        above it and <code>ovn-northd</code> below it.
      </p>
    </li>

    <li>
      <code>ovn-northd</code>(8) connects to the OVN Northbound Database
      above it and the OVN Southbound Database below it.  It translates the
      logical network configuration in terms of conventional network
      concepts, taken from the OVN Northbound Database, into logical
      datapath flows in the OVN Southbound Database below it.
    </li>

    <li>
      <p>
        The <dfn>OVN Southbound Database</dfn> is the center of the system.
        Its clients are <code>ovn-northd</code>(8) above it and
        <code>ovn-controller</code>(8) on every transport node below it.
      </p>

      <p>
        The OVN Southbound Database contains three kinds of data:
        <dfn>Physical Network</dfn> (PN) tables that specify how to reach
        hypervisor and other nodes, <dfn>Logical Network</dfn> (LN) tables
        that describe the logical network in terms of ``logical datapath
        flows,'' and <dfn>Binding</dfn> tables that link logical network
        components' locations to the physical network.  The hypervisors
        populate the PN and Port_Binding tables, whereas
        <code>ovn-northd</code>(8) populates the LN tables.
      </p>

      <p>
        OVN Southbound Database performance must scale with the number of
        transport nodes.  This will likely require some work on
        <code>ovsdb-server</code>(1) as we encounter bottlenecks.
        Clustering for availability may be needed.
      </p>
    </li>
  </ul>

  <p>
    The remaining components are replicated onto each hypervisor:
  </p>

  <ul>
    <li>
      <code>ovn-controller</code>(8) is OVN's agent on each hypervisor and
      software gateway.  Northbound, it connects to the OVN Southbound
      Database to learn about OVN configuration and status and to
      populate the PN table and the <code>Chassis</code> column in the
      <code>Binding</code> table with the hypervisor's status.
      Southbound, it connects to <code>ovs-vswitchd</code>(8) as an
      OpenFlow controller, for control over network traffic, and to the
      local <code>ovsdb-server</code>(1) to allow it to monitor and
      control Open vSwitch configuration.
    </li>

    <li>
      <code>ovs-vswitchd</code>(8) and <code>ovsdb-server</code>(1) are
      conventional components of Open vSwitch.
    </li>
  </ul>

  <pre fixed="yes">
                                  CMS
                                   |
                                   |
                       +-----------|-----------+
                       |           |           |
                       |     OVN/CMS Plugin    |
                       |           |           |
                       |           |           |
                       |   OVN Northbound DB   |
                       |           |           |
                       |           |           |
                       |       ovn-northd      |
                       |           |           |
                       +-----------|-----------+
                                   |
                                   |
                         +-------------------+
                         | OVN Southbound DB |
                         +-------------------+
                                   |
                                   |
                +------------------+------------------+
                |                  |                  |
 HV 1           |                  |   HV n           |
+---------------|---------------+  .  +---------------|---------------+
|               |               |  .  |               |               |
|        ovn-controller         |  .  |        ovn-controller         |
|         |          |          |  .  |         |          |          |
|         |          |          |     |         |          |          |
|  ovs-vswitchd   ovsdb-server  |     |  ovs-vswitchd   ovsdb-server  |
|                               |     |                               |
+-------------------------------+     +-------------------------------+
  </pre>

  <h2>Information Flow in OVN</h2>

  <p>
    Configuration data in OVN flows from north to south.  The CMS, through
    its OVN/CMS plugin, passes the logical network configuration to
    <code>ovn-northd</code> via the northbound database.  In turn,
    <code>ovn-northd</code> compiles the configuration into a lower-level
    form and passes it to all of the chassis via the southbound database.
  </p>

  <p>
    Status information in OVN flows from south to north.  OVN currently
    provides only a few forms of status information.  First,
    <code>ovn-northd</code> populates the <code>up</code> column in the
    northbound <code>Logical_Switch_Port</code> table: if a logical port's
    <code>chassis</code> column in the southbound <code>Port_Binding</code>
    table is nonempty, it sets <code>up</code> to <code>true</code>,
    otherwise to <code>false</code>.  This allows the CMS to detect when a
    VM's networking has come up.
  </p>

  <p>
    Second, OVN provides feedback to the CMS on the realization of its
    configuration, that is, whether the configuration provided by the CMS
    has taken effect.  This feature requires the CMS to participate in a
    sequence number protocol, which works the following way:
  </p>

  <ol>
    <li>
      When the CMS updates the configuration in the northbound database, as
      part of the same transaction, it increments the value of the
      <code>nb_cfg</code> column in the <code>NB_Global</code> table.  (This
      is only necessary if the CMS wants to know when the configuration has
      been realized.)
    </li>

    <li>
      When <code>ovn-northd</code> updates the southbound database based on
      a given snapshot of the northbound database, it copies
      <code>nb_cfg</code> from the northbound <code>NB_Global</code> into
      the southbound database <code>SB_Global</code> table, as part of the
      same transaction.  (Thus, an observer monitoring both databases can
      determine when the southbound database is caught up with the
      northbound.)
    </li>

    <li>
      After <code>ovn-northd</code> receives confirmation from the
      southbound database server that its changes have committed, it updates
      <code>sb_cfg</code> in the northbound <code>NB_Global</code> table to
      the <code>nb_cfg</code> version that was pushed down.  (Thus, the CMS
      or another observer can determine when the southbound database is
      caught up without a connection to the southbound database.)
    </li>

    <li>
      The <code>ovn-controller</code> process on each chassis receives the
      updated southbound database, with the updated <code>nb_cfg</code>.
      This process in turn updates the physical flows installed in the
      chassis's Open vSwitch instances.  When it receives confirmation from
      Open vSwitch that the physical flows have been updated, it updates
      <code>nb_cfg</code> in its own <code>Chassis</code> record in the
      southbound database.
    </li>

    <li>
      <code>ovn-northd</code> monitors the <code>nb_cfg</code> column in all
      of the <code>Chassis</code> records in the southbound database.  It
      keeps track of the minimum value among all the records and copies it
      into the <code>hv_cfg</code> column in the northbound
      <code>NB_Global</code> table.  (Thus, the CMS or another observer can
      determine when all of the hypervisors have caught up to the northbound
      configuration.)
    </li>
  </ol>
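
  <p>
    For example, a CMS script can use this sequence number protocol through
    <code>ovn-nbctl</code>(8), whose <code>--wait=hv</code> option and
    <code>sync</code> command are built on <code>nb_cfg</code>,
    <code>sb_cfg</code>, and <code>hv_cfg</code>.  A sketch (the logical
    switch name is a made-up example):
  </p>

  <pre fixed="yes">
$ ovn-nbctl ls-add sw0        # change the configuration, then
$ ovn-nbctl --wait=hv sync    # block until hv_cfg catches up to nb_cfg
  </pre>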

  <h2>Chassis Setup</h2>

  <p>
    Each chassis in an OVN deployment must be configured with an Open
    vSwitch bridge dedicated for OVN's use, called the <dfn>integration
    bridge</dfn>.  System startup scripts may create this bridge prior to
    starting <code>ovn-controller</code> if desired.  If this bridge does
    not exist when <code>ovn-controller</code> starts, it will be created
    automatically with the default configuration suggested below.  The
    ports on the integration bridge include:
  </p>

  <ul>
    <li>
      On any chassis, tunnel ports that OVN uses to maintain logical network
      connectivity.  <code>ovn-controller</code> adds, updates, and removes
      these tunnel ports.
    </li>

    <li>
      On a hypervisor, any VIFs that are to be attached to logical networks.
      The hypervisor itself, or the integration between Open vSwitch and the
      hypervisor (described in <code>IntegrationGuide.rst</code>) takes care
      of this.  (This is not part of OVN or new to OVN; this is pre-existing
      integration work that has already been done on hypervisors that
      support OVS.)
    </li>

    <li>
      On a gateway, the physical port used for logical network connectivity.
      System startup scripts add this port to the bridge prior to starting
      <code>ovn-controller</code>.  This can be a patch port to another
      bridge, instead of a physical port, in more sophisticated setups.
    </li>
  </ul>

  <p>
    Other ports should not be attached to the integration bridge.  In
    particular, physical ports attached to the underlay network (as opposed
    to gateway ports, which are physical ports attached to logical networks)
    must not be attached to the integration bridge.  Underlay physical ports
    should instead be attached to a separate Open vSwitch bridge (they need
    not be attached to any bridge at all, in fact).
  </p>

  <p>
    The integration bridge should be configured as described below.
    The effect of each of these settings is documented in
    <code>ovs-vswitchd.conf.db</code>(5):
  </p>

  <!-- Keep the following in sync with create_br_int() in
       ovn/controller/ovn-controller.c. -->
  <dl>
    <dt><code>fail-mode=secure</code></dt>
    <dd>
      Avoids switching packets between isolated logical networks before
      <code>ovn-controller</code> starts up.  See <code>Controller Failure
      Settings</code> in <code>ovs-vsctl</code>(8) for more information.
    </dd>

    <dt><code>other-config:disable-in-band=true</code></dt>
    <dd>
      Suppresses in-band control flows for the integration bridge.  It would
      be unusual for such flows to show up anyway, because OVN uses a local
      controller (over a Unix domain socket) instead of a remote controller.
      It's possible, however, for some other bridge in the same system to
      have an in-band remote controller, and in that case this suppresses
      the flows that in-band control would ordinarily set up.  Refer to the
      documentation for more information.
    </dd>
  </dl>

  <p>
    The customary name for the integration bridge is <code>br-int</code>,
    but another name may be used.
  </p>
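
  <p>
    For example, a startup script could create the integration bridge with
    these settings using <code>ovs-vsctl</code>(8).  This is a sketch,
    assuming the customary bridge name:
  </p>

  <pre fixed="yes">
$ ovs-vsctl -- --may-exist add-br br-int \
            -- set Bridge br-int fail-mode=secure \
                   other-config:disable-in-band=true
  </pre>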

  <h2>Logical Networks</h2>

  <p>
    <dfn>Logical networks</dfn> implement the same concepts as physical
    networks, but they are insulated from the physical network with tunnels
    or other encapsulations.  This allows logical networks to have separate
    IP and other address spaces that overlap, without conflicting, with
    those used for physical networks.  Logical network topologies can be
    arranged without regard for the topologies of the physical networks on
    which they run.
  </p>

  <p>
    Logical network concepts in OVN include:
  </p>

  <ul>
    <li>
      <dfn>Logical switches</dfn>, the logical version of Ethernet switches.
    </li>

    <li>
      <dfn>Logical routers</dfn>, the logical version of IP routers.
      Logical switches and routers can be connected into sophisticated
      topologies.
    </li>

    <li>
      <dfn>Logical datapaths</dfn>, the logical version of OpenFlow
      switches.  Logical switches and routers are both implemented as
      logical datapaths.
    </li>

    <li>
      <p>
        <dfn>Logical ports</dfn> represent the points of connectivity in and
        out of logical switches and logical routers.  Some common types of
        logical ports are:
      </p>

      <ul>
        <li>
          Logical ports representing VIFs.
        </li>

        <li>
          <dfn>Localnet ports</dfn> represent the points of connectivity
          between logical switches and the physical network.  They are
          implemented as OVS patch ports between the integration bridge
          and the separate Open vSwitch bridge that underlay physical
          ports attach to.
        </li>

        <li>
          <dfn>Logical patch ports</dfn> represent the points of
          connectivity between logical switches and logical routers, and
          in some cases between peer logical routers.  There is a pair of
          logical patch ports at each such point of connectivity, one on
          each side.
        </li>

        <li>
          <dfn>Localport ports</dfn> represent the points of local
          connectivity between logical switches and VIFs.  These ports are
          present in every chassis (not bound to any particular one) and
          traffic from them will never go through a tunnel.  A
          <code>localport</code> is expected to only generate traffic
          destined for a local destination, typically in response to a
          request it received.  One use case is how OpenStack Neutron uses
          a <code>localport</code> port for serving metadata to VMs
          residing on every hypervisor.  A metadata proxy process is
          attached to this port on every host and all VMs within the same
          network will reach it at the same IP/MAC address without any
          traffic being sent over a tunnel.  Further details can be seen at
          https://docs.openstack.org/developer/networking-ovn/design/metadata_api.html.
        </li>
      </ul>
    </li>
  </ul>
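
  <p>
    As an illustration, an administrator (or the CMS plugin, through the
    database protocol) can create these northbound entities with
    <code>ovn-nbctl</code>(8); the names and addresses below are made-up
    examples:
  </p>

  <pre fixed="yes">
$ ovn-nbctl ls-add sw0                    # a logical switch
$ ovn-nbctl lr-add lr0                    # a logical router
$ ovn-nbctl lsp-add sw0 sw0-vif1          # a logical port for a VIF
$ ovn-nbctl lsp-set-addresses sw0-vif1 "00:00:00:00:00:01 192.168.0.11"
  </pre>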

  <h2>Life Cycle of a VIF</h2>

  <p>
    Tables and their schemas presented in isolation are difficult to
    understand.  Here's an example.
  </p>

  <p>
    A VIF on a hypervisor is a virtual network interface attached either
    to a VM or a container running directly on that hypervisor (this is
    different from the interface of a container running inside a VM).
  </p>

  <p>
    The steps in this example often refer to details of the OVN Southbound
    and OVN Northbound database schemas.  Please see <code>ovn-sb</code>(5)
    and <code>ovn-nb</code>(5), respectively, for the full story on these
    databases.
  </p>

  <ol>
    <li>
      A VIF's life cycle begins when a CMS administrator creates a new VIF
      using the CMS user interface or API and adds it to a switch (one
      implemented by OVN as a logical switch).  The CMS updates its own
      configuration.  This includes associating a unique, persistent
      identifier <var>vif-id</var> and an Ethernet address <var>mac</var>
      with the VIF.
    </li>

    <li>
      The CMS plugin updates the OVN Northbound database to include the new
      VIF, by adding a row to the <code>Logical_Switch_Port</code>
      table.  In the new row, <code>name</code> is <var>vif-id</var>,
      <code>mac</code> is <var>mac</var>, <code>switch</code> points to
      the OVN logical switch's Logical_Switch record, and other columns
      are initialized appropriately.
    </li>

    <li>
      <code>ovn-northd</code> receives the OVN Northbound database update.
      In turn, it makes the corresponding updates to the OVN Southbound
      database, by adding rows to the OVN Southbound database
      <code>Logical_Flow</code> table to reflect the new port, e.g. add a
      flow to recognize that packets destined to the new port's MAC address
      should be delivered to it, and update the flow that delivers broadcast
      and multicast packets to include the new port.  It also creates a
      record in the <code>Binding</code> table and populates all its columns
      except the column that identifies the <code>chassis</code>.
    </li>

    <li>
      On every hypervisor, <code>ovn-controller</code> receives the
      <code>Logical_Flow</code> table updates that <code>ovn-northd</code>
      made in the previous step.  As long as the VM that owns the VIF is
      powered off, <code>ovn-controller</code> cannot do much; it cannot,
      for example, arrange to send packets to or receive packets from the
      VIF, because the VIF does not actually exist anywhere.
    </li>

    <li>
      Eventually, a user powers on the VM that owns the VIF.  On the
      hypervisor where the VM is powered on, the integration between the
      hypervisor and Open vSwitch (described in
      <code>IntegrationGuide.rst</code>) adds the VIF to the OVN integration
      bridge and stores <var>vif-id</var> in
      <code>external_ids</code>:<code>iface-id</code> to indicate that the
      interface is an instantiation of the new VIF.  (None of this code is
      new in OVN; this is pre-existing integration work that has already
      been done on hypervisors that support OVS.)  A sketch of this step
      appears after this list.
    </li>

    <li>
      On the hypervisor where the VM is powered on,
      <code>ovn-controller</code> notices
      <code>external_ids</code>:<code>iface-id</code> in the new Interface.
      In response, in the OVN Southbound DB, it updates the
      <code>Binding</code> table's <code>chassis</code> column for the
      row that links the logical port from
      <code>external_ids</code>:<code>iface-id</code> to the hypervisor.
      Afterward, <code>ovn-controller</code> updates the local hypervisor's
      OpenFlow tables so that packets to and from the VIF are properly
      handled.
    </li>

    <li>
      Some CMS systems, including OpenStack, fully start a VM only when its
      networking is ready.  To support this, <code>ovn-northd</code> notices
      the <code>chassis</code> column updated for the row in the
      <code>Binding</code> table and pushes this upward by updating the
      <ref column="up" table="Logical_Switch_Port" db="OVN_NB"/> column
      in the OVN Northbound database's <ref table="Logical_Switch_Port"
      db="OVN_NB"/> table to indicate that the VIF is now up.  The CMS,
      if it uses this feature, can then react by allowing the VM's
      execution to proceed.
    </li>

    <li>
      On every hypervisor but the one where the VIF resides,
      <code>ovn-controller</code> notices the completely populated row in
      the <code>Binding</code> table.  This provides
      <code>ovn-controller</code> the physical location of the logical
      port, so each instance updates the OpenFlow tables of its switch
      (based on logical datapath flows in the OVN DB
      <code>Logical_Flow</code> table) so that packets to and from the VIF
      can be properly handled via tunnels.
    </li>

    <li>
      Eventually, a user powers off the VM that owns the VIF.  On the
      hypervisor where the VM was powered off, the VIF is deleted from the
      OVN integration bridge.
    </li>

    <li>
      On the hypervisor where the VM was powered off,
      <code>ovn-controller</code> notices that the VIF was deleted.  In
      response, it removes the <code>Chassis</code> column content in the
      <code>Binding</code> table for the logical port.
    </li>

    <li>
      On every hypervisor, <code>ovn-controller</code> notices the empty
      <code>Chassis</code> column in the <code>Binding</code> table's row
      for the logical port.  This means that <code>ovn-controller</code> no
      longer knows the physical location of the logical port, so each
      instance updates its OpenFlow table to reflect that.
    </li>

    <li>
      Eventually, when the VIF (or its entire VM) is no longer needed by
      anyone, an administrator deletes the VIF using the CMS user interface
      or API.  The CMS updates its own configuration.
    </li>

    <li>
      The CMS plugin removes the VIF from the OVN Northbound database,
      by deleting its row in the <code>Logical_Switch_Port</code> table.
    </li>

    <li>
      <code>ovn-northd</code> receives the OVN Northbound update and in turn
      updates the OVN Southbound database accordingly, by removing or
      updating the rows from the OVN Southbound database
      <code>Logical_Flow</code> table and <code>Binding</code> table that
      were related to the now-destroyed VIF.
    </li>

    <li>
      On every hypervisor, <code>ovn-controller</code> receives the
      <code>Logical_Flow</code> table updates that <code>ovn-northd</code>
      made in the previous step.  <code>ovn-controller</code> updates
      OpenFlow tables to reflect the update, although there may not be much
      to do, since the VIF had already become unreachable when it was
      removed from the <code>Binding</code> table in a previous step.
    </li>
  </ol>
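
  <p>
    As noted in step 5 above, from Open vSwitch's perspective the hypervisor
    integration amounts to adding the VIF to the integration bridge and
    setting <code>iface-id</code>, e.g. with <code>ovs-vsctl</code>(8).
    This is a sketch; the interface name <code>tap42</code> is a made-up
    example and <code>$VIF_ID</code> stands for the <var>vif-id</var> from
    step 1:
  </p>

  <pre fixed="yes">
$ ovs-vsctl add-port br-int tap42 -- \
    set Interface tap42 external_ids:iface-id="$VIF_ID"
  </pre>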

  <h2>Life Cycle of a Container Interface Inside a VM</h2>

  <p>
    OVN provides virtual network abstractions by converting information
    written in the OVN_NB database into OpenFlow flows in each hypervisor.
    Secure virtual networking for multiple tenants can only be provided if
    OVN controller is the only entity that can modify flows in Open
    vSwitch.  When the Open vSwitch integration bridge resides in the
    hypervisor, it is a fair assumption to make that tenant workloads
    running inside VMs cannot make any changes to Open vSwitch flows.
  </p>

  <p>
    If the infrastructure provider trusts the applications inside the
    containers not to break out and modify the Open vSwitch flows, then
    containers can be run in hypervisors.  This is also the case when
    containers are run inside the VMs and the Open vSwitch integration
    bridge with flows added by OVN controller resides in the same VM.  For
    both of the above cases, the workflow is the same as explained with an
    example in the previous section ("Life Cycle of a VIF").
  </p>

  <p>
    This section talks about the life cycle of a container interface (CIF)
    when containers are created in the VMs and the Open vSwitch integration
    bridge resides inside the hypervisor.  In this case, even if a container
    application breaks out, other tenants are not affected because the
    containers running inside the VMs cannot modify the flows in the
    Open vSwitch integration bridge.
  </p>

  <p>
    When multiple containers are created inside a VM, there are multiple
    CIFs associated with them.  The network traffic associated with these
    CIFs needs to reach the Open vSwitch integration bridge running in the
    hypervisor for OVN to support virtual network abstractions.  OVN should
    also be able to distinguish network traffic coming from different CIFs.
    There are two ways to distinguish network traffic of CIFs.
  </p>

  <p>
    One way is to provide one VIF for every CIF (1:1 model).  This means
    that there could be a lot of network devices in the hypervisor.  This
    would slow down OVS because of all the additional CPU cycles needed for
    the management of all the VIFs.  It would also mean that the entity
    creating the containers in a VM should also be able to create the
    corresponding VIFs in the hypervisor.
  </p>

  <p>
    The second way is to provide a single VIF for all the CIFs (1:many
    model).  OVN could then distinguish network traffic coming from
    different CIFs via a tag written in every packet.  OVN uses this
    mechanism, with a VLAN tag as the identifier.
  </p>

  <ol>
    <li>
      A CIF's life cycle begins when a container is spawned inside a VM by
      either the same CMS that created the VM, a tenant that owns that VM,
      or even a container orchestration system different from the CMS that
      initially created the VM.  Whoever the entity is, it will need to
      know the <var>vif-id</var> associated with the network interface of
      the VM through which the container interface's network traffic is
      expected to go.  The entity that creates the container interface will
      also need to choose an unused VLAN inside that VM.
    </li>

    <li>
      The container spawning entity (either directly or through the CMS
      that manages the underlying infrastructure) updates the OVN
      Northbound database to include the new CIF, by adding a row to the
      <code>Logical_Switch_Port</code> table.  In the new row,
      <code>name</code> is any unique identifier,
      <code>parent_name</code> is the <var>vif-id</var> of the VM
      through which the CIF's network traffic is expected to go,
      and <code>tag</code> is the VLAN tag that identifies the
      network traffic of that CIF.  (A sketch of this step appears after
      this list.)
    </li>

    <li>
      <code>ovn-northd</code> receives the OVN Northbound database update.
      In turn, it makes the corresponding updates to the OVN Southbound
      database, by adding rows to the OVN Southbound database's
      <code>Logical_Flow</code> table to reflect the new port and also by
      creating a new row in the <code>Binding</code> table and populating
      all its columns except the column that identifies the
      <code>chassis</code>.
    </li>

    <li>
      On every hypervisor, <code>ovn-controller</code> subscribes to the
      changes in the <code>Binding</code> table.  When a new row is created
      by <code>ovn-northd</code> that includes a value in the
      <code>parent_port</code> column of the <code>Binding</code> table,
      the <code>ovn-controller</code> in the hypervisor whose OVN
      integration bridge has that same value in <var>vif-id</var> in
      <code>external_ids</code>:<code>iface-id</code>
      updates the local hypervisor's OpenFlow tables so that packets to and
      from the VIF with the particular VLAN <code>tag</code> are properly
      handled.  Afterward it updates the <code>chassis</code> column of
      the <code>Binding</code> table to reflect the physical location.
    </li>

    <li>
      One can only start the application inside the container after the
      underlying network is ready.  To support this,
      <code>ovn-northd</code> notices the updated <code>chassis</code>
      column in the <code>Binding</code> table and updates the <ref
      column="up" table="Logical_Switch_Port" db="OVN_NB"/> column in the
      OVN Northbound database's <ref table="Logical_Switch_Port"
      db="OVN_NB"/> table to indicate that the CIF is now up.  The entity
      responsible for starting the container application queries this value
      and starts the application.
    </li>

    <li>
      Eventually the entity that created and started the container stops
      it.  The entity, through the CMS (or directly), deletes its row in
      the <code>Logical_Switch_Port</code> table.
    </li>

    <li>
      <code>ovn-northd</code> receives the OVN Northbound update and in
      turn updates the OVN Southbound database accordingly, by removing or
      updating the rows from the OVN Southbound database
      <code>Logical_Flow</code> table that were related to the
      now-destroyed CIF.  It also deletes the row in the
      <code>Binding</code> table for that CIF.
    </li>

    <li>
      On every hypervisor, <code>ovn-controller</code> receives the
      <code>Logical_Flow</code> table updates that <code>ovn-northd</code>
      made in the previous step.  <code>ovn-controller</code> updates
      OpenFlow tables to reflect the update.
    </li>
  </ol>
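
  <p>
    As noted in step 2 above, adding the CIF row comes down to a single
    northbound operation, e.g. with <code>ovn-nbctl</code>(8).  This is a
    sketch with made-up names, where the third and fourth arguments become
    <code>parent_name</code> and <code>tag</code>:
  </p>

  <pre fixed="yes">
$ ovn-nbctl lsp-add sw0 cif0 vif1 42   # parent_name=vif1, tag=42
  </pre>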

  <h2>Architectural Physical Life Cycle of a Packet</h2>

  <p>
    This section describes how a packet travels from one virtual machine or
    container to another through OVN.  This description focuses on the
    physical treatment of a packet; for a description of the logical life
    cycle of a packet, please refer to the <code>Logical_Flow</code> table
    in <code>ovn-sb</code>(5).
  </p>

  <p>
    This section mentions several data and metadata fields, for clarity
    summarized here:
  </p>

  <dl>
    <dt>tunnel key</dt>
    <dd>
      When OVN encapsulates a packet in Geneve or another tunnel, it
      attaches extra data to it to allow the receiving OVN instance to
      process it correctly.  This takes different forms depending on the
      particular encapsulation, but in each case we refer to it here as the
      ``tunnel key.''  See <code>Tunnel Encapsulations</code>, below, for
      details.
    </dd>

    <dt>logical datapath field</dt>
    <dd>
      A field that denotes the logical datapath through which a packet is
      being processed.
      <!-- Keep the following in sync with MFF_LOG_DATAPATH in
           ovn/lib/logical-fields.h. -->
      OVN uses the field that OpenFlow 1.1+ simply (and confusingly) calls
      ``metadata'' to store the logical datapath.  (This field is passed
      across tunnels as part of the tunnel key.)
    </dd>

    <dt>logical input port field</dt>
    <dd>
      <p>
        A field that denotes the logical port from which the packet
        entered the logical datapath.
        <!-- Keep the following in sync with MFF_LOG_INPORT in
             ovn/lib/logical-fields.h. -->
        OVN stores this in Open vSwitch extension register number 14.
      </p>

      <p>
        Geneve and STT tunnels pass this field as part of the tunnel key.
        Although VXLAN tunnels do not explicitly carry a logical input
        port, OVN only uses VXLAN to communicate with gateways that from
        OVN's perspective consist of only a single logical port, so that
        OVN can set the logical input port field to this one on ingress to
        the OVN logical pipeline.
      </p>
    </dd>

    <dt>logical output port field</dt>
    <dd>
      <p>
        A field that denotes the logical port from which the packet will
        leave the logical datapath.  This is initialized to 0 at the
        beginning of the logical ingress pipeline.
        <!-- Keep the following in sync with MFF_LOG_OUTPORT in
             ovn/lib/logical-fields.h. -->
        OVN stores this in Open vSwitch extension register number 15.
      </p>

      <p>
        Geneve and STT tunnels pass this field as part of the tunnel key.
        VXLAN tunnels do not transmit the logical output port field.
        Since VXLAN tunnels do not carry a logical output port field in
        the tunnel key, when a packet is received from a VXLAN tunnel by
        an OVN hypervisor, the packet is resubmitted to table 8 to
        determine the output port(s); when the packet reaches table 32,
        these packets are resubmitted to table 33 for local delivery by
        checking an MLF_RCV_FROM_VXLAN flag, which is set when the packet
        arrives from a VXLAN tunnel.
      </p>
    </dd>

    <dt>conntrack zone field for logical ports</dt>
    <dd>
      A field that denotes the connection tracking zone for logical ports.
      The value only has local significance and is not meaningful between
      chassis.  This is initialized to 0 at the beginning of the logical
      <!-- Keep the following in sync with MFF_LOG_CT_ZONE in
           ovn/lib/logical-fields.h. -->
      ingress pipeline.  OVN stores this in Open vSwitch extension register
      number 13.
    </dd>

    <dt>conntrack zone fields for routers</dt>
    <dd>
      Fields that denote the connection tracking zones for routers.  These
      values only have local significance and are not meaningful between
      chassis.  OVN stores the zone information for DNATting in Open vSwitch
      <!-- Keep the following in sync with MFF_LOG_DNAT_ZONE and
           MFF_LOG_SNAT_ZONE in ovn/lib/logical-fields.h. -->
      extension register number 11 and zone information for SNATing in
      Open vSwitch extension register number 12.
    </dd>

    <dt>logical flow flags</dt>
    <dd>
      The logical flags are intended to handle keeping context between
      tables in order to decide which rules in subsequent tables are
      matched.  These values only have local significance and are not
      meaningful between chassis.  OVN stores the logical flags in
      <!-- Keep the following in sync with MFF_LOG_FLAGS in
           ovn/lib/logical-fields.h. -->
      Open vSwitch extension register number 10.
    </dd>

    <dt>VLAN ID</dt>
    <dd>
      The VLAN ID is used as an interface between OVN and containers nested
      inside a VM (see <code>Life Cycle of a container interface inside a
      VM</code>, above, for more information).
    </dd>
  </dl>
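
  <p>
    (The tunnels themselves are configured per chassis through keys that
    <code>ovn-controller</code> reads from the local Open vSwitch database,
    e.g. the following sketch with a made-up IP address; see
    <code>ovn-controller</code>(8) for details:)
  </p>

  <pre fixed="yes">
$ ovs-vsctl set Open_vSwitch . \
    external-ids:ovn-encap-type=geneve \
    external-ids:ovn-encap-ip=192.0.2.10
  </pre>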

  <p>
    Initially, a VM or container on the ingress hypervisor sends a packet
    on a port attached to the OVN integration bridge.  Then:
  </p>

  <ol>
    <li>
      <p>
        OpenFlow table 0 performs physical-to-logical translation.  It
        matches the packet's ingress port.  Its actions annotate the packet
        with logical metadata, by setting the logical datapath field to
        identify the logical datapath that the packet is traversing and the
        logical input port field to identify the ingress port.  Then it
        resubmits to table 8 to enter the logical ingress pipeline.
      </p>

      <p>
        Packets that originate from a container nested within a VM are
        treated in a slightly different way.  The originating container can
        be distinguished based on the VIF-specific VLAN ID, so the
        physical-to-logical translation flows additionally match on VLAN ID
        and the actions strip the VLAN header.  Following this step, OVN
        treats packets from containers just like any other packets.
      </p>

      <p>
        Table 0 also processes packets that arrive from other chassis.  It
        distinguishes them from other packets by ingress port, which is a
        tunnel.  As with packets just entering the OVN pipeline, the
        actions annotate these packets with logical datapath and logical
        ingress port metadata.  In addition, the actions set the logical
        output port field, which is available because in OVN tunneling
        occurs after the logical output port is known.  These three pieces
        of information are obtained from the tunnel encapsulation metadata
        (see <code>Tunnel Encapsulations</code> for encoding details).
        Then the actions resubmit to table 33 to enter the logical egress
        pipeline.
      </p>
    </li>

    <li>
      <p>
        OpenFlow tables 8 through 31 execute the logical ingress pipeline
        from the <code>Logical_Flow</code> table in the OVN Southbound
        database.  These tables are expressed entirely in terms of logical
        concepts like logical ports and logical datapaths.  A big part of
        <code>ovn-controller</code>'s job is to translate them into
        equivalent OpenFlow (in particular it translates the table numbers:
        <code>Logical_Flow</code> tables 0 through 23 become OpenFlow
        tables 8 through 31).
      </p>

      <p>
        Each logical flow maps to one or more OpenFlow flows.  An actual
        packet ordinarily matches only one of these, although in some cases
        it can match more than one of these flows (which is not a problem
        because all of them have the same actions).
        <code>ovn-controller</code> uses the first 32 bits of the logical
        flow's UUID as the cookie for its OpenFlow flow or flows.  (This is
        not necessarily unique, since the first 32 bits of a logical flow's
        UUID is not necessarily unique.)
      </p>

      <p>
        Some logical flows can map to the Open vSwitch ``conjunctive
        match'' extension (see <code>ovs-fields</code>(7)).  Flows with a
        <code>conjunction</code> action use an OpenFlow cookie of 0,
        because they can correspond to multiple logical flows.  The
        OpenFlow flow for a conjunctive match includes a match on
        <code>conj_id</code>.
      </p>

      <p>
        Some logical flows may not be represented in the OpenFlow tables on
        a given hypervisor, if they could not be used on that hypervisor.
        For example, if no VIF in a logical switch resides on a given
        hypervisor, and the logical switch is not otherwise reachable on
        that hypervisor (e.g. over a series of hops through logical
        switches and routers starting from a VIF on the hypervisor), then
        the logical flow may not be represented there.
      </p>

      <p>
        Most OVN actions have fairly obvious implementations in OpenFlow
        (with OVS extensions), e.g. <code>next;</code> is implemented as
        <code>resubmit</code>, <code><var>field</var> =
        <var>constant</var>;</code> as <code>set_field</code>.  A few are
        worth describing in more detail:
      </p>

      <dl>
        <dt><code>output:</code></dt>
        <dd>
          Implemented by resubmitting the packet to table 32.  If the
          pipeline executes more than one <code>output</code> action, then
          each one is separately resubmitted to table 32.  This can be used
          to send multiple copies of the packet to multiple ports.  (If the
          packet was not modified between the <code>output</code> actions,
          and some of the copies are destined to the same hypervisor, then
          using a logical multicast output port would save bandwidth
          between hypervisors.)
        </dd>

        <dt><code>get_arp(<var>P</var>, <var>A</var>);</code></dt>
        <dt><code>get_nd(<var>P</var>, <var>A</var>);</code></dt>
        <dd>
          <p>
            Implemented by storing arguments into OpenFlow fields, then
            resubmitting to table 66, which <code>ovn-controller</code>
            populates with flows generated from the
            <code>MAC_Binding</code> table in the OVN Southbound database.
            If there is a match in table 66, then its actions store the
            bound MAC in the Ethernet destination address field.
          </p>

          <p>
            (The OpenFlow actions save and restore the OpenFlow fields used
            for the arguments, so that the OVN actions do not have to be
            aware of this temporary use.)
          </p>
        </dd>

        <dt><code>put_arp(<var>P</var>, <var>A</var>, <var>E</var>);</code></dt>
        <dt><code>put_nd(<var>P</var>, <var>A</var>, <var>E</var>);</code></dt>
        <dd>
          <p>
            Implemented by storing the arguments into OpenFlow fields, then
            outputting a packet to <code>ovn-controller</code>, which
            updates the <code>MAC_Binding</code> table.
          </p>

          <p>
            (The OpenFlow actions save and restore the OpenFlow fields used
            for the arguments, so that the OVN actions do not have to be
            aware of this temporary use.)
          </p>
        </dd>
      </dl>
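
      <p>
        (On a live system, the contents of the <code>MAC_Binding</code>
        table can be inspected with the generic database commands, e.g. the
        following sketch:)
      </p>

      <pre fixed="yes">
$ ovn-sbctl list MAC_Binding
      </pre>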
    </li>

    <li>
      <p>
        OpenFlow tables 32 through 47 implement the <code>output</code>
        action in the logical ingress pipeline.  Specifically, table 32
        handles packets to remote hypervisors, table 33 handles packets to
        the local hypervisor, and table 34 checks whether packets whose
        logical ingress and egress port are the same should be discarded.
      </p>

      <p>
        Logical patch ports are a special case.  Logical patch ports do not
        have a physical location and effectively reside on every
        hypervisor.  Thus, flow table 33, for output to ports on the local
        hypervisor, naturally implements output to unicast logical patch
        ports too.  However, applying the same logic to a logical patch
        port that is part of a logical multicast group yields packet
        duplication, because each hypervisor that contains a logical port
        in the multicast group will also output the packet to the logical
        patch port.  Thus, multicast groups implement output to logical
        patch ports in table 32.
      </p>

      <p>
        Each flow in table 32 matches on a logical output port for unicast
        or multicast logical ports that include a logical port on a remote
        hypervisor.  Each flow's actions implement sending a packet to the
        port it matches.  For unicast logical output ports on remote
        hypervisors, the actions set the tunnel key to the correct value,
        then send the packet on the tunnel port to the correct hypervisor.
        (When the remote hypervisor receives the packet, table 0 there will
        recognize it as a tunneled packet and pass it along to table 33.)
        For multicast logical output ports, the actions send one copy of
        the packet to each remote hypervisor, in the same way as for
        unicast destinations.  If a multicast group includes a logical port
        or ports on the local hypervisor, then its actions also resubmit to
        table 33.  Table 32 also includes:
      </p>

      <ul>
        <li>
          A higher-priority rule to match packets received from VXLAN
          tunnels, based on flag MLF_RCV_FROM_VXLAN, and resubmit these
          packets to table 33 for local delivery.  Packets received from
          VXLAN tunnels reach here because of a lack of a logical output
          port field in the tunnel key and thus these packets must be
          resubmitted to table 8 to determine the output port.
        </li>

        <li>
          A higher-priority rule to match packets received from ports of
          type <code>localport</code>, based on the logical input port, and
          resubmit these packets to table 33 for local delivery.  Ports of
          type <code>localport</code> exist on every hypervisor and by
          definition their traffic should never go out through a tunnel.
        </li>

        <li>
          A higher-priority rule to match packets that have the
          MLF_LOCAL_ONLY logical flow flag set, and whose destination is a
          multicast address.  This flag indicates that the packet should
          not be delivered to remote hypervisors, even if the multicast
          destination includes ports on remote hypervisors.  This flag is
          used when <code>ovn-controller</code> is the originator of the
          multicast packet.  Since each <code>ovn-controller</code>
          instance is originating these packets, the packets only need to
          be delivered to local ports.
        </li>

        <li>
          A fallback flow that resubmits to table 33 if there is no other
          match.
        </li>
      </ul>

      <p>
        Flows in table 33 resemble those in table 32 but for logical ports
        that reside locally rather than remotely.  For unicast logical
        output ports on the local hypervisor, the actions just resubmit to
        table 34.  For multicast output ports that include one or more
        logical ports on the local hypervisor, for each such logical port
        <var>P</var>, the actions change the logical output port to
        <var>P</var>, then resubmit to table 34.
      </p>

      <p>
        A special case is that when a localnet port exists on the datapath,
        the remote port is reached by switching to the localnet port.  In
        this case, instead of adding a flow in table 32 to reach the remote
        port, a flow is added in table 33 to switch the logical output port
        to the localnet port, and resubmit to table 33 as if the packet
        were unicast to a logical port on the local hypervisor.
      </p>

      <p>
        Table 34 matches and drops packets for which the logical input and
        output ports are the same and the MLF_ALLOW_LOOPBACK flag is not
        set.  It resubmits other packets to table 40.
      </p>
    </li>

    <li>
      <p>
        OpenFlow tables 40 through 63 execute the logical egress pipeline
        from the <code>Logical_Flow</code> table in the OVN Southbound
        database.  The egress pipeline can perform a final stage of
        validation before packet delivery.  Eventually, it may execute an
        <code>output</code> action, which <code>ovn-controller</code>
        implements by resubmitting to table 64.  A packet for which the
        pipeline never executes <code>output</code> is effectively dropped
        (although it may have been transmitted through a tunnel across a
        physical network).
      </p>

      <p>
        The egress pipeline cannot change the logical output port or cause
        further tunneling.
      </p>
    </li>

    <li>
      <p>
        Table 64 bypasses OpenFlow loopback when MLF_ALLOW_LOOPBACK is set.
        Logical loopback was handled in table 34, but OpenFlow by default
        also prevents loopback to the OpenFlow ingress port.  Thus, when
        MLF_ALLOW_LOOPBACK is set, OpenFlow table 64 saves the OpenFlow
        ingress port, sets it to zero, resubmits to table 65 for
        logical-to-physical transformation, and then restores the OpenFlow
        ingress port, effectively disabling OpenFlow loopback prevention.
        When MLF_ALLOW_LOOPBACK is unset, the table 64 flow simply
        resubmits to table 65.
      </p>
    </li>

    <li>
      <p>
        OpenFlow table 65 performs logical-to-physical translation, the
        opposite of table 0.  It matches the packet's logical egress port.
        Its actions output the packet to the port attached to the OVN
        integration bridge that represents that logical port.  If the
        logical egress port is a container nested within a VM, then before
        sending the packet the actions push on a VLAN header with an
        appropriate VLAN ID.
      </p>
    </li>
  </ol>
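
  <p>
    To relate these OpenFlow tables back to their logical flows on a
    running hypervisor, one can dump both views, e.g. (illustrative
    commands; see <code>ovn-sbctl</code>(8) and <code>ovs-ofctl</code>(8)):
  </p>

  <pre fixed="yes">
$ ovn-sbctl lflow-list           # logical flows, by datapath and pipeline
$ ovs-ofctl dump-flows br-int    # OpenFlow flows ovn-controller installed
  </pre>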

  <h2>Logical Routers and Logical Patch Ports</h2>

  <p>
    Typically logical routers and logical patch ports do not have a
    physical location and effectively reside on every hypervisor.  This is
    the case for logical patch ports between logical routers and logical
    switches behind those logical routers, to which VMs (and VIFs) attach.
  </p>
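
  <p>
    (Such a connection is a pair of logical patch ports, which
    <code>ovn-nbctl</code>(8) can create as sketched below with made-up
    names: <code>lrp-add</code> creates the router-side port, and a switch
    port of type <code>router</code> peers with it:)
  </p>

  <pre fixed="yes">
$ ovn-nbctl lrp-add lr0 lrp0 00:00:00:00:ff:01 192.168.0.1/24
$ ovn-nbctl lsp-add sw0 sw0-lr0
$ ovn-nbctl lsp-set-type sw0-lr0 router
$ ovn-nbctl lsp-set-options sw0-lr0 router-port=lrp0
$ ovn-nbctl lsp-set-addresses sw0-lr0 00:00:00:00:ff:01
  </pre>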
1123 | ||
1124 | <p> | |
1125 | Consider a packet sent from one virtual machine or container to another | |
1126 | VM or container that resides on a different subnet. The packet will | |
1127 | traverse tables 0 to 65 as described in the previous section | |
1128 | <code>Architectural Physical Life Cycle of a Packet</code>, using the | |
1129 | logical datapath representing the logical switch that the sender is | |
1130 | attached to. At table 32, the packet will use the fallback flow that | |
1131 | resubmits locally to table 33 on the same hypervisor. In this case, | |
1132 | all of the processing from table 0 to table 65 occurs on the hypervisor | |
1133 | where the sender resides. | |
1134 | </p> | |
1135 | ||
1136 | <p> | |
1137 | When the packet reaches table 65, the logical egress port is a logical | |
1138 | patch port. The implementation in table 65 differs depending on the OVS | |
1139 | version, although the observed behavior is meant to be the same: | |
1140 | </p> | |
1141 | ||
1142 | <ul> | |
1143 | <li> | |
1144 | In OVS versions 2.6 and earlier, table 65 outputs to an OVS patch | |
1145 | port that represents the logical patch port. The packet re-enters | |
1146 | the OpenFlow flow table from the OVS patch port's peer in table 0, | |
1147 | which identifies the logical datapath and logical input port based | |
1148 | on the OVS patch port's OpenFlow port number. | |
1149 | </li> | |
1150 | ||
    <li>
      In OVS versions 2.7 and later, the packet is cloned and resubmitted
      directly to the first OpenFlow flow table in the ingress pipeline,
      setting the logical ingress port to the peer logical patch port, and
      using the peer logical patch port's logical datapath (that
      represents the logical router).
    </li>
1158 | </ul> | |
1159 | ||
  <p>
    The packet re-enters the ingress pipeline in order to traverse tables
    8 to 65 again, this time using the logical datapath representing the
    logical router.  The processing continues as described in the previous
    section <code>Architectural Physical Life Cycle of a Packet</code>.
    When the packet reaches table 65, the logical egress port will once
    again be a logical patch port.  In the same manner as described above,
    this logical patch port will cause the packet to be resubmitted to
    OpenFlow tables 8 to 65, this time using the logical datapath
    representing the logical switch that the destination VM or container
    is attached to.
  </p>
1172 | ||
1173 | <p> | |
00c875d0 | 1174 | The packet traverses tables 8 to 65 a third and final time. If the |
1175 | destination VM or container resides on a remote hypervisor, then table |
1176 | 32 will send the packet on a tunnel port from the sender's hypervisor | |
1177 | to the remote hypervisor. Finally table 65 will output the packet | |
1178 | directly to the destination VM or container. | |
1179 | </p> | |
1180 | ||
1181 | <p> | |
1182 | The following sections describe two exceptions, where logical routers |
1183 | and/or logical patch ports are associated with a physical location. | |
1184 | </p> |
1185 | ||
1186 | <h3>Gateway Routers</h3> | |
1187 | ||
1188 | <p> | |
1189 | A <dfn>gateway router</dfn> is a logical router that is bound to a | |
1190 | physical location. This includes all of the logical patch ports of | |
1191 | the logical router, as well as all of the peer logical patch ports on | |
1192 | logical switches. In the OVN Southbound database, the | |
1193 | <code>Port_Binding</code> entries for these logical patch ports use | |
1194 | the type <code>l3gateway</code> rather than <code>patch</code>, in | |
1195 | order to distinguish that these logical patch ports are bound to a | |
1196 | chassis. | |
1197 | </p> | |
1198 | ||
  <p>
    When a hypervisor processes a packet on a logical datapath
    representing a logical switch, and the logical egress port is an
    <code>l3gateway</code> port representing connectivity to a gateway
    router, the packet will match a flow in table 32 that sends the
    packet on a tunnel port to the chassis where the gateway router
    resides.  This processing in table 32 is done in the same manner as
    for VIFs.
  </p>
1208 | ||
  <p>
    Gateway routers are typically used between distributed logical
    routers and physical networks.  The distributed logical router and
    the logical switches behind it, to which VMs and containers attach,
    effectively reside on each hypervisor.  The distributed router and
    the gateway router are connected by another logical switch, sometimes
    referred to as a <code>join</code> logical switch.  On the other
    side, the gateway router connects to another logical switch that has
    a localnet port connecting to the physical network.
  </p>
1219 | ||
1220 | <p> | |
1221 | When using gateway routers, DNAT and SNAT rules are associated with | |
1222 | the gateway router, which provides a central location that can handle | |
1223 | one-to-many SNAT (aka IP masquerading). | |
1224 | </p> | |
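
  <p>
    As a hedged illustration, the sketch below creates a gateway router
    bound to a hypothetical chassis <code>gw1</code> and attaches a
    one-to-many SNAT rule to it; all names and addresses are examples:
  </p>

  <pre fixed="yes">
# A gateway router is a logical router pinned to one chassis.
ovn-nbctl create Logical_Router name=lr-gw options:chassis=gw1
# Masquerade the tenant subnet behind a single external IP.
ovn-nbctl lr-nat-add lr-gw snat 192.0.2.10 10.0.0.0/24
  </pre>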
1225 | ||
1226 | <h3>Distributed Gateway Ports</h3> |
1227 | ||
1228 | <p> | |
1229 | <dfn>Distributed gateway ports</dfn> are logical router patch ports | |
1230 | that directly connect distributed logical routers to logical | |
1231 | switches with localnet ports. | |
1232 | </p> | |
1233 | ||
1234 | <p> | |
1235 | The primary design goal of distributed gateway ports is to allow as | |
1236 | much traffic as possible to be handled locally on the hypervisor | |
1237 | where a VM or container resides. Whenever possible, packets from | |
1238 | the VM or container to the outside world should be processed | |
1239 | completely on that VM's or container's hypervisor, eventually | |
1240 | traversing a localnet port instance on that hypervisor to the | |
1241 | physical network. Whenever possible, packets from the outside | |
1242 | world to a VM or container should be directed through the physical | |
1243 | network directly to the VM's or container's hypervisor, where the | |
1244 | packet will enter the integration bridge through a localnet port. | |
1245 | </p> | |
1246 | ||
1247 | <p> | |
1248 | In order to allow for the distributed processing of packets | |
1249 | described in the paragraph above, distributed gateway ports need to | |
1250 | be logical patch ports that effectively reside on every hypervisor, | |
1251 | rather than <code>l3gateway</code> ports that are bound to a | |
1252 | particular chassis. However, the flows associated with distributed | |
1253 | gateway ports often need to be associated with physical locations, | |
1254 | for the following reasons: | |
1255 | </p> | |
1256 | ||
1257 | <ul> | |
1258 | <li> | |
1259 | <p> | |
1260 | The physical network that the localnet port is attached to | |
1261 | typically uses L2 learning. Any Ethernet address used over the | |
1262 | distributed gateway port must be restricted to a single physical | |
1263 | location so that upstream L2 learning is not confused. Traffic | |
1264 | sent out the distributed gateway port towards the localnet port | |
1265 | with a specific Ethernet address must be sent out one specific | |
1266 | instance of the distributed gateway port on one specific | |
1267 | chassis. Traffic received from the localnet port (or from a VIF | |
1268 | on the same logical switch as the localnet port) with a specific | |
1269 | Ethernet address must be directed to the logical switch's patch | |
1270 | port instance on that specific chassis. | |
1271 | </p> | |
1272 | ||
1273 | <p> | |
1274 | Due to the implications of L2 learning, the Ethernet address and | |
1275 | IP address of the distributed gateway port need to be restricted | |
1276 | to a single physical location. For this reason, the user must | |
1277 | specify one chassis associated with the distributed gateway | |
1278 | port. Note that traffic traversing the distributed gateway port | |
1279 | using other Ethernet addresses and IP addresses (e.g. one-to-one | |
1280 | NAT) is not restricted to this chassis. | |
1281 | </p> | |
1282 | ||
1283 | <p> | |
1284 | Replies to ARP and ND requests must be restricted to a single | |
1285 | physical location, where the Ethernet address in the reply | |
1286 | resides. This includes ARP and ND replies for the IP address | |
1287 | of the distributed gateway port, which are restricted to the | |
1288 | chassis that the user associated with the distributed gateway | |
1289 | port. | |
1290 | </p> | |
1291 | </li> | |
1292 | ||
    <li>
      In order to support one-to-many SNAT (aka IP masquerading), where
      multiple logical IP addresses spread across multiple chassis are
      mapped to a single external IP address, it will be necessary to
      handle some of the logical router processing on a specific chassis
      in a centralized manner.  Since the SNAT external IP address is
      typically the distributed gateway port's IP address, for simplicity
      the same chassis associated with the distributed gateway port is
      used.
    </li>
1303 | </ul> | |
1304 | ||
1305 | <p> | |
1306 | The details of flow restrictions to specific chassis are described | |
1307 | in the <code>ovn-northd</code> documentation. | |
1308 | </p> | |
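
  <p>
    For illustration, associating a distributed gateway port with a single
    chassis might look like the following sketch, with hypothetical names
    <code>lr0</code>, <code>lrp-public</code>, and <code>gw1</code>:
  </p>

  <pre fixed="yes">
ovn-nbctl lrp-add lr0 lrp-public 00:00:20:20:12:13 203.0.113.1/24
# Restrict the port's Ethernet and IP addresses to one chassis.
ovn-nbctl lrp-set-gateway-chassis lrp-public gw1
  </pre>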
1309 | ||
1310 | <p> | |
1311 | While most of the physical location dependent aspects of distributed | |
1312 | gateway ports can be handled by restricting some flows to specific | |
1313 | chassis, one additional mechanism is required. When a packet | |
1314 | leaves the ingress pipeline and the logical egress port is the | |
1315 | distributed gateway port, one of two different sets of actions is | |
1316 | required at table 32: | |
1317 | </p> | |
1318 | ||
1319 | <ul> | |
1320 | <li> | |
1321 | If the packet can be handled locally on the sender's hypervisor | |
1322 | (e.g. one-to-one NAT traffic), then the packet should just be | |
1323 | resubmitted locally to table 33, in the normal manner for | |
1324 | distributed logical patch ports. | |
1325 | </li> | |
1326 | ||
1327 | <li> | |
1328 | However, if the packet needs to be handled on the chassis | |
1329 | associated with the distributed gateway port (e.g. one-to-many | |
1330 | SNAT traffic or non-NAT traffic), then table 32 must send the | |
1331 | packet on a tunnel port to that chassis. | |
1332 | </li> | |
1333 | </ul> | |
1334 | ||
1335 | <p> | |
1336 | In order to trigger the second set of actions, the | |
1337 | <code>chassisredirect</code> type of southbound | |
1338 | <code>Port_Binding</code> has been added. Setting the logical | |
1339 | egress port to the type <code>chassisredirect</code> logical port is | |
1340 | simply a way to indicate that although the packet is destined for | |
1341 | the distributed gateway port, it needs to be redirected to a | |
1342 | different chassis. At table 32, packets with this logical egress | |
1343 | port are sent to a specific chassis, in the same way that table 32 | |
1344 | directs packets whose logical egress port is a VIF or a type | |
1345 | <code>l3gateway</code> port to different chassis. Once the packet | |
1346 | arrives at that chassis, table 33 resets the logical egress port to | |
1347 | the value representing the distributed gateway port. For each | |
1348 | distributed gateway port, there is one type | |
1349 | <code>chassisredirect</code> port, in addition to the distributed | |
1350 | logical patch port representing the distributed gateway port. | |
1351 | </p> | |
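
  <p>
    The <code>chassisredirect</code> bindings can be observed directly in
    the southbound database; for example, assuming the naming above:
  </p>

  <pre fixed="yes">
# List the chassisredirect twin of each distributed gateway port.
ovn-sbctl find Port_Binding type=chassisredirect
  </pre>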
1352 | ||
1353 | <h3>High Availability for Distributed Gateway Ports</h3> |
1354 | ||
1355 | <p> | |
1356 | OVN allows you to specify a prioritized list of chassis for a distributed | |
1357 | gateway port. This is done by associating multiple | |
1358 | <code>Gateway_Chassis</code> rows with a <code>Logical_Router_Port</code> | |
1359 | in the <code>OVN_Northbound</code> database. | |
1360 | </p> | |
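
  <p>
    For example, a prioritized list of gateway chassis might be configured
    as in the sketch below, where the chassis names and priorities are
    hypothetical and a higher priority is preferred:
  </p>

  <pre fixed="yes">
ovn-nbctl lrp-set-gateway-chassis lrp-public gw1 20
ovn-nbctl lrp-set-gateway-chassis lrp-public gw2 10
  </pre>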
1361 | ||
1362 | <p> | |
1363 | When multiple chassis have been specified for a gateway, all chassis that | |
1364 | may send packets to that gateway will enable BFD on tunnels to all | |
1365 | configured gateway chassis. The current master chassis for the gateway | |
1366 | is the highest priority gateway chassis that is currently viewed as | |
1367 | active based on BFD status. | |
1368 | </p> | |
1369 | ||
1370 | <p> | |
1371 | For more information on L3 gateway high availability, please refer to | |
1372 | http://docs.openvswitch.org/en/latest/topics/high-availability. | |
1373 | </p> | |
1374 | ||
1375 | <h2>Multiple localnet logical switches connected to a Logical Router</h2> |
1376 | ||
  <p>
    It is possible to have multiple logical switches, each with a localnet
    port (representing physical networks), connected to a logical router, in
    which one localnet logical switch may provide the external connectivity
    via a distributed gateway port and the rest of the localnet logical
    switches use VLAN tagging in the physical network.  It is expected that
    <code>ovn-bridge-mappings</code> is configured appropriately on the
    chassis for all these localnet networks.
  </p>
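
  <p>
    A minimal sketch of such a configuration, assuming hypothetical
    physical network names, provider bridges, and VLAN tag
    (<code>physnet1</code>, <code>br-eth1</code>, 101):
  </p>

  <pre fixed="yes">
# On each chassis, map physical network names to provider bridges.
ovs-vsctl set open . \
    external-ids:ovn-bridge-mappings=physnet1:br-eth1,physnet2:br-eth2
# A VLAN tagged localnet logical switch on physnet1.
ovn-nbctl ls-add ls-vlan101
ovn-nbctl lsp-add ls-vlan101 ln-vlan101
ovn-nbctl lsp-set-type ln-vlan101 localnet
ovn-nbctl lsp-set-addresses ln-vlan101 unknown
ovn-nbctl lsp-set-options ln-vlan101 network_name=physnet1
ovn-nbctl set Logical_Switch_Port ln-vlan101 tag=101
  </pre>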
1386 | ||
  <h3>East-West routing</h3>
  <p>
    East-West routing between these localnet VLAN tagged logical switches
    works almost the same way as for normal logical switches.  When a VM
    sends such a packet:
  </p>
1393 | <ol> | |
1394 | <li> | |
1395 | It first enters the ingress pipeline, and then egress pipeline of the | |
1396 | source localnet logical switch datapath. It then enters the ingress | |
1397 | pipeline of the logical router datapath via the logical router port in | |
1398 | the source chassis. | |
1399 | </li> | |
1400 | ||
    <li>
      A routing decision is made.
    </li>
1404 | ||
    <li>
      From the router datapath, the packet enters the ingress pipeline and
      then egress pipeline of the destination localnet logical switch
      datapath and goes out of the integration bridge to the provider bridge
      (belonging to the destination logical switch) via the localnet port.
    </li>
1411 | ||
1412 | <li> | |
1413 | The destination chassis receives the packet via the localnet port and | |
1414 | sends it to the integration bridge. The packet enters the | |
1415 | ingress pipeline and then egress pipeline of the destination localnet | |
1416 | logical switch and finally gets delivered to the destination VM port. | |
1417 | </li> | |
1418 | </ol> | |
1419 | ||
1420 | <h3>External traffic</h3> | |
1421 | ||
  <p>
    The following happens when a VM sends external traffic (which requires
    NATting) and the chassis hosting the VM doesn't have a distributed
    gateway port:
  </p>
1427 | ||
1428 | <ol> | |
1429 | <li> | |
1430 | The packet first enters the ingress pipeline, and then egress pipeline of | |
1431 | the source localnet logical switch datapath. It then enters the ingress | |
1432 | pipeline of the logical router datapath via the logical router port in | |
1433 | the source chassis. | |
1434 | </li> | |
1435 | ||
    <li>
      A routing decision is made.  Since the gateway router or the
      distributed gateway port doesn't reside on the source chassis, the
      traffic is redirected to the gateway chassis via the tunnel port.
    </li>
1441 | ||
1442 | <li> | |
1443 | The gateway chassis receives the packet via the tunnel port and the | |
1444 | packet enters the egress pipeline of the logical router datapath. NAT | |
1445 | rules are applied here. The packet then enters the ingress pipeline and | |
1446 | then egress pipeline of the localnet logical switch datapath which | |
1447 | provides external connectivity and finally goes out via the localnet | |
1448 | port of the logical switch which provides external connectivity. | |
1449 | </li> | |
1450 | </ol> | |
1451 | ||
1452 | <p> | |
1453 | Although this works, the VM traffic is tunnelled when sent from the compute | |
1454 | chassis to the gateway chassis. In order for it to work properly, the MTU | |
1455 | of the localnet logical switches must be lowered to account for the tunnel | |
1456 | encapsulation. | |
1457 | </p> | |
1458 | ||
1459 | <h2> | |
1460 | Centralized routing for localnet VLAN tagged logical switches connected | |
1461 | to a Logical Router | |
1462 | </h2> | |
1463 | ||
  <p>
    To overcome the tunnel encapsulation problem described in the previous
    section, <code>OVN</code> supports the option of enabling centralized
    routing for localnet VLAN tagged logical switches.  The CMS can configure
    the option <ref column="options:reside-on-redirect-chassis"
    table="Logical_Router_Port" db="OVN_NB"/> to <code>true</code> for each
    <ref table="Logical_Router_Port" db="OVN_NB"/> that connects to the
    localnet VLAN tagged logical switches.  This causes the gateway
    chassis (hosting the distributed gateway port) to handle all the
    routing for these networks, making it centralized.  The gateway chassis
    will reply to the ARP requests for the logical router port IPs.
  </p>
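
  <p>
    For example, assuming a logical router port <code>lrp-vlan101</code>
    (a hypothetical name) that connects to a VLAN tagged localnet logical
    switch, the option might be set as follows:
  </p>

  <pre fixed="yes">
ovn-nbctl set Logical_Router_Port lrp-vlan101 \
    options:reside-on-redirect-chassis=true
  </pre>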
1476 | ||
1477 | <p> | |
1478 | If the logical router doesn't have a distributed gateway port connecting | |
1479 | to the localnet logical switch which provides external connectivity, | |
1480 | then this option is ignored by <code>OVN</code>. | |
1481 | </p> | |
1482 | ||
  <p>
    The following happens when a VM sends east-west traffic that needs to
    be routed:
  </p>
1487 | ||
1488 | <ol> | |
    <li>
      The packet first enters the ingress pipeline, and then egress pipeline
      of the source localnet logical switch datapath and is sent out via the
      localnet port of the source localnet logical switch (instead of
      sending it to the router pipeline).
    </li>
1495 | ||
1496 | <li> | |
1497 | The gateway chassis receives the packet via the localnet port of the | |
1498 | source localnet logical switch and sends it to the integration bridge. | |
1499 | The packet then enters the ingress pipeline, and then egress pipeline of | |
1500 | the source localnet logical switch datapath and enters the ingress | |
1501 | pipeline of the logical router datapath. | |
1502 | </li> | |
1503 | ||
    <li>
      A routing decision is made.
    </li>
1507 | ||
    <li>
      From the router datapath, the packet enters the ingress pipeline and
      then egress pipeline of the destination localnet logical switch
      datapath.  It then goes out of the integration bridge to the provider
      bridge (belonging to the destination logical switch) via the localnet
      port.
    </li>
1514 | ||
    <li>
      The destination chassis receives the packet via the localnet port and
      sends it to the integration bridge.  The packet enters the
      ingress pipeline and then egress pipeline of the destination localnet
      logical switch and finally gets delivered to the destination VM port.
    </li>
1521 | </ol> | |
1522 | ||
  <p>
    The following happens when a VM sends external traffic that requires
    NATting:
  </p>
1527 | ||
1528 | <ol> | |
    <li>
      The packet first enters the ingress pipeline, and then egress pipeline
      of the source localnet logical switch datapath and is sent out via the
      localnet port of the source localnet logical switch (instead of
      sending it to the router pipeline).
    </li>
1535 | ||
1536 | <li> | |
1537 | The gateway chassis receives the packet via the localnet port of the | |
1538 | source localnet logical switch and sends it to the integration bridge. | |
1539 | The packet then enters the ingress pipeline, and then egress pipeline of | |
1540 | the source localnet logical switch datapath and enters the ingress | |
1541 | pipeline of the logical router datapath. | |
1542 | </li> | |
1543 | ||
    <li>
      A routing decision is made and NAT rules are applied.
    </li>
1547 | ||
1548 | <li> | |
      From the router datapath, the packet enters the ingress pipeline and
      then egress pipeline of the localnet logical switch datapath which
      provides external connectivity.  It then goes out of the integration
      bridge to the provider bridge (belonging to the logical switch which
      provides external connectivity) via the localnet port.
1554 | </li> | |
1555 | </ol> | |
1556 | ||
  <p>
    The following happens for the reverse external traffic:
  </p>
1560 | ||
1561 | <ol> | |
1562 | <li> | |
1563 | The gateway chassis receives the packet from the localnet port of | |
1564 | the logical switch which provides external connectivity. The packet then | |
1565 | enters the ingress pipeline and then egress pipeline of the localnet | |
1566 | logical switch (which provides external connectivity). The packet then | |
1567 | enters the ingress pipeline of the logical router datapath. | |
1568 | </li> | |
1569 | ||
1570 | <li> | |
1571 | The ingress pipeline of the logical router datapath applies the unNATting | |
1572 | rules. The packet then enters the ingress pipeline and then egress | |
1573 | pipeline of the source localnet logical switch. Since the source VM | |
1574 | doesn't reside in the gateway chassis, the packet is sent out via the | |
1575 | localnet port of the source logical switch. | |
1576 | </li> | |
1577 | ||
1578 | <li> | |
1579 | The source chassis receives the packet via the localnet port and | |
1580 | sends it to the integration bridge. The packet enters the | |
1581 | ingress pipeline and then egress pipeline of the source localnet | |
1582 | logical switch and finally gets delivered to the source VM port. | |
1583 | </li> | |
1584 | </ol> | |
1585 | ||
  <h2>Life Cycle of a VTEP Gateway</h2>
1587 | ||
1588 | <p> | |
1589 | A gateway is a chassis that forwards traffic between the OVN-managed | |
1590 | part of a logical network and a physical VLAN, extending a | |
1591 | tunnel-based logical network into a physical network. | |
1592 | </p> | |
1593 | ||
1594 | <p> | |
1595 | The steps below refer often to details of the OVN and VTEP database | |
1596 | schemas. Please see <code>ovn-sb</code>(5), <code>ovn-nb</code>(5) | |
1597 | and <code>vtep</code>(5), respectively, for the full story on these | |
1598 | databases. | |
1599 | </p> | |
1600 | ||
1601 | <ol> | |
    <li>
      A VTEP gateway's life cycle begins with the administrator registering
      the VTEP gateway as a <code>Physical_Switch</code> table entry in the
      <code>VTEP</code> database.  The <code>ovn-controller-vtep</code>
      connected to this VTEP database will recognize the new VTEP gateway
      and create a new <code>Chassis</code> table entry for it in the
      <code>OVN_Southbound</code> database.
    </li>
1610 | ||
    <li>
      The administrator can then create a new <code>Logical_Switch</code>
      table entry, and bind a particular VLAN on a VTEP gateway's port to
      any VTEP logical switch.  Once a VTEP logical switch is bound to
      a VTEP gateway, the <code>ovn-controller-vtep</code> will detect
      it and add its name to the <var>vtep_logical_switches</var>
      column of the <code>Chassis</code> table in the <code>
      OVN_Southbound</code> database.  Note that the <var>tunnel_key</var>
      column of the VTEP logical switch is not filled in at creation.  The
      <code>ovn-controller-vtep</code> will set the column when the
      corresponding VTEP logical switch is bound to an OVN logical network.
    </li>
1623 | ||
1624 | <li> | |
      Now, the administrator can use the CMS to add a VTEP logical switch
      to the OVN logical network.  To do that, the CMS must first create a
      new <code>Logical_Switch_Port</code> table entry in the <code>
      OVN_Northbound</code> database.  Then, the <var>type</var> column
      of this entry must be set to "vtep".  Next, the <var>
      vtep-logical-switch</var> and <var>vtep-physical-switch</var> keys
      in the <var>options</var> column must also be specified, since
      multiple VTEP gateways can attach to the same VTEP logical switch
      (a consolidated sketch follows this list).
    </li>
1634 | ||
1635 | <li> | |
      The newly created logical port in the <code>OVN_Northbound</code>
      database and its configuration will be passed down to the <code>
      OVN_Southbound</code> database as a new <code>Port_Binding</code>
      table entry.  The <code>ovn-controller-vtep</code> will recognize the
      change and bind the logical port to the corresponding VTEP gateway
      chassis.  Binding the same VTEP logical switch to different OVN
      logical networks is not allowed, and a warning will be generated in
      the log.
1644 | </li> | |
1645 | ||
1646 | <li> | |
      Besides binding to the VTEP gateway chassis, the <code>
      ovn-controller-vtep</code> will update the <var>tunnel_key</var>
      column of the VTEP logical switch to the corresponding <code>
      Datapath_Binding</code> table entry's <var>tunnel_key</var> for the
      bound OVN logical network.
1652 | </li> | |
1653 | ||
1654 | <li> | |
      Next, the <code>ovn-controller-vtep</code> will keep reacting to
      configuration changes to the <code>Port_Binding</code> table in the
      <code>OVN_Southbound</code> database, updating the
      <code>Ucast_Macs_Remote</code> table in the <code>VTEP</code> database.
      This allows the VTEP gateway to understand where to forward the unicast
      traffic coming from the extended external network.
1661 | </li> | |
1662 | ||
1663 | <li> | |
1664 | Eventually, the VTEP gateway's life cycle ends when the administrator | |
1665 | unregisters the VTEP gateway from the <code>VTEP</code> database. | |
1666 | The <code>ovn-controller-vtep</code> will recognize the event and | |
1667 | remove all related configurations (<code>Chassis</code> table entry | |
1668 | and port bindings) in the <code>OVN_Southbound</code> database. | |
1669 | </li> | |
1670 | ||
1671 | <li> | |
      When the <code>ovn-controller-vtep</code> is terminated, all related
      configurations in the <code>OVN_Southbound</code> database and
      the <code>VTEP</code> database will be cleaned up, including
      <code>Chassis</code> table entries for all registered VTEP gateways
      and their port bindings, and all <code>Ucast_Macs_Remote</code> table
      entries and the <code>Logical_Switch</code> tunnel keys.
1678 | </li> | |
1679 | </ol> | |
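
  <p>
    The sketch below consolidates the administrator-driven steps above.
    All names (<code>br-vtep</code>, <code>p0</code>, <code>ls0</code>,
    <code>nw1</code>) are hypothetical, and in practice the
    <code>Physical_Switch</code> entry is often populated by the gateway
    itself rather than by hand:
  </p>

  <pre fixed="yes">
# Register the gateway and bind VLAN 100 on its port p0 to a VTEP
# logical switch.
vtep-ctl add-ps br-vtep
vtep-ctl add-ls ls0
vtep-ctl bind-ls br-vtep p0 100 ls0
# Attach the VTEP logical switch to the OVN logical switch nw1.
ovn-nbctl lsp-add nw1 nw1-vtep
ovn-nbctl lsp-set-type nw1-vtep vtep
ovn-nbctl lsp-set-options nw1-vtep vtep-physical-switch=br-vtep \
    vtep-logical-switch=ls0
  </pre>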
1680 | ||
1681 | <h1>Security</h1> |
1682 | ||
  <h2>Role-Based Access Controls for the Southbound DB</h2>
1684 | <p> | |
1685 | In order to provide additional security against the possibility of an OVN | |
1686 | chassis becoming compromised in such a way as to allow rogue software to | |
1687 | make arbitrary modifications to the southbound database state and thus | |
1688 | disrupt the OVN network, role-based access controls (see | |
1689 | <code>ovsdb-server(1)</code> for additional details) are provided for the | |
1690 | southbound database. | |
1691 | </p> | |
1692 | ||
1693 | <p> | |
    The implementation of role-based access controls (RBAC) requires the
    addition of two tables to an OVSDB schema: the <code>RBAC_Role</code>
    table, which is indexed by role name and maps the names of the various
    tables that may be modifiable for a given role to individual rows in a
    permissions table containing detailed permission information for that
    role, and the permissions table itself, which consists of rows containing
    the following information:
1701 | </p> | |
1702 | <dl> | |
1703 | <dt><code>Table Name</code></dt> | |
1704 | <dd> | |
1705 | The name of the associated table. This column exists primarily as an | |
1706 | aid for humans reading the contents of this table. | |
1707 | </dd> | |
1708 | ||
1709 | <dt><code>Auth Criteria</code></dt> | |
1710 | <dd> | |
      A set of strings containing the names of columns (or column:key pairs
      for columns containing string:string maps).  The contents of at least
      one of the columns or column:key values in a row to be modified,
      inserted, or deleted must be equal to the ID of the client attempting
      to act on the row in order for the authorization check to pass.  If
      the set of authorization criteria is empty, authorization checking is
      disabled and all clients for the role will be treated as authorized.
1718 | </dd> | |
1719 | ||
1720 | <dt><code>Insert/Delete</code></dt> | |
1721 | <dd> | |
1722 | Row insertion/deletion permission; boolean value indicating whether | |
1723 | insertion and deletion of rows is allowed for the associated table. | |
1724 | If true, insertion and deletion of rows is allowed for authorized | |
1725 | clients. | |
1726 | </dd> | |
1727 | ||
1728 | <dt><code>Updatable Columns</code></dt> | |
1729 | <dd> | |
1730 | A set of strings containing the names of columns or column:key pairs | |
1731 | that may be updated or mutated by authorized clients. Modifications to | |
1732 | columns within a row are only permitted when the authorization check | |
1733 | for the client passes and all columns to be modified are included in | |
1734 | this set of modifiable columns. | |
1735 | </dd> | |
1736 | </dl> | |
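
  <p>
    The resulting role and permission rows can be inspected directly in the
    southbound database, for example:
  </p>

  <pre fixed="yes">
ovn-sbctl list RBAC_Role
ovn-sbctl list RBAC_Permission
  </pre>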
1737 | ||
  <p>
    RBAC configuration for the OVN southbound database is maintained by
    <code>ovn-northd</code>.  With RBAC enabled, modifications are only
    permitted for the <code>Chassis</code>, <code>Encap</code>,
    <code>Port_Binding</code>, and <code>MAC_Binding</code> tables, and are
    restricted as follows:
  </p>
1744 | <dl> | |
1745 | <dt><code>Chassis</code></dt> | |
1746 | <dd> | |
1747 | <p> | |
1748 | <code>Authorization</code>: client ID must match the chassis name. | |
1749 | </p> | |
1750 | <p> | |
1751 | <code>Insert/Delete</code>: authorized row insertion and deletion | |
1752 | are permitted. | |
1753 | </p> | |
1754 | <p> | |
1755 | <code>Update</code>: The columns <code>nb_cfg</code>, | |
1756 | <code>external_ids</code>, <code>encaps</code>, and | |
1757 | <code>vtep_logical_switches</code> may be modified when authorized. | |
1758 | </p> | |
1759 | </dd> | |
1760 | ||
1761 | <dt><code>Encap</code></dt> | |
1762 | <dd> | |
1763 | <p> | |
5dbf6b17 | 1764 | <code>Authorization</code>: client ID must match the chassis name. |
1765 | </p> |
1766 | <p> | |
1767 | <code>Insert/Delete</code>: row insertion and row deletion | |
1768 | are permitted. | |
1769 | </p> | |
1770 | <p> | |
1771 | <code>Update</code>: The columns <code>type</code>, | |
1772 | <code>options</code>, and <code>ip</code> can be modified. | |
1773 | </p> | |
1774 | </dd> | |
1775 | ||
1776 | <dt><code>Port_Binding</code></dt> | |
1777 | <dd> | |
      <p>
        <code>Authorization</code>: disabled (all clients are considered
        authorized).  A future enhancement may add columns (or keys to
        <code>external_ids</code>) in order to control which chassis are
        allowed to bind each port.
      </p>
      <p>
        <code>Insert/Delete</code>: row insertion/deletion are not permitted
        (<code>ovn-northd</code> maintains rows in this table).
      </p>
1788 | <p> | |
1789 | <code>Update</code>: Only modifications to the <code>chassis</code> | |
1790 | column are permitted. | |
1791 | </p> | |
1792 | </dd> | |
1793 | ||
1794 | <dt><code>MAC_Binding</code></dt> | |
1795 | <dd> | |
1796 | <p> | |
1797 | <code>Authorization</code>: disabled (all clients are considered | |
1798 | to be authorized). | |
1799 | </p> | |
1800 | <p> | |
1801 | <code>Insert/Delete</code>: row insertion/deletion are permitted. | |
1802 | </p> | |
1803 | <p> | |
1804 | <code>Update</code>: The columns <code>logical_port</code>, | |
1805 | <code>ip</code>, <code>mac</code>, and <code>datapath</code> may be | |
1806 | modified by ovn-controller. | |
1807 | </p> | |
1808 | </dd> | |
1809 | </dl> | |
1810 | ||
1811 | <p> | |
1812 | Enabling RBAC for ovn-controller connections to the southbound database | |
1813 | requires the following steps: | |
1814 | </p> | |
1815 | ||
1816 | <ol> | |
1817 | <li> | |
1818 | Creating SSL certificates for each chassis with the certificate CN field | |
1819 | set to the chassis name (e.g. for a chassis with | |
1820 | <code>external-ids:system-id=chassis-1</code>, via the command | |
48745e75 | 1821 | "<code>ovs-pki -u req+sign chassis-1 switch</code>"). |
1822 | </li> |
1823 | <li> | |
1824 | Configuring each ovn-controller to use SSL when connecting to the | |
1825 | southbound database (e.g. via "<code>ovs-vsctl set open . | |
1826 | external-ids:ovn-remote=ssl:x.x.x.x:6642</code>"). | |
1827 | </li> | |
1828 | <li> | |
1829 | Configuring a southbound database SSL remote with "ovn-controller" role | |
1830 | (e.g. via "<code>ovn-sbctl set-connection role=ovn-controller | |
1831 | pssl:6642</code>"). | |
1832 | </li> | |
1833 | </ol> | |
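
  <p>
    Putting the three steps together, a minimal sketch (the chassis name,
    address, and port are examples) might be:
  </p>

  <pre fixed="yes">
# On the chassis: a certificate whose CN matches the chassis name.
ovs-pki -u req+sign chassis-1 switch
ovs-vsctl set open . external-ids:ovn-remote=ssl:192.0.2.1:6642
# On the southbound database: an SSL remote with the ovn-controller role.
ovn-sbctl set-connection role=ovn-controller pssl:6642
  </pre>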
1834 | ||
1835 | <h2>Encrypt Tunnel Traffic with IPsec</h2> |
1836 | <p> | |
    OVN tunnel traffic goes through physical routers and switches.  These
    physical devices could be untrusted (devices in a public network) or
    might be compromised.  Enabling encryption for the tunnel traffic can
    prevent the traffic data from being monitored and manipulated.
1841 | </p> | |
1842 | <p> | |
    The tunnel traffic is encrypted with IPsec.  The CMS sets the
    <code>ipsec</code> column in the northbound <code>NB_Global</code> table
    to enable or disable IPsec encryption.  If <code>ipsec</code> is true,
    all OVN tunnels will be encrypted.  If <code>ipsec</code> is false, no
    OVN tunnels will be encrypted.
1848 | </p> | |
1849 | <p> | |
    When the CMS updates the <code>ipsec</code> column in the northbound
1851 | <code>NB_Global</code> table, <code>ovn-northd</code> copies the value to | |
1852 | the <code>ipsec</code> column in the southbound <code>SB_Global</code> | |
1853 | table. <code>ovn-controller</code> in each chassis monitors the southbound | |
1854 | database and sets the options of the OVS tunnel interface accordingly. OVS | |
1855 | tunnel interface options are monitored by the | |
    <code>ovs-monitor-ipsec</code> daemon, which configures the IKE daemon
    to set up IPsec connections.
1858 | </p> | |
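  <p>
    For example, the CMS (or an administrator) can toggle encryption in the
    northbound database and confirm that the value propagated southbound:
  </p>

  <pre fixed="yes">
ovn-nbctl set NB_Global . ipsec=true
ovn-sbctl get SB_Global . ipsec
  </pre>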
1859 | <p> | |
    Chassis authenticate each other using certificates.  The authentication
    succeeds if the other end of the tunnel presents a certificate signed by
    a trusted CA and the common name (CN) matches the expected chassis name.
    The SSL certificates used in role-based access controls (RBAC) can also
    be used in IPsec, or <code>ovs-pki</code> can be used to create separate
    certificates.  The certificate is required to be x.509 version 3, with
    the CN field and the subjectAltName field set to the chassis name.
1867 | </p> | |
1868 | <p> | |
1869 | The CA certificate, chassis certificate and private key are required to be | |
1870 | installed in each chassis before enabling IPsec. Please see | |
1871 | <code>ovs-vswitchd.conf.db</code>(5) for setting up CA based IPsec | |
1872 | authentication. | |
1873 | </p> | |
1874 | <h1>Design Decisions</h1> |
1875 | ||
1876 | <h2>Tunnel Encapsulations</h2> | |
1877 | ||
1878 | <p> | |
1879 | OVN annotates logical network packets that it sends from one hypervisor to | |
1880 | another with the following three pieces of metadata, which are encoded in | |
1881 | an encapsulation-specific fashion: | |
1882 | </p> | |
1883 | ||
1884 | <ul> | |
1885 | <li> | |
1886 | 24-bit logical datapath identifier, from the <code>tunnel_key</code> | |
1887 | column in the OVN Southbound <code>Datapath_Binding</code> table. | |
1888 | </li> | |
1889 | ||
1890 | <li> | |
1891 | 15-bit logical ingress port identifier. ID 0 is reserved for internal | |
1892 | use within OVN. IDs 1 through 32767, inclusive, may be assigned to | |
1893 | logical ports (see the <code>tunnel_key</code> column in the OVN | |
1894 | Southbound <code>Port_Binding</code> table). | |
1895 | </li> | |
1896 | ||
1897 | <li> | |
1898 | 16-bit logical egress port identifier. IDs 0 through 32767 have the same | |
1899 | meaning as for logical ingress ports. IDs 32768 through 65535, | |
1900 | inclusive, may be assigned to logical multicast groups (see the | |
1901 | <code>tunnel_key</code> column in the OVN Southbound | |
1902 | <code>Multicast_Group</code> table). | |
1903 | </li> | |
1904 | </ul> |
1905 | ||
1906 | <p> | |
1907 | For hypervisor-to-hypervisor traffic, OVN supports only Geneve and STT |
1908 | encapsulations, for the following reasons: | |
1909 | </p> |
1910 | ||
1911 | <ul> |
1912 | <li> | |
1913 | Only STT and Geneve support the large amounts of metadata (over 32 bits | |
1914 | per packet) that OVN uses (as described above). | |
1915 | </li> | |
1916 | ||
    <li>
      STT and Geneve use randomized UDP or TCP source ports, which allows
      efficient distribution among multiple paths in environments that use
      ECMP in their underlay.
    </li>
1922 | ||
1923 | <li> | |
1924 | NICs are available to offload STT and Geneve encapsulation and | |
1925 | decapsulation. | |
1926 | </li> | |
1927 | </ul> | |
1928 | ||
1929 | <p> | |
1930 | Due to its flexibility, the preferred encapsulation between hypervisors is | |
1931 | Geneve. For Geneve encapsulation, OVN transmits the logical datapath | |
1932 | identifier in the Geneve VNI. | |
1933 | ||
1934 | <!-- Keep the following in sync with ovn/controller/physical.h. --> | |
1935 | OVN transmits the logical ingress and logical egress ports in a TLV with | |
617609b8 | 1936 | class 0x0102, type 0x80, and a 32-bit value encoded as follows, from MSB to |
1937 | LSB: |
1938 | </p> | |
1939 | ||
1940 | <diagram> | |
1941 | <header name=""> | |
1942 | <bits name="rsv" above="1" below="0" width=".25"/> | |
1943 | <bits name="ingress port" above="15" width=".75"/> | |
1944 | <bits name="egress port" above="16" width=".75"/> | |
1945 | </header> | |
1946 | </diagram> | |
1947 | ||
1948 | <p> | |
1949 | Environments whose NICs lack Geneve offload may prefer STT encapsulation | |
1950 | for performance reasons. For STT encapsulation, OVN encodes all three | |
1951 | pieces of logical metadata in the STT 64-bit tunnel ID as follows, from MSB | |
1952 | to LSB: | |
1953 | </p> | |
1954 | ||
1955 | <diagram> | |
1956 | <header name=""> | |
1957 | <bits name="reserved" above="9" below="0" width=".5"/> | |
1958 | <bits name="ingress port" above="15" width=".75"/> | |
1959 | <bits name="egress port" above="16" width=".75"/> | |
1960 | <bits name="datapath" above="24" width="1.25"/> | |
1961 | </header> | |
1962 | </diagram> | |
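
  <p>
    As a quick sanity check of the two layouts above, the following shell
    sketch computes the encoded values for arbitrary example identifiers
    (ingress port 5, egress port 9, datapath 11):
  </p>

  <pre fixed="yes">
ingress=5 egress=9 datapath=11
# Geneve option (32 bits): ingress in bits 16-30, egress in bits 0-15.
printf 'Geneve option: 0x%08x\n' $((ingress * 2**16 + egress))
# STT tunnel ID (64 bits): ingress in bits 40-54, egress in bits 24-39,
# datapath in bits 0-23.
printf 'STT tunnel ID: 0x%016x\n' \
    $((ingress * 2**40 + egress * 2**24 + datapath))
  </pre>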
1963 | ||
b705f9ea | 1964 | <p> |
1965 | For connecting to gateways, in addition to Geneve and STT, OVN supports |
1966 | VXLAN, because only VXLAN support is common on top-of-rack (ToR) switches. | |
1967 | Currently, gateways have a feature set that matches the capabilities as | |
1968 | defined by the VTEP schema, so fewer bits of metadata are necessary. In | |
1969 | the future, gateways that do not support encapsulations with large amounts | |
1970 | of metadata may continue to have a reduced feature set. | |
b705f9ea | 1971 | </p> |
fe36184b | 1972 | </manpage> |