git.proxmox.com Git - ovs.git/blame - ovn/ovn-sb.xml
ovn: Extend logical "next" action to jump to arbitrary flow tables.
Commit Line Data
fe36184b 1<?xml version="1.0" encoding="utf-8"?>
ec78987f 2<database name="ovn-sb" title="OVN Southbound Database">
fe36184b
BP
3 <p>
4 This database holds logical and physical configuration and state for the
5 Open Virtual Network (OVN) system to support virtual network abstraction.
6 For an introduction to OVN, please see <code>ovn-architecture</code>(7).
7 </p>
8
9 <p>
ec78987f
JP
10 The OVN Southbound database sits at the center of the OVN
11 architecture. It is the one component that speaks both southbound
12 directly to all the hypervisors and gateways, via
88058f19
AW
13 <code>ovn-controller</code>/<code>ovn-controller-vtep</code>, and
14 northbound to the Cloud Management System, via <code>ovn-northd</code>.
fe36184b
BP
15 </p>
16
17 <h2>Database Structure</h2>
18
19 <p>
ec78987f
JP
20 The OVN Southbound database contains three classes of data with
21 different properties, as described in the sections below.
fe36184b
BP
22 </p>
23
24 <h3>Physical Network (PN) data</h3>
25
26 <p>
27 PN tables contain information about the chassis nodes in the system. This
28 contains all the information necessary to wire the overlay, such as IP
29 addresses, supported tunnel types, and security keys.
30 </p>
31
32 <p>
33 The amount of PN data is small (O(n) in the number of chassis) and it
34 changes infrequently, so it can be replicated to every chassis.
35 </p>
36
37 <p>
62fdd819 38 The <ref table="Chassis"/> table comprises the PN tables.
fe36184b
BP
39 </p>
40
41 <h3>Logical Network (LN) data</h3>
42
43 <p>
44 LN tables contain the topology of logical switches and routers, ACLs,
45 firewall rules, and everything needed to describe how packets traverse a
46 logical network, represented as logical datapath flows (see Logical
47 Datapath Flows, below).
48 </p>
49
50 <p>
51 LN data may be large (O(n) in the number of logical ports, ACL rules,
52 etc.). Thus, to improve scaling, each chassis should receive only data
53 related to logical networks in which that chassis participates. Past
54 experience shows that in the presence of large logical networks, even
55 finer-grained partitioning of data, e.g. designing logical flows so that
56 only the chassis hosting a logical port needs related flows, pays off
57 scale-wise. (This is not necessary initially but it is worth bearing in
58 mind in the design.)
59 </p>
60
61 <p>
62 The LN is a slave of the cloud management system running northbound of OVN.
63 That CMS determines the entire OVN logical configuration and therefore the
64 LN's content at any given time is a deterministic function of the CMS's
09986f8c
JP
65 configuration, although that happens indirectly via the
66 <ref db="OVN_Northbound"/> database and <code>ovn-northd</code>.
fe36184b
BP
67 </p>
68
69 <p>
70 LN data is likely to change more quickly than PN data. This is especially
71 true in a container environment where VMs are created and destroyed (and
72 therefore added to and deleted from logical switches) quickly.
73 </p>
74
75 <p>
5868eb24
BP
76 <ref table="Logical_Flow"/> and <ref table="Multicast_Group"/> contain LN
77 data.
fe36184b
BP
78 </p>
79
80 <h3>Bindings data</h3>
81
82 <p>
5868eb24
BP
83 Bindings data link logical and physical components. They show the current
84 placement of logical components (such as VMs and VIFs) onto chassis, and
85 map logical entities to the values that represent them in tunnel
86 encapsulations.
fe36184b
BP
87 </p>
88
89 <p>
90 Bindings change frequently, at least every time a VM powers up or down
91 or migrates, and especially quickly in a container environment. The
92 amount of data per VM (or VIF) is small.
93 </p>
94
95 <p>
96 Each chassis is authoritative about the VMs and VIFs that it hosts at any
97 given time and can efficiently flood that state to a central location, so
98 the consistency needs are minimal.
99 </p>
100
101 <p>
5868eb24
BP
102 The <ref table="Port_Binding"/> and <ref table="Datapath_Binding"/> tables
103 contain binding data.
fe36184b
BP
104 </p>
105
5868eb24
BP
106 <h2>Common Columns</h2>
107
108 <p>
109 Some tables contain a special column named <code>external_ids</code>. This
110 column has the same form and purpose each place that it appears, so we
111 describe it here to save space later.
112 </p>
113
114 <dl>
115 <dt><code>external_ids</code>: map of string-string pairs</dt>
116 <dd>
117 Key-value pairs for use by the software that manages the OVN Southbound
88058f19
AW
118 database rather than by
119 <code>ovn-controller</code>/<code>ovn-controller-vtep</code>. In
120 particular, <code>ovn-northd</code> can use key-value pairs in this
121 column to relate entities in the southbound database to higher-level
122 entities (such as entities in the OVN Northbound database). Individual
123 key-value pairs in this column may be documented in some cases to aid
124 in understanding and troubleshooting, but the reader should not mistake
125 such documentation as comprehensive.
5868eb24
BP
126 </dd>
127 </dl>
128
fe36184b
BP
129 <table name="Chassis" title="Physical Network Hypervisor and Gateway Information">
130 <p>
131 Each row in this table represents a hypervisor or gateway (a chassis) in
132 the physical network (PN). Each chassis, via
88058f19
AW
133 <code>ovn-controller</code>/<code>ovn-controller-vtep</code>, adds
134 and updates its own row, and keeps a copy of the remaining rows to
135 determine how to reach other hypervisors.
fe36184b
BP
136 </p>
137
138 <p>
139 When a chassis shuts down gracefully, it should remove its own row.
140 (This is not critical because resources hosted on the chassis are equally
141 unreachable regardless of whether the row is present.) If a chassis
142 shuts down permanently without removing its row, some kind of manual or
143 automatic cleanup is eventually needed; we can devise a process for that
144 as necessary.
145 </p>
146
147 <column name="name">
148 A chassis name, taken from <ref key="system-id" table="Open_vSwitch"
149 column="external_ids" db="Open_vSwitch"/> in the Open_vSwitch
150 database's <ref table="Open_vSwitch" db="Open_vSwitch"/> table. OVN does
151 not prescribe a particular format for chassis names.
152 </column>
153
09db214c 154 <group title="Encapsulation Configuration">
fe36184b 155 <p>
09db214c
JP
156 OVN uses encapsulation to transmit logical dataplane packets
157 between chassis.
fe36184b
BP
158 </p>
159
09db214c
JP
160 <column name="encaps">
161 Points to supported encapsulation configurations to transmit
162 logical dataplane packets to this chassis. Each entry is a <ref
163 table="Encap"/> record that describes the configuration.
fe36184b
BP
164 </column>
165 </group>
166
62fdd819
AW
167 <group title="Gateway Configuration">
168 <p>
169 A <dfn>gateway</dfn> is a chassis that forwards traffic between the
170 OVN-managed part of a logical network and a physical VLAN, extending a
171 tunnel-based logical network into a physical network. Gateways are
88058f19
AW
172 typically dedicated nodes that do not host VMs and will be controlled
173 by <code>ovn-controller-vtep</code>.
fe36184b
BP
174 </p>
175
62fdd819 176 <column name="vtep_logical_switches">
88058f19
AW
177 Stores the names of all VTEP logical switches connected by this gateway
178 chassis. A <ref table="Port_Binding"/> table entry whose
179 <ref column="options" table="Port_Binding"/>:<code>vtep-physical-switch</code>
180 equals the <ref table="Chassis"/> <ref column="name" table="Chassis"/>, and whose
181 <ref column="options" table="Port_Binding"/>:<code>vtep-logical-switch</code>
182 value appears in the <ref table="Chassis"/>
183 <ref column="vtep_logical_switches" table="Chassis"/> column, is
184 associated with this <ref table="Chassis"/>.
fe36184b 185 </column>
62fdd819 186 </group>
fe36184b
BP
187 </table>
188
09db214c
JP
189 <table name="Encap" title="Encapsulation Types">
190 <p>
191 The <ref column="encaps" table="Chassis"/> column in the <ref
192 table="Chassis"/> table refers to rows in this table to identify
193 how OVN may transmit logical dataplane packets to this chassis.
88058f19
AW
194 Each chassis, via <code>ovn-controller</code>(8) or
195 <code>ovn-controller-vtep</code>(8), adds and updates its own rows
196 and keeps a copy of the remaining rows to determine how to reach
197 other chassis.
09db214c
JP
198 </p>
199
200 <column name="type">
201 The encapsulation to use to transmit packets to this chassis.
b705f9ea
JP
202 Hypervisors must use either <code>geneve</code> or
203 <code>stt</code>. Gateways may use <code>vxlan</code>,
204 <code>geneve</code>, or <code>stt</code>.
09db214c
JP
205 </column>
206
207 <column name="options">
208 Options for configuring the encapsulation, e.g. IPsec parameters when
209 IPsec support is introduced. No options are currently defined.
210 </column>
211
212 <column name="ip">
213 The IPv4 address of the encapsulation tunnel endpoint.
214 </column>
215 </table>
216
5868eb24 217 <table name="Logical_Flow" title="Logical Network Flows">
fe36184b 218 <p>
09986f8c
JP
219 Each row in this table represents one logical flow.
220 <code>ovn-northd</code> populates this table with logical flows
221 that implement the L2 and L3 topologies specified in the
222 <ref db="OVN_Northbound"/> database. Each hypervisor, via
223 <code>ovn-controller</code>, translates the logical flows into
224 OpenFlow flows specific to its hypervisor and installs them into
225 Open vSwitch.
fe36184b
BP
226 </p>
227
228 <p>
229 Logical flows are expressed in an OVN-specific format, described here. A
230 logical datapath flow is much like an OpenFlow flow, except that the
231 flows are written in terms of logical ports and logical datapaths instead
232 of physical ports and physical datapaths. Translation between logical
233 and physical flows helps to ensure isolation between logical datapaths.
09986f8c
JP
234 (The logical flow abstraction also allows the OVN centralized
235 components to do less work, since they do not have to separately
236 compute and push out physical flows to each chassis.)
fe36184b
BP
237 </p>
238
239 <p>
240 The default action when no flow matches is to drop packets.
241 </p>
242
5868eb24
BP
243 <p><em>Logical Life Cycle of a Packet</em></p>
244
245 <p>
246 The following description focuses on the life cycle of a packet through
247 a logical datapath, ignoring physical details of the implementation.
248 Please refer to <em>Life Cycle of a Packet</em> in
249 <code>ovn-architecture</code>(7) for the physical information.
250 </p>
251
252 <p>
253 The description here is written as if OVN itself executes these steps,
254 but in fact OVN (that is, <code>ovn-controller</code>) programs Open
255 vSwitch, via OpenFlow and OVSDB, to execute them on its behalf.
256 </p>
257
258 <p>
259 At a high level, OVN passes each packet through the logical datapath's
260 logical ingress pipeline, which may output the packet to one or more
261 logical ports or logical multicast groups. For each such logical output
262 port, OVN passes the packet through the datapath's logical egress
263 pipeline, which may either drop the packet or deliver it to the
264 destination. Between the two pipelines, outputs to logical multicast
265 groups are expanded into logical ports, so that the egress pipeline only
266 processes a single logical output port at a time. Between the two
267 pipelines is also where, when necessary, OVN encapsulates a packet in a
268 tunnel (or tunnels) to transmit to remote hypervisors.
269 </p>
270
271 <p>
272 In more detail, to start, OVN searches the <ref table="Logical_Flow"/>
273 table for a row with correct <ref column="logical_datapath"/>, a <ref
274 column="pipeline"/> of <code>ingress</code>, a <ref column="table_id"/>
275 of 0, and a <ref column="match"/> that is true for the packet. If none
276 is found, OVN drops the packet. If OVN finds more than one, it chooses
277 the match with the highest <ref column="priority"/>. Then OVN executes
278 each of the actions specified in the row's <ref column="actions"/> column,
279 in the order specified. Some actions, such as those to modify packet
280 headers, require no further details. The <code>next</code> and
281 <code>output</code> actions are special.
282 </p>
283
284 <p>
285 The <code>next</code> action causes the above process to be repeated
286 recursively, except that OVN searches for <ref column="table_id"/> of 1
287 instead of 0. Similarly, any <code>next</code> action in a row found in
288 that table would cause a further search for a <ref column="table_id"/> of
289 2, and so on. When recursive processing completes, flow control returns
290 to the action following <code>next</code>.
291 </p>
292
293 <p>
294 The <code>output</code> action also introduces recursion. Its effect
295 depends on the current value of the <code>outport</code> field. Suppose
296 <code>outport</code> designates a logical port. First, OVN compares
297 <code>inport</code> to <code>outport</code>; if they are equal, it treats
298 the <code>output</code> as a no-op. In the common case, where they are
299 different, the packet enters the egress pipeline. This transition to the
300 egress pipeline discards register data, e.g. <code>reg0</code>
301 ... <code>reg5</code>, to achieve uniform behavior regardless of whether
302 the egress pipeline is on a different hypervisor (because registers
303 aren't preserved across tunnel encapsulation).
304 </p>
305
306 <p>
307 To execute the egress pipeline, OVN again searches the <ref
308 table="Logical_Flow"/> table for a row with correct <ref
309 column="logical_datapath"/>, a <ref column="table_id"/> of 0, a <ref
310 column="match"/> that is true for the packet, but now looking for a <ref
311 column="pipeline"/> of <code>egress</code>. If no matching row is found,
312 the output becomes a no-op. Otherwise, OVN executes the actions for the
313 matching flow (which is chosen from multiple, if necessary, as already
314 described).
315 </p>
316
317 <p>
318 In the <code>egress</code> pipeline, the <code>next</code> action acts as
319 already described, except that it, of course, searches for
320 <code>egress</code> flows. The <code>output</code> action, however, now
321 directly outputs the packet to the output port (which is now fixed,
322 because <code>outport</code> is read-only within the egress pipeline).
323 </p>
324
325 <p>
326 The description earlier assumed that <code>outport</code> referred to a
327 logical port. If it instead designates a logical multicast group, then
328 the description above still applies, with the addition of fan-out from
329 the logical multicast group to each logical port in the group. For each
330 member of the group, OVN executes the logical pipeline as described, with
331 the logical output port replaced by the group member.
332 </p>
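<p>
The lookup-and-recurse behavior described above can be sketched in Python.
This is an illustrative model only, not how <code>ovn-controller</code> is
implemented. The <code>Flow</code> class, the callable <code>match</code>,
the tuple-encoded actions, and the helper names are all assumptions made for
the example. It models a single logical datapath and omits multicast groups.
</p>

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Flow:
    """Hypothetical stand-in for one Logical_Flow row (single datapath)."""
    pipeline: str                    # "ingress" or "egress"
    table_id: int
    priority: int
    match: Callable[[dict], bool]    # stands in for the match expression
    actions: List[Tuple]             # ("set", field, value), ("next",), ("output",)


def run_table(flows, pkt, pipeline, table_id, delivered):
    # Candidate rows: same pipeline and table, and a match that is true.
    hits = [f for f in flows
            if f.pipeline == pipeline
            and f.table_id == table_id
            and f.match(pkt)]
    if not hits:
        return  # no matching row: drop (ingress) or no-op output (egress)
    flow = max(hits, key=lambda f: f.priority)
    for action in flow.actions:
        if action[0] == "set":
            pkt[action[1]] = action[2]
        elif action[0] == "next":
            # Recurse into the following table, then resume after "next".
            run_table(flows, pkt, pipeline, table_id + 1, delivered)
        elif action[0] == "output":
            if pipeline == "egress":
                delivered.append(pkt["outport"])
            elif pkt["outport"] != pkt["inport"]:
                # Register data is discarded on the way into egress.
                egress_pkt = {k: v for k, v in pkt.items()
                              if not k.startswith("reg")}
                run_table(flows, egress_pkt, "egress", 0, delivered)


def process(flows, pkt):
    """Return the list of logical ports the packet is delivered to."""
    delivered = []
    run_table(flows, dict(pkt), "ingress", 0, delivered)
    return delivered
```

<p>
In this model, a packet whose <code>outport</code> equals its
<code>inport</code> is never delivered, and a packet that matches no row in
ingress table 0 yields an empty result.
</p>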
333
8d6e5516
JP
334 <p><em>Pipeline Stages</em></p>
335
336 <p>
337 <code>ovn-northd</code> is responsible for populating the
338 <ref table="Logical_Flow"/> table, so the stages are an
339 implementation detail and subject to change. This section
340 describes the current logical flow table.
341 </p>
342
343 <p>
344 The ingress pipeline consists of the following stages:
345 </p>
346 <ul>
347 <li>
348 Port Security (Table 0): Validates the source address, drops
349 packets with a VLAN tag, and, if configured, verifies that the
350 logical port is allowed to send with the source address.
351 </li>
352
353 <li>
354 L2 Destination Lookup (Table 1): Forwards known unicast
355 addresses to the appropriate logical port. Unicast packets to
356 unknown hosts are forwarded to logical ports configured with the
357 special <code>unknown</code> MAC address. Broadcast and
358 multicast packets are flooded to all ports in the logical switch.
359 </li>
360 </ul>
361
362 <p>
363 The egress pipeline consists of the following stages:
364 </p>
365 <ul>
366 <li>
367 ACL (Table 0): Applies any specified access control lists.
368 </li>
369
370 <li>
371 Port Security (Table 1): If configured, verifies that the
372 logical port is allowed to receive packets with the destination
373 address.
374 </li>
375 </ul>
376
747b2a45 377 <column name="logical_datapath">
5868eb24
BP
378 The logical datapath to which the logical flow belongs.
379 </column>
380
381 <column name="pipeline">
382 <p>
383 The primary flows used for deciding on a packet's destination are the
384 <code>ingress</code> flows. The <code>egress</code> flows implement
385 ACLs. See <em>Logical Life Cycle of a Packet</em>, above, for details.
386 </p>
747b2a45
BP
387 </column>
388
fe36184b
BP
389 <column name="table_id">
390 The stage in the logical pipeline, analogous to an OpenFlow table number.
391 </column>
392
393 <column name="priority">
394 The flow's priority. Flows with numerically higher priority take
395 precedence over those with lower. If two logical datapath flows with the
396 same priority both match, then the one actually applied to the packet is
397 undefined.
398 </column>
399
400 <column name="match">
401 <p>
402 A matching expression. OVN provides a superset of OpenFlow matching
403 capabilities, using a syntax similar to Boolean expressions in a
404 programming language.
405 </p>
406
407 <p>
fa6aeaeb
RB
408 The most important components of a match expression are
409 <dfn>comparisons</dfn> between <dfn>symbols</dfn> and
410 <dfn>constants</dfn>, e.g. <code>ip4.dst == 192.168.0.1</code>,
411 <code>ip.proto == 6</code>, <code>arp.op == 1</code>, <code>eth.type ==
412 0x800</code>. The logical AND operator <code>&amp;&amp;</code> and
413 logical OR operator <code>||</code> can combine comparisons into a
414 larger expression.
fe36184b
BP
415 </p>
416
fe36184b 417 <p>
e0840f11
BP
418 Matching expressions also support parentheses for grouping, the logical
419 NOT prefix operator <code>!</code>, and literals <code>0</code> and
420 <code>1</code> to express ``false'' or ``true,'' respectively. The
421 latter is useful by itself as a catch-all expression that matches every
422 packet.
fe36184b
BP
423 </p>
424
e0840f11 425 <p><em>Symbols</em></p>
fe36184b
BP
426
427 <p>
fa6aeaeb
RB
428 <em>Type</em>. Symbols have <dfn>integer</dfn> or <dfn>string</dfn>
429 type. Integer symbols have a <dfn>width</dfn> in bits.
fe36184b
BP
430 </p>
431
432 <p>
fa6aeaeb 433 <em>Kinds</em>. There are three kinds of symbols:
fe36184b
BP
434 </p>
435
e0840f11 436 <ul>
fa6aeaeb
RB
437 <li>
438 <p>
439 <dfn>Fields</dfn>. A field symbol represents a packet header or
440 metadata field. For example, a field
441 named <code>vlan.tci</code> might represent the VLAN TCI field in a
442 packet.
443 </p>
444
445 <p>
446 A field symbol can have integer or string type. Integer fields can
447 be nominal or ordinal (see <em>Level of Measurement</em>,
448 below).
449 </p>
450 </li>
451
452 <li>
453 <p>
454 <dfn>Subfields</dfn>. A subfield represents a subset of bits from
455 a larger field. For example, a field <code>vlan.vid</code> might
456 be defined as an alias for <code>vlan.tci[0..11]</code>. Subfields
457 are provided for syntactic convenience, because it is always
458 possible to instead refer to a subset of bits from a field
459 directly.
460 </p>
461
462 <p>
463 Only ordinal fields (see <em>Level of Measurement</em>,
464 below) may have subfields. Subfields are always ordinal.
465 </p>
466 </li>
467
468 <li>
469 <p>
470 <dfn>Predicates</dfn>. A predicate is shorthand for a Boolean
471 expression. Predicates may be used much like 1-bit fields. For
472 example, <code>ip4</code> might expand to <code>eth.type ==
473 0x800</code>. Predicates are provided for syntactic convenience,
474 because it is always possible to instead specify the underlying
475 expression directly.
476 </p>
477
478 <p>
479 A predicate whose expansion refers to any nominal field or
480 predicate (see <em>Level of Measurement</em>, below) is nominal;
481 other predicates have Boolean level of measurement.
482 </p>
483 </li>
e0840f11
BP
484 </ul>
485
fe36184b 486 <p>
fa6aeaeb
RB
487 <em>Level of Measurement</em>. See
488 http://en.wikipedia.org/wiki/Level_of_measurement for the statistical
489 concept on which this classification is based. There are three
490 levels:
fe36184b
BP
491 </p>
492
493 <ul>
fa6aeaeb
RB
494 <li>
495 <p>
496 <dfn>Ordinal</dfn>. In statistics, ordinal values can be ordered
497 on a scale. OVN considers a field (or subfield) to be ordinal if
498 its bits can be examined individually. This is true for the
499 OpenFlow fields that OpenFlow or Open vSwitch makes ``maskable.''
500 </p>
501
502 <p>
503 Any use of an ordinal field may specify a single bit or a range of
504 bits, e.g. <code>vlan.tci[13..15]</code> refers to the PCP field
505 within the VLAN TCI, and <code>eth.dst[40]</code> refers to the
506 multicast bit in the Ethernet destination address.
507 </p>
508
509 <p>
510 OVN supports all the usual arithmetic relations (<code>==</code>,
511 <code>!=</code>, <code>&lt;</code>, <code>&lt;=</code>,
512 <code>&gt;</code>, and <code>&gt;=</code>) on ordinal fields and
513 their subfields, because OVN can implement these in OpenFlow and
514 Open vSwitch as collections of bitwise tests.
515 </p>
516 </li>
517
518 <li>
519 <p>
520 <dfn>Nominal</dfn>. In statistics, nominal values cannot be
521 usefully compared except for equality. OpenFlow
522 port numbers, Ethernet types, and IP protocols are examples: all of
523 these are just identifiers assigned arbitrarily with no deeper
524 meaning. In OpenFlow and Open vSwitch, bits in these fields
525 generally aren't individually addressable.
526 </p>
527
528 <p>
529 OVN only supports arithmetic tests for equality on nominal fields,
530 because OpenFlow and Open vSwitch provide no way for a flow to
531 efficiently implement other comparisons on them. (A test for
532 inequality can be sort of built out of two flows with different
533 priorities, but OVN matching expressions always generate flows with
534 a single priority.)
535 </p>
536
537 <p>
538 String fields are always nominal.
539 </p>
540 </li>
541
542 <li>
543 <p>
544 <dfn>Boolean</dfn>. A nominal field that has only two values, 0
545 and 1, is somewhat exceptional, since it is easy to support both
546 equality and inequality tests on such a field: either one can be
547 implemented as a test for 0 or 1.
548 </p>
549
550 <p>
551 Only predicates (see above) have a Boolean level of measurement.
552 </p>
553
554 <p>
555 This isn't a standard level of measurement.
556 </p>
557 </li>
fe36184b
BP
558 </ul>
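<p>
To illustrate how an arithmetic relation on an ordinal field can reduce to a
collection of bitwise tests, the sketch below decomposes
<code><var>x</var> &lt; <var>c</var></code> into value/mask pairs. It is one
standard decomposition, shown under the assumption of a fixed-width unsigned
field, and is not necessarily the one OVN uses. The function names are
hypothetical.
</p>

```python
def less_than(c, width):
    """Return (value, mask) pairs such that, for a width-bit unsigned x,
    x < c holds exactly when (x & mask) == value for at least one pair."""
    full = (1 << width) - 1
    tests = []
    for i in range(width):
        if (c >> i) & 1:
            # x agrees with c on every bit above i, has bit i clear,
            # and its lower bits are wildcarded.
            high_mask = full & ~((1 << (i + 1)) - 1)
            mask = high_mask | (1 << i)
            value = c & high_mask
            tests.append((value, mask))
    return tests


def matches(x, tests):
    """True if x satisfies at least one masked equality test."""
    return any((x & mask) == value for value, mask in tests)
```

<p>
Each pair corresponds to one masked equality match, so the number of tests is
at most the number of 1-bits in <var>c</var>.
</p>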
559
560 <p>
fa6aeaeb
RB
561 <em>Prerequisites</em>. Any symbol can have prerequisites, which are
562 additional conditions implied by the use of the symbol. For example,
563 the <code>icmp4.type</code> symbol might have prerequisite
564 <code>icmp4</code>, which would cause an expression <code>icmp4.type ==
565 0</code> to be interpreted as <code>icmp4.type == 0 &amp;&amp;
566 icmp4</code>, which would in turn expand to <code>icmp4.type == 0
567 &amp;&amp; eth.type == 0x800 &amp;&amp; ip.proto == 1</code> (assuming
568 <code>icmp4</code> is a predicate defined as suggested under
569 <em>Types</em> above).
fe36184b
BP
570 </p>
571
e0840f11
BP
572 <p><em>Relational operators</em></p>
573
fe36184b 574 <p>
fa6aeaeb
RB
575 All of the standard relational operators <code>==</code>,
576 <code>!=</code>, <code>&lt;</code>, <code>&lt;=</code>,
577 <code>&gt;</code>, and <code>&gt;=</code> are supported. Nominal
578 fields support only <code>==</code> and <code>!=</code>, and only in a
579 positive sense when outer <code>!</code> are taken into account,
580 e.g. given string field <code>inport</code>, <code>inport ==
581 "eth0"</code> and <code>!(inport != "eth0")</code> are acceptable, but
582 not <code>inport != "eth0"</code>.
fe36184b
BP
583 </p>
584
585 <p>
fa6aeaeb
RB
586 The implementation of <code>==</code> (or <code>!=</code> when it is
587 negated) is more efficient than that of the other relational
588 operators.
fe36184b
BP
589 </p>
590
e0840f11
BP
591 <p><em>Constants</em></p>
592
fe36184b 593 <p>
e0840f11
BP
594 Integer constants may be expressed in decimal, hexadecimal prefixed by
595 <code>0x</code>, or as dotted-quad IPv4 addresses, IPv6 addresses in
596 their standard forms, or Ethernet addresses as colon-separated hex
597 digits. A constant in any of these forms may be followed by a slash
598 and a second constant (the mask) in the same form, to form a masked
599 constant. IPv4 and IPv6 masks may be given as integers, to express
600 CIDR prefixes.
601 </p>
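<p>
As a sketch of how an integer mask expresses a CIDR prefix, the following
Python converts a prefix length into a bit mask and applies a masked
comparison. The function names and the sample addresses are assumptions for
illustration, not part of OVN.
</p>

```python
import ipaddress


def prefix_to_mask(prefix_len, width=32):
    """Return the integer mask whose top prefix_len bits are set."""
    return ((1 << prefix_len) - 1) << (width - prefix_len)


def masked_match(addr, constant, mask):
    """True if addr equals constant on every bit selected by mask."""
    return (addr & mask) == (constant & mask)


net = int(ipaddress.IPv4Address("192.168.0.0"))
mask = prefix_to_mask(16)
# The /16 prefix corresponds to the dotted-quad mask 255.255.0.0.
dotted = str(ipaddress.IPv4Address(mask))
```

<p>
Under this model, a constant written with mask <code>16</code> and one
written with mask <code>255.255.0.0</code> select the same set of addresses.
</p>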
602
603 <p>
604 String constants have the same syntax as quoted strings in JSON (thus,
5868eb24 605 they are Unicode strings).
fe36184b
BP
606 </p>
607
608 <p>
e0840f11
BP
609 Some operators support sets of constants written inside curly braces
610 <code>{</code> ... <code>}</code>. Commas between elements of a set,
611 and after the last element, are optional. With <code>==</code>,
612 ``<code><var>field</var> == { <var>constant1</var>,
613 <var>constant2</var>,</code> ... <code>}</code>'' is syntactic sugar
614 for ``<code><var>field</var> == <var>constant1</var> ||
615 <var>field</var> == <var>constant2</var> || </code>...<code></code>''.
616 Similarly, ``<code><var>field</var> != { <var>constant1</var>,
617 <var>constant2</var>, </code>...<code> }</code>'' is equivalent to
618 ``<code><var>field</var> != <var>constant1</var> &amp;&amp;
fe36184b 619 <var>field</var> != <var>constant2</var> &amp;&amp;
e0840f11 620 </code>...<code></code>''.
fe36184b
BP
621 </p>
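<p>
The set shorthand can be pictured as a purely textual rewrite. The sketch
below is a hypothetical helper, not OVN's parser. It handles only the two
operators whose expansion is spelled out above.
</p>

```python
def expand_set(field, op, constants):
    """Rewrite `field op {c1, c2, ...}` as a chain of simple comparisons."""
    if op == "==":
        joiner = " || "   # equal to any member of the set
    elif op == "!=":
        joiner = " && "   # different from every member of the set
    else:
        raise ValueError("only == and != are handled in this sketch")
    return joiner.join(f"{field} {op} {c}" for c in constants)
```

<p>
For instance, <code>expand_set("tcp.dst", "==", [80, 443])</code> produces a
two-way OR, while the same call with <code>"!="</code> produces a two-way
AND.
</p>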
622
e0840f11
BP
623 <p><em>Miscellaneous</em></p>
624
fe36184b 625 <p>
fa6aeaeb
RB
626 Comparisons may name the symbol or the constant first,
627 e.g. <code>tcp.src == 80</code> and <code>80 == tcp.src</code> are both
628 acceptable.
fe36184b
BP
629 </p>
630
631 <p>
fa6aeaeb
RB
632 Tests for a range may be expressed using a syntax like <code>1024 &lt;=
633 tcp.src &lt;= 49151</code>, which is equivalent to <code>1024 &lt;=
634 tcp.src &amp;&amp; tcp.src &lt;= 49151</code>.
fe36184b
BP
635 </p>
636
637 <p>
fa6aeaeb
RB
638 For a one-bit field or predicate, a mention of its name is equivalent
639 to <code><var>symbol</var> == 1</code>, e.g. <code>vlan.present</code>
640 is equivalent to <code>vlan.present == 1</code>. The same is true for
641 one-bit subfields, e.g. <code>vlan.tci[12]</code>. There is no
642 technical limitation to implementing the same for ordinal fields of all
643 widths, but the implementation is expensive enough that the syntax
644 parser requires writing an explicit comparison against zero to make
645 mistakes less likely, e.g. in <code>tcp.src != 0</code> the comparison
646 against 0 is required.
fe36184b
BP
647 </p>
648
649 <p>
fa6aeaeb
RB
650 <em>Operator precedence</em> is as shown below, from highest to lowest.
651 There are two exceptions where parentheses are required even though the
652 table would suggest that they are not: <code>&amp;&amp;</code> and
653 <code>||</code> require parentheses when used together, and
654 <code>!</code> requires parentheses when applied to a relational
655 expression. Thus, in <code>(eth.type == 0x800 || eth.type == 0x86dd)
656 &amp;&amp; ip.proto == 6</code> or <code>!(arp.op == 1)</code>, the
657 parentheses are mandatory.
fe36184b
BP
658 </p>
659
e0840f11
BP
660 <ul>
661 <li><code>()</code></li>
662 <li><code>== != &lt; &lt;= &gt; &gt;=</code></li>
663 <li><code>!</code></li>
664 <li><code>&amp;&amp; ||</code></li>
665 </ul>
666
10b1662b
BP
667 <p>
668 <em>Comments</em> may be introduced by <code>//</code>, which extends
669 to the next new-line. Comments within a line may be bracketed by
670 <code>/*</code> and <code>*/</code>. Multiline comments are not
671 supported.
672 </p>
673
e0840f11
BP
674 <p><em>Symbols</em></p>
675
5868eb24
BP
676 <p>
677 Most of the symbols below have integer type. Only <code>inport</code>
678 and <code>outport</code> have string type. <code>inport</code> names a
679 logical port. Thus, its value is a <ref column="logical_port"/> name
62fdd819
AW
680 from the <ref table="Port_Binding"/> table. <code>outport</code> may
681 name a logical port, as <code>inport</code>, or a logical multicast
682 group defined in the <ref table="Multicast_Group"/> table. For both
683 symbols, only names within the flow's logical datapath may be used.
5868eb24
BP
684 </p>
685
e0840f11 686 <ul>
5868eb24
BP
687 <li><code>reg0</code>...<code>reg5</code></li>
688 <li><code>inport</code> <code>outport</code></li>
e0840f11
BP
689 <li><code>eth.src</code> <code>eth.dst</code> <code>eth.type</code></li>
690 <li><code>vlan.tci</code> <code>vlan.vid</code> <code>vlan.pcp</code> <code>vlan.present</code></li>
691 <li><code>ip.proto</code> <code>ip.dscp</code> <code>ip.ecn</code> <code>ip.ttl</code> <code>ip.frag</code></li>
692 <li><code>ip4.src</code> <code>ip4.dst</code></li>
693 <li><code>ip6.src</code> <code>ip6.dst</code> <code>ip6.label</code></li>
694 <li><code>arp.op</code> <code>arp.spa</code> <code>arp.tpa</code> <code>arp.sha</code> <code>arp.tha</code></li>
695 <li><code>tcp.src</code> <code>tcp.dst</code> <code>tcp.flags</code></li>
696 <li><code>udp.src</code> <code>udp.dst</code></li>
697 <li><code>sctp.src</code> <code>sctp.dst</code></li>
698 <li><code>icmp4.type</code> <code>icmp4.code</code></li>
699 <li><code>icmp6.type</code> <code>icmp6.code</code></li>
700 <li><code>nd.target</code> <code>nd.sll</code> <code>nd.tll</code></li>
701 </ul>
702
25030d47
RB
703 <p>
704 The following predicates are supported:
705 </p>
706
707 <ul>
708 <li><code>vlan.present</code> expands to <code>vlan.tci[12]</code></li>
709 <li><code>ip4</code> expands to <code>eth.type == 0x800</code></li>
710 <li><code>ip6</code> expands to <code>eth.type == 0x86dd</code></li>
711 <li><code>ip</code> expands to <code>ip4 || ip6</code></li>
712 <li><code>icmp4</code> expands to <code>ip4 &amp;&amp; ip.proto == 1</code></li>
713 <li><code>icmp6</code> expands to <code>ip6 &amp;&amp; ip.proto == 58</code></li>
714 <li><code>icmp</code> expands to <code>icmp4 || icmp6</code></li>
715 <li><code>ip.is_frag</code> expands to <code>ip.frag[0]</code></li>
716 <li><code>ip.later_frag</code> expands to <code>ip.frag[1]</code></li>
717 <li><code>ip.first_frag</code> expands to <code>ip.is_frag &amp;&amp; !ip.later_frag</code></li>
718 <li><code>arp</code> expands to <code>eth.type == 0x806</code></li>
719 <li><code>nd</code> expands to <code>icmp6.type == {135, 136} &amp;&amp; icmp6.code == 0</code></li>
720 <li><code>tcp</code> expands to <code>ip.proto == 6</code></li>
721 <li><code>udp</code> expands to <code>ip.proto == 17</code></li>
722 <li><code>sctp</code> expands to <code>ip.proto == 132</code></li>
723 </ul>
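<p>
Because predicates are shorthand, they can be expanded by recursive textual
substitution. The sketch below is an illustration under simplifying
assumptions: it works on plain strings rather than a parsed expression,
covers only a subset of the predicates listed above, ignores prerequisites,
and would mishandle a quoted string constant that happens to spell a
predicate name. The names <code>PREDICATES</code> and <code>expand</code>
are hypothetical.
</p>

```python
import re

# Subset of the predicate table above, copied as plain strings.
PREDICATES = {
    "ip4": "eth.type == 0x800",
    "ip6": "eth.type == 0x86dd",
    "ip": "ip4 || ip6",
    "icmp4": "ip4 && ip.proto == 1",
    "icmp6": "ip6 && ip.proto == 58",
    "icmp": "icmp4 || icmp6",
    "arp": "eth.type == 0x806",
    "tcp": "ip.proto == 6",
    "udp": "ip.proto == 17",
}

# A symbol name: letters, digits, underscores and dots, not preceded by
# another name character (so the "x800" inside "0x800" is not a token).
TOKEN = re.compile(r"(?<![\w.])[A-Za-z_][\w.]*")


def expand(expr):
    """Replace every predicate name in expr by its parenthesized expansion."""
    def repl(m):
        name = m.group(0)
        if name in PREDICATES:
            return "(" + expand(PREDICATES[name]) + ")"
        return name
    return TOKEN.sub(repl, expr)
```

<p>
Each expansion is wrapped in parentheses so that the result stays valid under
the rule that mixed AND and OR operators must be grouped explicitly.
</p>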
fe36184b
BP
724 </column>
725
726 <column name="actions">
727 <p>
2cd87fce
RB
728 Logical datapath actions, to be executed when the logical flow
729 represented by this row is the highest-priority match.
fe36184b
BP
730 </p>
731
35060cdc 732 <p>
2cd87fce
RB
733 Actions share lexical syntax with the <ref column="match"/> column. An
734 empty set of actions (or one that contains just white space or
735 comments), or a set of actions that consists of just
736 <code>drop;</code>, causes the matched packets to be dropped.
737 Otherwise, the column should contain a sequence of actions, each
738 terminated by a semicolon.
35060cdc 739 </p>
fe36184b 740
35060cdc 741 <p>
eee7a8ed 742 The following actions are defined:
35060cdc 743 </p>
fe36184b 744
35060cdc
BP
745 <dl>
746 <dt><code>output;</code></dt>
747 <dd>
5868eb24 748 <p>
eee7a8ed
JP
749 In the ingress pipeline, this action executes the
750 <code>egress</code> pipeline as a subroutine. If
751 <code>outport</code> names a logical port, the egress pipeline
752 executes once; if it is a multicast group, the egress pipeline runs
753 once for each logical port in the group.
5868eb24
BP
754 </p>
755
756 <p>
757 In the egress pipeline, this action performs the actual
758 output to the <code>outport</code> logical port. (In the egress
759 pipeline, <code>outport</code> never names a multicast group.)
760 </p>
761
762 <p>
763 Output to the input port is implicitly dropped, that is,
764 <code>output</code> becomes a no-op if <code>outport</code> ==
765 <code>inport</code>.
766 </p>
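
 <p>
 As a minimal sketch, assuming a logical port named <code>vif0</code>
 exists in the datapath, an ingress-pipeline flow could deliver a
 packet with the following actions:
 </p>

 <pre>outport = "vif0"; output;</pre>

 <p>
 Because <code>outport</code> names a single logical port in this
 sketch, the egress pipeline runs once.
 </p>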
eee7a8ed 767 </dd>
fe36184b 768
35060cdc 769 <dt><code>next;</code></dt>
558ec83d 770 <dt><code>next(<var>table</var>);</code></dt>
35060cdc 771 <dd>
558ec83d
BP
772 Executes another logical datapath table as a subroutine. By default,
773 the table after the current one is executed. Specify
774 <var>table</var> to jump to a specific table in the same pipeline.
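
 <p>
 As a sketch, with the table number 5 chosen arbitrarily for
 illustration, the first action below continues with the table after
 the current one and the second jumps directly to table 5 in the same
 pipeline:
 </p>

 <pre>
next;
next(5);
 </pre>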
2cd87fce 775 </dd>
fe36184b 776
35060cdc
BP
777 <dt><code><var>field</var> = <var>constant</var>;</code></dt>
778 <dd>
5868eb24 779 <p>
5ee054fb
BP
780 Sets data or metadata field <var>field</var> to constant value
781 <var>constant</var>, e.g. <code>outport = "vif0";</code> to set the
782 logical output port. To set only a subset of bits in a field,
783 specify a subfield for <var>field</var> or a masked
784 <var>constant</var>, e.g. one may use <code>vlan.pcp[2] = 1;</code>
785 or <code>vlan.pcp = 4/4;</code> to set the most significant bit of
786 the VLAN PCP.
5868eb24
BP
787 </p>
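
 <p>
 To illustrate the masked form with a worked example (assuming the
 usual <var>value</var>/<var>mask</var> convention, in which only the
 bits set in the mask are written, and the 3-bit width of the VLAN
 PCP): if <code>vlan.pcp</code> currently holds 3 (binary 011), then
 </p>

 <pre>vlan.pcp = 4/4;</pre>

 <p>
 writes only bit 2 (mask binary 100) with the value 1, leaving
 <code>vlan.pcp</code> equal to 7 (binary 111). The subfield form
 <code>vlan.pcp[2] = 1;</code> has the same effect.
 </p>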
788
789 <p>
790 Assigning to a field with prerequisites implicitly adds those
791 prerequisites to <ref column="match"/>; thus, for example, a flow
792 that sets <code>tcp.dst</code> applies only to TCP flows,
793 regardless of whether its <ref column="match"/> mentions any TCP
794 field.
795 </p>
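
 <p>
 As a sketch, assuming a logical port named <code>vif0</code> and an
 arbitrarily chosen port number, consider a flow whose
 <ref column="match"/> is only <code>inport == "vif0"</code> and
 whose actions are:
 </p>

 <pre>tcp.dst = 8080; next;</pre>

 <p>
 Such a flow is treated as though its match also required the
 prerequisites of <code>tcp.dst</code>, so it applies only to TCP
 packets arriving on <code>vif0</code>; non-TCP packets from that
 port do not match it.
 </p>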
796
797 <p>
798 Not all fields are modifiable (e.g. <code>eth.type</code> and
799 <code>ip.proto</code> are read-only), and not all modifiable fields
800 may be partially modified (e.g. <code>ip.ttl</code> must be assigned
801 as a whole). The <code>outport</code> field is modifiable in the
802 <code>ingress</code> pipeline but not in the <code>egress</code>
803 pipeline.
804 </p>
eee7a8ed 805 </dd>
5ee054fb
BP
806
807 <dt><code><var>field1</var> = <var>field2</var>;</code></dt>
808 <dd>
809 <p>
810 Sets data or metadata field <var>field1</var> to the value of data
811 or metadata field <var>field2</var>, e.g. <code>reg0 =
812 ip4.src;</code> copies <code>ip4.src</code> into <code>reg0</code>.
813 To modify only a subset of a field's bits, specify a subfield for
814 <var>field1</var> or <var>field2</var> or both, e.g. <code>vlan.pcp
815 = reg0[0..2];</code> copies the three least-significant bits of
816 <code>reg0</code> into the VLAN PCP.
817 </p>
818
819 <p>
820 <var>field1</var> and <var>field2</var> must be the same type,
821 either both string or both integer fields. If they are both
822 integer fields, they must have the same width.
823 </p>
824
825 <p>
826 If <var>field1</var> or <var>field2</var> has prerequisites, they
827 are added implicitly to <ref column="match"/>. It is possible to
828 write an assignment with contradictory prerequisites, such as
829 <code>ip4.src = ip6.src[0..31];</code>, but the contradiction means
830 that a logical flow with such an assignment will never be matched.
831 </p>
832 </dd>
a20c96c6
BP
833
834 <dt><code><var>field1</var> &lt;-&gt; <var>field2</var>;</code></dt>
835 <dd>
836 <p>
837 Similar to <code><var>field1</var> = <var>field2</var>;</code>
838 except that the two values are exchanged instead of copied. Both
839 <var>field1</var> and <var>field2</var> must be modifiable.
840 </p>
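
 <p>
 For instance, a sketch that exchanges the Ethernet source and
 destination addresses, assuming both <code>eth.src</code> and
 <code>eth.dst</code> are modifiable:
 </p>

 <pre>eth.src &lt;-&gt; eth.dst;</pre>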
841 </dd>
fe36184b
BP
842 </dl>
843
844 <p>
2cd87fce
RB
845 The following actions will likely be useful later, but they have not
846 been thought out carefully.
fe36184b
BP
847 </p>
848
849 <dl>
e0840f11 850 <dt><code>learn</code></dt>
fe36184b 851
e0840f11 852 <dt><code>conntrack</code></dt>
fe36184b 853
35060cdc 854 <dt><code>dec_ttl { <var>action</var>, </code>...<code> } { <var>action</var>; </code>...<code>};</code></dt>
e0840f11
BP
855 <dd>
856 Decrements the TTL, then executes the first set of actions if the
857 decrement succeeds or the second set if it fails.
858 </dd>
fe36184b 859
35060cdc 860 <dt><code>icmp_reply { <var>action</var>, </code>...<code> };</code></dt>
e0840f11 861 <dd>generate ICMP reply from packet, execute <var>action</var>s</dd>
fe36184b 862
fa6aeaeb
RB
863 <dt><code>arp { <var>action</var>, </code>...<code> }</code></dt>
864 <dd>generate ARP from packet, execute <var>action</var>s</dd>
fe36184b 865 </dl>
fe36184b 866 </column>
091e3af9
JP
867
868 <column name="external_ids" key="stage-name">
869 Human-readable name for this flow's stage in the pipeline.
870 </column>
871
872 <group title="Common Columns">
873 The overall purpose of these columns is described under <code>Common
874 Columns</code> at the beginning of this document.
875
876 <column name="external_ids"/>
877 </group>
fe36184b
BP
878 </table>
879
5868eb24
BP
880 <table name="Multicast_Group" title="Logical Port Multicast Groups">
881 <p>
882 The rows in this table define multicast groups of logical ports.
883 Multicast groups allow a single packet transmitted over a tunnel to a
884 hypervisor to be delivered to multiple VMs on that hypervisor, which
885 uses bandwidth more efficiently.
886 </p>
887
888 <p>
889 Each row in this table defines a logical multicast group numbered <ref
890 column="tunnel_key"/> within <ref column="datapath"/>, whose logical
891 ports are listed in the <ref column="ports"/> column.
892 </p>
893
894 <column name="datapath">
895 The logical datapath in which the multicast group resides.
896 </column>
897
898 <column name="tunnel_key">
899 The value used to designate this logical egress port in tunnel
900 encapsulations. An index forces the key to be unique within the <ref
901 column="datapath"/>. The unusual range ensures that multicast group IDs
902 do not overlap with logical port IDs.
903 </column>
904
905 <column name="name">
906 <p>
907 The logical multicast group's name. An index forces the name to be
908 unique within the <ref column="datapath"/>. Logical flows in the
909 ingress pipeline may output to the group just as for individual logical
910 ports, by assigning the group's name to <code>outport</code> and
911 executing an <code>output</code> action.
912 </p>
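
 <p>
 A minimal sketch, using a hypothetical group name
 <code>_MC_example</code> (any name defined in this table would be used
 the same way):
 </p>

 <pre>outport = "_MC_example"; output;</pre>

 <p>
 In the ingress pipeline, this runs the egress pipeline once for each
 logical port listed in the group's <ref column="ports"/> column.
 </p>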
913
914 <p>
915 Multicast group names and logical port names share a single namespace
916 and thus should not overlap (but the database schema cannot enforce
917 this). To try to avoid conflicts, <code>ovn-northd</code> uses names
918 that begin with <code>_MC_</code>.
919 </p>
920 </column>
921
922 <column name="ports">
923 The logical ports included in the multicast group. All of these ports
924 must be in the <ref column="datapath"/> logical datapath (but the
925 database schema cannot enforce this).
926 </column>
927 </table>
928
929 <table name="Datapath_Binding" title="Physical-Logical Datapath Bindings">
930 <p>
931 Each row in this table identifies physical bindings of a logical
932 datapath. A logical datapath implements a logical pipeline among the
933 ports in the <ref table="Port_Binding"/> table associated with it. In
934 practice, the pipeline in a given logical datapath implements either a
935 logical switch or a logical router.
936 </p>
937
938 <column name="tunnel_key">
939 The tunnel key value to which the logical datapath is bound.
940 The <code>Tunnel Encapsulation</code> section in
941 <code>ovn-architecture</code>(7) describes how tunnel keys are
942 constructed for each supported encapsulation.
943 </column>
944
945 <column name="external_ids" key="logical-switch" type='{"type": "uuid"}'>
946 Each row in <ref table="Datapath_Binding"/> is associated with some
947 logical datapath. <code>ovn-northd</code> uses this key to store the
948 UUID of the logical datapath <ref table="Logical_Switch"
949 db="OVN_Northbound"/> row in the <ref db="OVN_Northbound"/> database.
950 </column>
951
952 <group title="Common Columns">
953 The overall purpose of these columns is described under <code>Common
954 Columns</code> at the beginning of this document.
955
956 <column name="external_ids"/>
957 </group>
958 </table>
959
dcda6e0d 960 <table name="Port_Binding" title="Physical-Logical Port Bindings">
fe36184b
BP
961 <p>
962 Each row in this table identifies the physical location of a logical
9fb4636f 963 port.
fe36184b
BP
964 </p>
965
966 <p>
9fb4636f 967 For every <code>Logical_Port</code> record in the <code>OVN_Northbound</code>
91ae2065
RB
968 database, <code>ovn-northd</code> creates a record in this table.
969 <code>ovn-northd</code> populates and maintains every column except
3213e9df 970 the <code>chassis</code> column, which it leaves empty in new records.
9fb4636f
GS
971 </p>
972
973 <p>
88058f19
AW
974 <code>ovn-controller</code>/<code>ovn-controller-vtep</code>
975 populates the <code>chassis</code> column for the records that
976 identify the logical ports located on its hypervisor/gateway.
977 It discovers those ports by monitoring the local hypervisor's
978 Open_vSwitch database, which identifies logical ports via the
979 conventions described in
980 <code>IntegrationGuide.md</code>.
9fb4636f
GS
981 </p>
982
983 <p>
5868eb24 984 When a chassis shuts down gracefully, it should clean up the
9fb4636f 985 <code>chassis</code> column that it previously had populated.
fe36184b
BP
986 (This is not critical because resources hosted on the chassis are equally
987 unreachable regardless of whether their rows are present.) To handle the
988 case where a VM is shut down abruptly on one chassis, then brought up
88058f19
AW
989 again on a different one,
990 <code>ovn-controller</code>/<code>ovn-controller-vtep</code> must
991 overwrite the <code>chassis</code> column with new information.
fe36184b
BP
992 </p>
993
c96ba502
BP
994 <group title="Core Features">
995 <column name="datapath">
996 The logical datapath to which the logical port belongs.
997 </column>
1a76c93e 998
c96ba502
BP
999 <column name="logical_port">
1000 A logical port, taken from <ref table="Logical_Port" column="name"
1001 db="OVN_Northbound"/> in the OVN_Northbound database's <ref
1002 table="Logical_Port" db="OVN_Northbound"/> table. OVN does not
1003 prescribe a particular format for the logical port ID.
1004 </column>
c0281929 1005
c96ba502
BP
1006 <column name="chassis">
1007 The physical location of the logical port. To successfully identify a
1008 chassis, this column must be a <ref table="Chassis"/> record. This is
1009 populated by
1010 <code>ovn-controller</code>/<code>ovn-controller-vtep</code>.
1011 </column>
c0281929 1012
c96ba502
BP
1013 <column name="tunnel_key">
1014 <p>
1015 A number that represents the logical port in the key (e.g. STT key or
1016 Geneve TLV) field carried within tunnel protocol packets.
1017 </p>
c0281929 1018
c96ba502
BP
1019 <p>
1020 The tunnel ID must be unique within the scope of a logical datapath.
1021 </p>
1022 </column>
88058f19 1023
c96ba502
BP
1024 <column name="mac">
1025 <p>
1026 The Ethernet address or addresses used as a source address on the
1027 logical port, each in the form
1028 <var>xx</var>:<var>xx</var>:<var>xx</var>:<var>xx</var>:<var>xx</var>:<var>xx</var>.
1029 The string <code>unknown</code> is also allowed to indicate that the
1030 logical port has an unknown set of (additional) source addresses.
1031 </p>
1032
1033 <p>
1034 A VM interface would ordinarily have a single Ethernet address. A
1035 gateway port might initially only have <code>unknown</code>, and then
1036 add MAC addresses to the set as it learns new source addresses.
1037 </p>
1038 </column>
88058f19 1039
c96ba502
BP
1040 <column name="type">
1041 <p>
1042 A type for this logical port. Logical ports can be used to model other
1043 types of connectivity into an OVN logical switch. The following types
1044 are defined:
1045 </p>
1046
1047 <dl>
1048 <dt>(empty string)</dt>
1049 <dd>VM (or VIF) interface.</dd>
1050 <dt><code>localnet</code></dt>
1051 <dd>
1052 A connection to a locally accessible network from each
1053 <code>ovn-controller</code> instance. A logical switch can only
1054 have a single <code>localnet</code> port attached and at most one
1055 regular logical port. This is used to model direct connectivity to
1056 an existing network.
1057 </dd>
1058
1059 <dt><code>vtep</code></dt>
1060 <dd>
1061 A port to a logical switch on a VTEP gateway chassis. In order to
1062 get this port correctly recognized by the OVN controller, the <ref
1063 column="options"
1064 table="Port_Binding"/>:<code>vtep-physical-switch</code> and <ref
1065 column="options"
1066 table="Port_Binding"/>:<code>vtep-logical-switch</code> must also
1067 be defined.
1068 </dd>
1069 </dl>
1070 </column>
1071 </group>
1a76c93e 1072
c96ba502 1073 <group title="Localnet Options">
eb00399e 1074 <p>
c96ba502
BP
1075 These options apply to logical ports with <ref column="type"/> of
1076 <code>localnet</code>.
eb00399e
BP
1077 </p>
1078
c96ba502
BP
1079 <column name="options" key="network_name">
1080 Required. <code>ovn-controller</code> uses the configuration entry
1081 <code>ovn-bridge-mappings</code> to determine how to connect to this
1082 network. <code>ovn-bridge-mappings</code> is a list of network names
1083 mapped to a local OVS bridge that provides access to that network. An
1084 example of configuring <code>ovn-bridge-mappings</code> would be:
1085
1086 <pre>$ ovs-vsctl set open . external-ids:ovn-bridge-mappings=physnet1:br-eth0,physnet2:br-eth1</pre>
1087
1088 <p>
1089 When a logical switch has a <code>localnet</code> port attached,
1090 every chassis that may have a local vif attached to that logical
1091 switch must have a bridge mapping configured to reach that
1092 <code>localnet</code>. Traffic that arrives on a
1093 <code>localnet</code> port is never forwarded over a tunnel to
1094 another chassis.
1095 </p>
1096 </column>
1097
1098 <column name="tag">
1099 If set, indicates that the port represents a connection to a specific
1100 VLAN on a locally accessible network. The VLAN ID is used to match
1101 incoming traffic and is also added to outgoing traffic.
1102 </column>
1103 </group>
1104
1105 <group title="VTEP Options">
eb00399e 1106 <p>
c96ba502
BP
1107 These options apply to logical ports with <ref column="type"/> of
1108 <code>vtep</code>.
eb00399e 1109 </p>
9fb4636f 1110
c96ba502
BP
1111 <column name="options" key="vtep-physical-switch">
1112 Required. The name of the VTEP gateway.
1113 </column>
fe36184b 1114
c96ba502
BP
1115 <column name="options" key="vtep-logical-switch">
1116 Required. A logical switch name connected by the VTEP gateway. Must
1117 be set when <ref column="type"/> is <code>vtep</code>.
1118 </column>
1119 </group>
fe36184b 1120
c96ba502 1121 <group title="Nested Containers">
fe36184b 1122 <p>
c96ba502
BP
1123 These columns support containers nested within a VM. Specifically,
1124 they are used when <ref column="type"/> is empty and <ref
1125 column="logical_port"/> identifies the interface of a container spawned
1126 inside a VM. They are empty for containers or VMs that run directly on
1127 a hypervisor.
fe36184b
BP
1128 </p>
1129
c96ba502
BP
1130 <column name="parent_port">
1131 This is taken from
1132 <ref table="Logical_Port" column="parent_name" db="OVN_Northbound"/>
1133 in the OVN_Northbound database's <ref table="Logical_Port"
1134 db="OVN_Northbound"/> table.
1135 </column>
1136
1137 <column name="tag">
1138 <p>
1139 Identifies the VLAN tag in the network traffic associated with that
1140 container's network interface.
1141 </p>
1142
1143 <p>
1144 This column is used for a different purpose when <ref column="type"/>
1145 is <code>localnet</code> (see <code>Localnet Options</code>, above).
1146 </p>
1147 </column>
1148 </group>
fe36184b
BP
1149 </table>
1150</database>