]>
Commit | Line | Data |
---|---|---|
1af530bc JP |
1 | <?xml version="1.0" encoding="utf-8"?> |
2 | <manpage program="ovn-northd" section="8" title="ovn-northd"> | |
3 | <h1>Name</h1> | |
4 | <p>ovn-northd -- Open Virtual Network central control daemon</p> | |
5 | ||
6 | <h1>Synopsis</h1> | |
7 | <p><code>ovn-northd</code> [<var>options</var>]</p> | |
8 | ||
9 | <h1>Description</h1> | |
10 | <p> | |
11 | <code>ovn-northd</code> is a centralized daemon responsible for | |
12 | translating the high-level OVN configuration into logical | |
13 | configuration consumable by daemons such as | |
14 | <code>ovn-controller</code>. It translates the logical network | |
15 | configuration in terms of conventional network concepts, taken | |
16 | from the OVN Northbound Database (see <code>ovn-nb</code>(5)), | |
17 | into logical datapath flows in the OVN Southbound Database (see | |
18 | <code>ovn-sb</code>(5)) below it. | |
19 | </p> | |
20 | ||
10381044 LR |
21 | <h1>Options</h1> |
22 | <dl> | |
23 | <dt><code>--ovnnb-db=<var>database</var></code></dt> | |
24 | <dd> | |
25 | The OVSDB database containing the OVN Northbound Database. If the | |
26 | <env>OVN_NB_DB</env> environment variable is set, its value is used | |
27 | as the default. Otherwise, the default is | |
28 | <code>unix:@RUNDIR@/ovnnb_db.sock</code>. | |
29 | </dd> | |
30 | <dt><code>--ovnsb-db=<var>database</var></code></dt> | |
31 | <dd> | |
32 | The OVSDB database containing the OVN Southbound Database. If the | |
33 | <env>OVN_SB_DB</env> environment variable is set, its value is used | |
34 | as the default. Otherwise, the default is | |
35 | <code>unix:@RUNDIR@/ovnsb_db.sock</code>. | |
36 | </dd> | |
37 | </dl> | |
1af530bc | 38 | <p> |
10381044 LR |
39 | <var>database</var> in the above options must take one of the following |
40 | forms: | |
1af530bc | 41 | </p> |
10381044 LR |
42 | <xi:include href="ovsdb/remote-active.xml" xmlns:xi="http://www.w3.org/2003/XInclude"/> |
43 | <xi:include href="ovsdb/remote-passive.xml" xmlns:xi="http://www.w3.org/2003/XInclude"/> | |
44 | ||
45 | <h2>Daemon Options</h2> | |
46 | <xi:include href="lib/daemon.xml" xmlns:xi="http://www.w3.org/2003/XInclude"/> | |
47 | ||
48 | <h2>Logging Options</h2> | |
49 | <xi:include href="lib/vlog.xml" xmlns:xi="http://www.w3.org/2003/XInclude"/> | |
50 | ||
51 | <h2>PKI Options</h2> | |
1af530bc | 52 | <p> |
10381044 LR |
53 | PKI configuration is required in order to use SSL for the connections to |
54 | the Northbound and Southbound databases. | |
1af530bc | 55 | </p> |
10381044 LR |
56 | <xi:include href="lib/ssl.xml" xmlns:xi="http://www.w3.org/2003/XInclude"/> |
57 | ||
58 | <h2>Other Options</h2> | |
59 | ||
60 | <xi:include href="lib/common.xml" xmlns:xi="http://www.w3.org/2003/XInclude"/> | |
1af530bc | 61 | |
322ec639 | 62 | <h1>Runtime Management Commands</h1> |
1af530bc JP |
63 | <p> |
64 | <code>ovs-appctl</code> can send commands to a running | |
65 | <code>ovn-northd</code> process. The currently supported commands | |
66 | are described below. | |
67 | <dl> | |
68 | <dt><code>exit</code></dt> | |
69 | <dd> | |
70 | Causes <code>ovn-northd</code> to gracefully terminate. | |
71 | </dd> | |
72 | </dl> | |
73 | </p> | |
74 | ||
5cff6b99 BP |
75 | <h1>Logical Flow Table Structure</h1> |
76 | ||
77 | <p> | |
78 | One of the main purposes of <code>ovn-northd</code> is to populate the | |
79 | <code>Logical_Flow</code> table in the <code>OVN_Southbound</code> | |
80 | database. This section describes how <code>ovn-northd</code> does this | |
9975d7be | 81 | for switch and router logical datapaths. |
5cff6b99 BP |
82 | </p> |
83 | ||
9975d7be BP |
84 | <h2>Logical Switch Datapaths</h2> |
85 | ||
685f4dfe | 86 | <h3>Ingress Table 0: Admission Control and Ingress Port Security - L2</h3> |
5cff6b99 BP |
87 | |
88 | <p> | |
89 | Ingress table 0 contains these logical flows: | |
90 | </p> | |
91 | ||
92 | <ul> | |
93 | <li> | |
94 | Priority 100 flows to drop packets with VLAN tags or multicast Ethernet | |
95 | source addresses. | |
96 | </li> | |
97 | ||
98 | <li> | |
99 | Priority 50 flows that implement ingress port security for each enabled | |
100 | logical port. For logical ports on which port security is enabled, | |
101 | these match the <code>inport</code> and the valid <code>eth.src</code> | |
102 | address(es) and advance only those packets to the next flow table. For | |
103 | logical ports on which port security is not enabled, these advance all | |
104 | packets that match the <code>inport</code>. | |
105 | </li> | |
106 | </ul> | |
107 | ||
108 | <p> | |
109 | There are no flows for disabled logical ports because the default-drop | |
110 | behavior of logical flow tables causes packets that ingress from them to | |
111 | be dropped. | |
112 | </p> | |
113 | ||
685f4dfe | 114 | <h3>Ingress Table 1: Ingress Port Security - IP</h3> |
78aab811 JP |
115 | |
116 | <p> | |
685f4dfe NS |
117 | Ingress table 1 contains these logical flows: |
118 | </p> | |
119 | ||
120 | <ul> | |
121 | <li> | |
122 | <p> | |
123 | For each element in the port security set having one or more IPv4 or | |
124 | IPv6 addresses (or both), | |
125 | </p> | |
126 | ||
127 | <ul> | |
128 | <li> | |
129 | Priority 90 flow to allow IPv4 traffic if it has IPv4 addresses | |
130 | which match the <code>inport</code>, valid <code>eth.src</code> | |
131 | and valid <code>ip4.src</code> address(es). | |
132 | </li> | |
133 | ||
9e687b23 DL |
134 | <li> |
135 | Priority 90 flow to allow IPv4 DHCP discovery traffic if it has a | |
136 | valid <code>eth.src</code>. This is necessary since DHCP discovery | |
137 | messages are sent from the unspecified IPv4 address (0.0.0.0) because | |
138 | the IPv4 address has not yet been assigned. | |
139 | </li> | |
140 | ||
685f4dfe NS |
141 | <li> |
142 | Priority 90 flow to allow IPv6 traffic if it has IPv6 addresses | |
143 | which match the <code>inport</code>, valid <code>eth.src</code> and | |
144 | valid <code>ip6.src</code> address(es). | |
145 | </li> | |
146 | ||
9e687b23 DL |
147 | <li> |
148 | Priority 90 flow to allow IPv6 DAD (Duplicate Address Detection) | |
149 | traffic if it has a valid <code>eth.src</code>. This is | |
150 | necessary since DAD requires joining a multicast group and | |
151 | sending neighbor solicitations for the newly assigned address. Since | |
152 | no address is yet assigned, these are sent from the unspecified | |
153 | IPv6 address (::). | |
154 | </li> | |
155 | ||
685f4dfe NS |
156 | <li> |
157 | Priority 80 flow to drop IP (both IPv4 and IPv6) traffic that | |
158 | matches the <code>inport</code> and valid <code>eth.src</code>. | |
159 | </li> | |
160 | </ul> | |
161 | </li> | |
162 | ||
163 | <li> | |
164 | One priority-0 fallback flow that matches all packets and advances to | |
2c36d5a6 | 165 | the next table. |
685f4dfe NS |
166 | </li> |
167 | </ul> | |
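The interplay of the priority-90 allow, priority-80 drop, and priority-0 fallback flows above can be sketched as follows. This is a minimal illustration in Python, not OVN code: the names `Flow`, `build_port_security_ip_flows`, and `evaluate` are hypothetical, packets are modeled as plain dictionaries, and only the IPv4 allow flow is modeled (the DHCP discovery, IPv6, and DAD flows are left out for brevity).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set

# A packet is modeled as a dict of field name to value (an assumption
# made for this sketch; OVN itself matches on real packet headers).
Packet = Dict[str, str]


@dataclass
class Flow:
    priority: int
    match: Callable[[Packet], bool]
    action: str  # "next" or "drop"


def build_port_security_ip_flows(inport: str, mac: str,
                                 ip4_addrs: Set[str]) -> List[Flow]:
    """Build a simplified ingress table 1 for one port security element."""
    def from_port(p: Packet) -> bool:
        return p.get("inport") == inport and p.get("eth.src") == mac

    def is_ip(p: Packet) -> bool:
        return "ip4.src" in p or "ip6.src" in p

    return [
        # Priority 90: allow IPv4 traffic from a valid source address.
        Flow(90, lambda p: from_port(p) and p.get("ip4.src") in ip4_addrs,
             "next"),
        # Priority 80: drop any other IP traffic from this port and MAC.
        Flow(80, lambda p: from_port(p) and is_ip(p), "drop"),
        # Priority 0: fallback, advance everything else.
        Flow(0, lambda p: True, "next"),
    ]


def evaluate(flows: List[Flow], packet: Packet) -> str:
    """Return the action of the highest-priority matching flow."""
    matching = [f for f in flows if f.match(packet)]
    if not matching:
        return "drop"  # default-drop behavior of logical flow tables
    return max(matching, key=lambda f: f.priority).action
```

With a port `lp1` that owns `10.0.0.5`, a packet sourced from `10.0.0.5` advances, a packet from the same port with a spoofed source address hits the priority-80 drop, and non-IP traffic such as ARP falls through to the priority-0 flow.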
168 | ||
169 | <h3>Ingress Table 2: Ingress Port Security - Neighbor discovery</h3> | |
170 | ||
171 | <p> | |
172 | Ingress table 2 contains these logical flows: | |
173 | </p> | |
174 | ||
175 | <ul> | |
176 | <li> | |
177 | <p> | |
178 | For each element in the port security set, | |
179 | </p> | |
180 | ||
181 | <ul> | |
182 | <li> | |
183 | Priority 90 flow to allow ARP traffic that matches the | |
184 | <code>inport</code> and valid <code>eth.src</code> and | |
185 | <code>arp.sha</code>. If the element has one or more | |
186 | IPv4 addresses, then it also matches the valid | |
187 | <code>arp.spa</code>. | |
188 | </li> | |
189 | ||
190 | <li> | |
191 | Priority 90 flow to allow IPv6 Neighbor Solicitation and | |
192 | Advertisement traffic that matches the <code>inport</code>, | |
193 | valid <code>eth.src</code> and | |
194 | <code>nd.sll</code>/<code>nd.tll</code>. | |
195 | If the element has one or more IPv6 addresses, then it also | |
196 | matches the valid <code>nd.target</code> address(es) for Neighbor | |
197 | Advertisement traffic. | |
198 | </li> | |
199 | ||
200 | <li> | |
201 | Priority 80 flow to drop ARP and IPv6 Neighbor Solicitation and | |
202 | Advertisement traffic that matches the <code>inport</code> and | |
203 | valid <code>eth.src</code>. | |
204 | </li> | |
205 | </ul> | |
206 | </li> | |
207 | ||
208 | <li> | |
209 | One priority-0 fallback flow that matches all packets and advances to | |
2c36d5a6 | 210 | the next table. |
685f4dfe NS |
211 | </li> |
212 | </ul> | |
213 | ||
214 | <h3>Ingress Table 3: <code>from-lport</code> Pre-ACLs</h3> | |
215 | ||
216 | <p> | |
2c36d5a6 GS |
217 | This table prepares flows for possible stateful ACL processing in |
218 | ingress table <code>ACLs</code>. It contains a priority-0 flow that | |
219 | simply moves traffic to the next table. If stateful ACLs are used in the | |
facf8652 GS |
220 | logical datapath, a priority-100 flow is added that sets a hint |
221 | (with <code>reg0[0] = 1; next;</code>) for table | |
222 | <code>Pre-stateful</code> to send IP packets to the connection tracker | |
223 | before eventually advancing to ingress table <code>ACLs</code>. | |
78aab811 JP |
224 | </p> |
225 | ||
7a15be69 GS |
226 | <h3>Ingress Table 4: Pre-LB</h3> |
227 | ||
228 | <p> | |
229 | This table prepares flows for possible stateful load balancing processing | |
230 | in ingress tables <code>LB</code> and <code>Stateful</code>. It contains | |
231 | a priority-0 flow that simply moves traffic to the next table. If load | |
232 | balancing rules with virtual IP addresses (and ports) are configured in | |
cc4583aa | 233 | <code>OVN_Northbound</code> database for a logical switch datapath, a |
7a15be69 GS |
234 | priority-100 flow is added for each configured virtual IP address |
235 | <var>VIP</var> with a match <code>ip && ip4.dst == <var>VIP</var> | |
236 | </code> that sets an action <code>reg0[0] = 1; next;</code> to act as a | |
237 | hint for table <code>Pre-stateful</code> to send IP packets to the | |
238 | connection tracker for packet de-fragmentation before eventually | |
239 | advancing to ingress table <code>LB</code>. | |
240 | </p> | |
241 | ||
242 | <h3>Ingress Table 5: Pre-stateful</h3> | |
facf8652 GS |
243 | |
244 | <p> | |
245 | This table prepares flows for all possible stateful processing | |
246 | in next tables. It contains a priority-0 flow that simply moves | |
247 | traffic to the next table. A priority-100 flow sends the packets to | |
248 | connection tracker based on a hint provided by the previous tables | |
249 | (with a match for <code>reg0[0] == 1</code>) by using the | |
250 | <code>ct_next;</code> action. | |
251 | </p> | |
252 | ||
7a15be69 | 253 | <h3>Ingress Table 6: <code>from-lport</code> ACLs</h3> |
5cff6b99 BP |
254 | |
255 | <p> | |
256 | Logical flows in this table closely reproduce those in the | |
78aab811 | 257 | <code>ACL</code> table in the <code>OVN_Northbound</code> database |
cc58e1f2 RB |
258 | for the <code>from-lport</code> direction. The <code>priority</code> |
259 | values from the <code>ACL</code> table have a limited range and have | |
260 | 1000 added to them to leave room for OVN default flows at both | |
261 | higher and lower priorities. | |
5cff6b99 | 262 | </p> |
cc58e1f2 RB |
263 | <ul> |
264 | <li> | |
265 | <code>allow</code> ACLs translate into logical flows with | |
266 | the <code>next;</code> action. If there are any stateful ACLs | |
267 | on this datapath, then <code>allow</code> ACLs translate to | |
268 | <code>ct_commit; next;</code> (which acts as a hint for the next tables | |
269 | to commit the connection to conntrack). | |
270 | </li> | |
271 | <li> | |
272 | <code>allow-related</code> ACLs translate into logical | |
273 | flows with the <code>ct_commit(ct_label=0/1); next;</code> actions | |
274 | for new connections and <code>reg0[1] = 1; next;</code> for existing | |
275 | connections. | |
276 | </li> | |
277 | <li> | |
278 | Other ACLs translate to <code>drop;</code> for new or untracked | |
279 | connections and <code>ct_commit(ct_label=1/1);</code> for known | |
280 | connections. Setting <code>ct_label</code> marks a connection | |
281 | as one that was previously allowed, but should no longer be | |
282 | allowed due to a policy change. | |
283 | </li> | |
284 | </ul> | |
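The translation rules in the list above can be summarized in a short sketch. This is an illustrative Python paraphrase of the text, not the ovn-northd implementation: the function name `translate_acl`, its parameters, and the dictionary keys that label connection states are all hypothetical. Only the priority offset of 1000 and the action strings come from the description above.

```python
from typing import Dict, Tuple

# Offset added to every ACL priority, as described in the text above.
ACL_PRIORITY_OFFSET = 1000


def translate_acl(priority: int, action: str,
                  datapath_has_stateful_acl: bool = False
                  ) -> Tuple[int, Dict[str, str]]:
    """Map one ACL row to a flow priority and per-state actions.

    The keys of the returned dict are informal labels for the kind of
    connection each action applies to (hypothetical names).
    """
    flow_priority = priority + ACL_PRIORITY_OFFSET

    if action == "allow":
        if datapath_has_stateful_acl:
            return flow_priority, {"all": "ct_commit; next;"}
        return flow_priority, {"all": "next;"}

    if action == "allow-related":
        return flow_priority, {
            "new": "ct_commit(ct_label=0/1); next;",
            "existing": "reg0[1] = 1; next;",
        }

    # Any other ACL action (for example drop or reject).
    return flow_priority, {
        "new_or_untracked": "drop;",
        "known": "ct_commit(ct_label=1/1);",
    }
```

For example, an `allow` ACL with priority 100 on a datapath without stateful ACLs becomes a flow at priority 1100 with action `next;`.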
5cff6b99 BP |
285 | |
286 | <p> | |
2c36d5a6 | 287 | This table also contains a priority 0 flow with action |
78aab811 JP |
288 | <code>next;</code>, so that ACLs allow packets by default. If the |
289 | logical datapath has a stateful ACL, the following flows will | |
290 | also be added: | |
5cff6b99 BP |
291 | </p> |
292 | ||
78aab811 JP |
293 | <ul> |
294 | <li> | |
fa313a8c GS |
295 | A priority-1 flow that sets the hint to commit IP traffic to the |
296 | connection tracker (with action <code>reg0[1] = 1; next;</code>). This | |
297 | is needed for the default allow policy because, while the initiator's | |
298 | direction may not have any stateful rules, the server's may and then | |
299 | its return traffic would not be known and marked as invalid. | |
78aab811 JP |
300 | </li> |
301 | ||
302 | <li> | |
cc58e1f2 RB |
303 | A priority-65535 flow that allows any traffic in the reply |
304 | direction for a connection that has been committed to the | |
305 | connection tracker (i.e., established flows), as long as | |
b73db61d | 306 | the committed flow does not have <code>ct_label.blocked</code> set. |
cc58e1f2 RB |
307 | We only handle traffic in the reply direction here because |
308 | we want all packets going in the request direction to still | |
309 | go through the flows that implement the currently defined | |
310 | policy based on ACLs. If a connection is no longer allowed by | |
b73db61d | 311 | policy, <code>ct_label.blocked</code> will get set and packets in the |
cc58e1f2 | 312 | reply direction will no longer be allowed, either. |
78aab811 JP |
313 | </li> |
314 | ||
315 | <li> | |
316 | A priority-65535 flow that allows any traffic that is considered | |
317 | related to a committed flow in the connection tracker (e.g., an | |
cc58e1f2 | 318 | ICMP Port Unreachable from a non-listening UDP port), as long |
b73db61d | 319 | as the committed flow does not have <code>ct_label.blocked</code> set. |
78aab811 JP |
320 | </li> |
321 | ||
322 | <li> | |
323 | A priority-65535 flow that drops all traffic marked by the | |
324 | connection tracker as invalid. | |
325 | </li> | |
cc58e1f2 RB |
326 | |
327 | <li> | |
328 | A priority-65535 flow that drops all traffic in the reply direction | |
b73db61d | 329 | with <code>ct_label.blocked</code> set meaning that the connection |
cc58e1f2 RB |
330 | should no longer be allowed due to a policy change. Packets |
331 | in the request direction are skipped here to let a newly created | |
332 | ACL re-allow this connection. | |
333 | </li> | |
78aab811 JP |
334 | </ul> |
335 | ||
1a03fc7d BS |
336 | <h3>Ingress Table 7: <code>from-lport</code> QoS marking</h3> |
337 | ||
338 | <p> | |
339 | Logical flows in this table closely reproduce those in the | |
340 | <code>QoS</code> table in the <code>OVN_Northbound</code> database | |
341 | for the <code>from-lport</code> direction. | |
342 | </p> | |
343 | ||
344 | <ul> | |
345 | <li> | |
346 | For each QoS rule of each logical switch, a flow is added at the | |
347 | priority given in the QoS table. | |
348 | </li> | |
349 | ||
350 | <li> | |
351 | One priority-0 fallback flow that matches all packets and advances to | |
352 | the next table. | |
353 | </li> | |
354 | </ul> | |
355 | ||
356 | <h3>Ingress Table 8: LB</h3> | |
fa313a8c GS |
357 | |
358 | <p> | |
359 | This table contains a priority-0 flow that simply moves traffic to the next | |
7a15be69 GS |
360 | table. For established connections a priority 100 flow matches on |
361 | <code>ct.est && !ct.rel && !ct.new && | |
362 | !ct.inv</code> and sets an action <code>reg0[2] = 1; next;</code> to act | |
363 | as a hint for table <code>Stateful</code> to send packets through | |
364 | connection tracker to NAT the packets. (The packet will automatically | |
365 | get DNATed to the same IP address as the first packet in that | |
366 | connection.) | |
fa313a8c GS |
367 | </p> |
368 | ||
1a03fc7d | 369 | <h3>Ingress Table 9: Stateful</h3> |
7a15be69 GS |
370 | |
371 | <ul> | |
372 | <li> | |
cc4583aa | 373 | For all the configured load balancing rules for a switch in |
7a15be69 GS |
374 | <code>OVN_Northbound</code> database that includes an L4 port |
375 | <var>PORT</var> of protocol <var>P</var> and IPv4 address | |
376 | <var>VIP</var>, a priority-120 flow that matches on | |
377 | <code>ct.new && ip && ip4.dst == <var>VIP | |
378 | </var>&& <var>P</var> && <var>P</var>.dst == <var>PORT | |
379 | </var></code> with an action of <code>ct_lb(<var>args</var>)</code>, | |
380 | where <var>args</var> contains comma-separated IPv4 addresses (and | |
381 | optional port numbers) to load balance to. | |
382 | </li> | |
383 | <li> | |
cc4583aa | 384 | For all the configured load balancing rules for a switch in |
7a15be69 GS |
385 | <code>OVN_Northbound</code> database that includes just an IP address |
386 | <var>VIP</var> to match on, a priority-110 flow that matches on | |
387 | <code>ct.new && ip && ip4.dst == <var>VIP</var></code> | |
388 | with an action of <code>ct_lb(<var>args</var>)</code>, where | |
389 | <var>args</var> contains comma-separated IPv4 addresses. | |
390 | </li> | |
391 | <li> | |
392 | A priority-100 flow commits packets to connection tracker using | |
393 | <code>ct_commit; next;</code> action based on a hint provided by | |
394 | the previous tables (with a match for <code>reg0[1] == 1</code>). | |
395 | </li> | |
396 | <li> | |
397 | A priority-100 flow sends the packets to connection tracker using | |
398 | <code>ct_lb;</code> as the action based on a hint provided by the | |
399 | previous tables (with a match for <code>reg0[2] == 1</code>). | |
400 | </li> | |
401 | <li> | |
402 | A priority-0 flow that simply moves traffic to the next table. | |
403 | </li> | |
404 | </ul> | |
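The hint bits in <code>reg0</code> that earlier tables set and the Pre-stateful and Stateful tables consume can be sketched as below. This is a simplified Python model under stated assumptions: <code>reg0</code> is treated as a plain integer, the helper names are hypothetical, and the priority-120 and priority-110 `ct_lb(args)` flows for new connections to a VIP are not modeled. When both `reg0[1]` and `reg0[2]` are set, the sketch simply lists both matching priority-100 actions rather than guessing which one takes effect.

```python
from typing import List


def bit(reg0: int, n: int) -> bool:
    """Return True if bit n of reg0 is set."""
    return (reg0 >> n) % 2 == 1


def pre_stateful_action(reg0: int) -> str:
    """Ingress table Pre-stateful: reg0[0] requests conntrack."""
    return "ct_next;" if bit(reg0, 0) else "next;"


def stateful_hint_actions(reg0: int) -> List[str]:
    """Ingress table Stateful: actions of the matching hint flows.

    reg0[1] is the commit hint (set by the ACL table) and reg0[2] is
    the load-balancing hint (set by the LB table).
    """
    actions = []
    if bit(reg0, 1):
        actions.append("ct_commit; next;")
    if bit(reg0, 2):
        actions.append("ct_lb;")
    # With no hint set, only the priority-0 flow matches.
    return actions or ["next;"]
```

Keeping each hint in its own bit lets several earlier tables request stateful processing independently, while the connection tracker is invoked from only a few tables.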
405 | ||
1a03fc7d | 406 | <h3>Ingress Table 10: ARP/ND responder</h3> |
5cff6b99 BP |
407 | |
408 | <p> | |
22ab299e DB |
409 | This table implements ARP/ND responder in a logical switch for known |
410 | IPs. The advantage of the ARP responder flow is to limit ARP | |
411 | broadcasts by locally responding to ARP requests without the need to | |
412 | send to other hypervisors. One common case is when the inport is a | |
413 | logical port associated with a VIF and the broadcast is responded to | |
414 | on the local hypervisor rather than broadcast across the whole | |
415 | network and responded to by the destination VM. This behavior is | |
416 | proxy ARP. | |
5cff6b99 BP |
417 | </p> |
418 | ||
22ab299e DB |
419 | <p> |
420 | ARP requests arrive from VMs from a logical switch inport of type | |
421 | default. For this case, the logical switch proxy ARP rules can be | |
422 | for other VMs or logical router ports. Logical switch proxy ARP | |
423 | rules may be programmed both for mac binding of IP addresses on | |
424 | other logical switch VIF ports (which are of the default logical | |
425 | switch port type, representing connectivity to VMs or containers), | |
426 | and for mac binding of IP addresses on logical switch router type | |
427 | ports, representing their logical router port peers. In order to | |
428 | support proxy ARP for logical router ports, an IP address must be | |
429 | configured on the logical switch router type port, with the same | |
430 | value as the peer logical router port. The configured MAC addresses | |
431 | must match as well. When a VM sends an ARP request for a distributed | |
432 | logical router port and if the peer router type port of the attached | |
433 | logical switch does not have an IP address configured, the ARP request | |
434 | will be broadcast on the logical switch. One of the copies of the ARP | |
435 | request will go through the logical switch router type port to the | |
436 | logical router datapath, where the logical router ARP responder will | |
437 | generate a reply. The MAC binding of a distributed logical router, | |
438 | once learned by an associated VM, is used for all that VM's | |
439 | communication needing routing. Hence, the action of a VM re-arping for | |
440 | the mac binding of the logical router port should be rare. | |
441 | </p> | |
442 | ||
443 | <p> | |
444 | Logical switch ARP responder proxy ARP rules can also be hit when | |
445 | receiving ARP requests externally on an L2 gateway port. In this case, | |
446 | the hypervisor acting as an L2 gateway responds to the ARP request on | |
447 | behalf of a destination VM. | |
448 | </p> | |
449 | ||
450 | <p> | |
451 | Note that ARP requests received from <code>localnet</code> or | |
452 | <code>vtep</code> logical inports can either go directly to VMs, in | |
453 | which case the VM responds, or can hit an ARP responder for a logical | |
454 | router port if the packet is used to resolve a logical router port | |
455 | next hop address. In either case, logical switch ARP responder rules | |
456 | will not be hit. This table contains these logical flows: | |
457 | </p> | |
458 | ||
5cff6b99 | 459 | <ul> |
fa128126 | 460 | <li> |
22ab299e DB |
461 | Priority-100 flows that skip the ARP responder if the inport is of type |
462 | <code>localnet</code> or <code>vtep</code> and advance directly | |
463 | to the next table. ARP requests sent to <code>localnet</code> or | |
464 | <code>vtep</code> ports can be received by multiple hypervisors. | |
465 | Now, because the same mac binding rules are downloaded to all | |
466 | hypervisors, each of the multiple hypervisors will respond. This | |
467 | will confuse L2 learning on the source of the ARP requests. ARP | |
468 | requests received on an inport of type <code>router</code> are not | |
469 | expected to hit any logical switch ARP responder flows. However, | |
470 | no skip flows are installed for these packets, as there would be | |
471 | some additional flow cost for this and the value appears limited. | |
fa128126 HZ |
472 | </li> |
473 | ||
57d143eb | 474 | <li> |
4c7bf534 | 475 | <p> |
6fdb7cd6 | 476 | Priority-50 flows that match ARP requests to each known IP address |
22ab299e | 477 | <var>A</var> of every logical switch port, and respond with ARP |
4c7bf534 NS |
478 | replies directly with corresponding Ethernet address <var>E</var>: |
479 | </p> | |
480 | ||
57d143eb HZ |
481 | <pre> |
482 | eth.dst = eth.src; | |
483 | eth.src = <var>E</var>; | |
484 | arp.op = 2; /* ARP reply. */ | |
485 | arp.tha = arp.sha; | |
486 | arp.sha = <var>E</var>; | |
487 | arp.tpa = arp.spa; | |
488 | arp.spa = <var>A</var>; | |
6fdb7cd6 | 489 | outport = inport; |
bf143492 | 490 | flags.loopback = 1; |
57d143eb HZ |
491 | output; |
492 | </pre> | |
4c7bf534 NS |
493 | |
494 | <p> | |
495 | These flows are omitted for logical ports (other than router ports) | |
496 | that are down. | |
497 | </p> | |
57d143eb HZ |
498 | </li> |
499 | ||
6fdb7cd6 JP |
500 | <li> |
501 | <p> | |
502 | Priority-50 flows that match IPv6 ND neighbor solicitations to | |
503 | each known IP address <var>A</var> (and <var>A</var>'s | |
22ab299e | 504 | solicited node address) of every logical switch port, and |
6fdb7cd6 JP |
505 | respond with neighbor advertisements directly with |
506 | corresponding Ethernet address <var>E</var>: | |
507 | </p> | |
508 | ||
509 | <pre> | |
510 | nd_na { | |
511 | eth.src = <var>E</var>; | |
512 | ip6.src = <var>A</var>; | |
513 | nd.target = <var>A</var>; | |
514 | nd.tll = <var>E</var>; | |
515 | outport = inport; | |
bf143492 | 516 | flags.loopback = 1; |
6fdb7cd6 JP |
517 | output; |
518 | }; | |
519 | </pre> | |
520 | ||
521 | <p> | |
522 | These flows are omitted for logical ports (other than router ports) | |
523 | that are down. | |
524 | </p> | |
525 | </li> | |
526 | ||
9fcb6a18 BP |
527 | <li> |
528 | <p> | |
529 | Priority-100 flows with match criteria like the ARP and ND flows | |
530 | above, except that they only match packets from the | |
531 | <code>inport</code> that owns the IP addresses in question, with | |
532 | action <code>next;</code>. These flows prevent OVN from replying to, | |
533 | for example, an ARP request emitted by a VM for its own IP address. | |
534 | A VM only makes this kind of request to attempt to detect a duplicate | |
535 | IP address assignment, so sending a reply will prevent the VM from | |
536 | accepting the IP address that it owns. | |
537 | </p> | |
538 | ||
539 | <p> | |
540 | In place of <code>next;</code>, it would be reasonable to use | |
541 | <code>drop;</code> for the flows' actions. If everything is working | |
542 | as it is configured, then this would produce equivalent results, | |
543 | since no host should reply to the request. But ARPing for one's own | |
544 | IP address is intended to detect situations where the network is not | |
545 | working as configured, so dropping the request would frustrate that | |
546 | intent. | |
547 | </p> | |
548 | </li> | |
549 | ||
fa128126 HZ |
550 | <li> |
551 | One priority-0 fallback flow that matches all packets and advances to | |
2c36d5a6 | 552 | the next table. |
fa128126 HZ |
553 | </li> |
554 | </ul> | |
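The ARP responder actions listed in this table can be traced with a small sketch. This is a hypothetical Python model that treats a packet as a dictionary and applies the assignments in the same order as the priority-50 ARP flow above; the function name `arp_reply` is an illustration only. The order matters: `eth.dst` and `arp.tha` must copy the requester's addresses before `eth.src` and `arp.sha` are overwritten with <var>E</var>.

```python
from typing import Any, Dict


def arp_reply(request: Dict[str, Any], e: str, a: str) -> Dict[str, Any]:
    """Turn an ARP request into a reply for IP address a with MAC e."""
    pkt = dict(request)  # work on a copy, leave the request untouched
    pkt["eth.dst"] = pkt["eth.src"]
    pkt["eth.src"] = e
    pkt["arp.op"] = 2  # ARP reply
    pkt["arp.tha"] = pkt["arp.sha"]
    pkt["arp.sha"] = e
    pkt["arp.tpa"] = pkt["arp.spa"]
    pkt["arp.spa"] = a
    pkt["outport"] = pkt["inport"]
    pkt["flags.loopback"] = 1  # permit output back to the ingress port
    return pkt
```

Because `outport` is set to `inport`, the reply goes straight back to the port that sent the request, which is why `flags.loopback` has to be set.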
555 | ||
1a03fc7d | 556 | <h3>Ingress Table 11: DHCP option processing</h3> |
281977f7 NS |
557 | |
558 | <p> | |
559 | This table adds the DHCPv4 options to a DHCPv4 packet from the | |
33ac3c83 NS |
560 | logical ports configured with IPv4 address(es) and DHCPv4 options, |
561 | and similarly for DHCPv6 options. | |
281977f7 NS |
562 | </p> |
563 | ||
564 | <ul> | |
565 | <li> | |
566 | <p> | |
567 | A priority-100 logical flow is added for these logical ports | |
568 | which matches the IPv4 packet with <code>udp.src</code> = 68 and | |
569 | <code>udp.dst</code> = 67 and applies the action | |
570 | <code>put_dhcp_opts</code> and advances the packet to the next table. | |
571 | </p> | |
572 | ||
573 | <pre> | |
33ac3c83 | 574 | reg0[3] = put_dhcp_opts(offer_ip = <var>ip</var>, <var>options</var>...); |
281977f7 NS |
575 | next; |
576 | </pre> | |
577 | ||
578 | <p> | |
579 | For DHCPDISCOVER and DHCPREQUEST, this transforms the packet into a | |
33ac3c83 | 580 | DHCP reply, adds the DHCP offer IP <var>ip</var> and options to the |
281977f7 NS |
581 | packet, and stores 1 into reg0[3]. For other kinds of packets, it |
582 | just stores 0 into reg0[3]. Either way, it continues to the next | |
583 | table. | |
584 | </p> | |
585 | ||
586 | </li> | |
587 | ||
33ac3c83 NS |
588 | <li> |
589 | <p> | |
590 | A priority-100 logical flow is added for these logical ports | |
591 | which matches the IPv6 packet with <code>udp.src</code> = 546 and | |
592 | <code>udp.dst</code> = 547 and applies the action | |
593 | <code>put_dhcpv6_opts</code> and advances the packet to the next | |
594 | table. | |
595 | </p> | |
596 | ||
597 | <pre> | |
598 | reg0[3] = put_dhcpv6_opts(ia_addr = <var>ip</var>, <var>options</var>...); | |
599 | next; | |
600 | </pre> | |
601 | ||
602 | <p> | |
603 | For DHCPv6 Solicit/Request/Confirm packets, this transforms the | |
604 | packet into a DHCPv6 Advertise/Reply, adds the DHCPv6 offer IP | |
605 | <var>ip</var> and options to the packet, and stores 1 into reg0[3]. | |
606 | For other kinds of packets, it just stores 0 into reg0[3]. Either | |
607 | way, it continues to the next table. | |
608 | </p> | |
609 | </li> | |
610 | ||
281977f7 NS |
611 | <li> |
612 | A priority-0 flow that matches all packets and advances to the next table. | |
613 | </li> | |
614 | </ul> | |
615 | ||
1a03fc7d | 616 | <h3>Ingress Table 12: DHCP responses</h3> |
281977f7 NS |
617 | |
618 | <p> | |
619 | This table implements DHCP responder for the DHCP replies generated by | |
620 | the previous table. | |
621 | </p> | |
622 | ||
623 | <ul> | |
624 | <li> | |
625 | <p> | |
626 | A priority 100 logical flow is added for the logical ports configured | |
627 | with DHCPv4 options which matches IPv4 packets with <code>udp.src == 68 | |
628 | && udp.dst == 67 && reg0[3] == 1</code> and | |
629 | responds back to the <code>inport</code> after applying these | |
630 | actions. If <code>reg0[3]</code> is set to 1, it means that the | |
631 | action <code>put_dhcp_opts</code> was successful. | |
632 | </p> | |
633 | ||
634 | <pre> | |
635 | eth.dst = eth.src; | |
636 | eth.src = <var>E</var>; | |
33ac3c83 | 637 | ip4.dst = <var>A</var>; |
281977f7 NS |
638 | ip4.src = <var>S</var>; |
639 | udp.src = 67; | |
640 | udp.dst = 68; | |
641 | outport = <var>P</var>; | |
bf143492 | 642 | flags.loopback = 1; |
281977f7 NS |
643 | output; |
644 | </pre> | |
645 | ||
646 | <p> | |
647 | where <var>E</var> is the server MAC address and <var>S</var> is the | |
33ac3c83 | 648 | server IPv4 address defined in the DHCPv4 options and <var>A</var> is |
281977f7 NS |
649 | the IPv4 address defined in the logical port's addresses column. |
650 | </p> | |
651 | ||
652 | <p> | |
653 | (This terminates ingress packet processing; the packet does not go | |
654 | to the next ingress table.) | |
655 | </p> | |
656 | </li> | |
657 | ||
33ac3c83 NS |
658 | <li> |
659 | <p> | |
660 | A priority 100 logical flow is added for the logical ports configured | |
661 | with DHCPv6 options which matches IPv6 packets with <code>udp.src == 546 | |
662 | && udp.dst == 547 && reg0[3] == 1</code> and | |
663 | responds back to the <code>inport</code> after applying these | |
664 | actions. If <code>reg0[3]</code> is set to 1, it means that the | |
665 | action <code>put_dhcpv6_opts</code> was successful. | |
666 | </p> | |
667 | ||
668 | <pre> | |
669 | eth.dst = eth.src; | |
670 | eth.src = <var>E</var>; | |
671 | ip6.dst = <var>A</var>; | |
672 | ip6.src = <var>S</var>; | |
673 | udp.src = 547; | |
674 | udp.dst = 546; | |
675 | outport = <var>P</var>; | |
676 | flags.loopback = 1; | |
677 | output; | |
678 | </pre> | |
679 | ||
680 | <p> | |
681 | where <var>E</var> is the server MAC address and <var>S</var> is the | |
682 | server IPv6 LLA address generated from the <code>server_id</code> | |
683 | defined in the DHCPv6 options and <var>A</var> is | |
684 | the IPv6 address defined in the logical port's addresses column. | |
685 | </p> | |
686 | ||
687 | <p> | |
688 | (This terminates packet processing; the packet does not go to the | |
689 | next ingress table.) | |
690 | </p> | |
691 | </li> | |
692 | ||
281977f7 NS |
693 | <li> |
694 | A priority-0 flow that matches all packets and advances to the next table. | |
695 | </li> | |
696 | </ul> | |
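The two DHCPv4 tables work as a pair: the option-processing table records success in `reg0[3]`, and the response table only answers when that bit is 1. A minimal sketch of this hand-off follows. It is a hypothetical Python model, not OVN code: packets are dictionaries, the field `dhcp.type` is an invented stand-in for the DHCP message type, and the actual rewriting of the DHCP payload by `put_dhcp_opts` is not modeled.

```python
from typing import Any, Dict, Optional


def dhcp_options_stage(pkt: Dict[str, Any]) -> Dict[str, Any]:
    """Model of the option-processing table: set reg0[3], then continue."""
    out = dict(pkt)
    if pkt.get("udp.src") == 68 and pkt.get("udp.dst") == 67:
        # "dhcp.type" is a hypothetical field used only in this sketch.
        handled = pkt.get("dhcp.type") in ("DHCPDISCOVER", "DHCPREQUEST")
        out["reg0[3]"] = 1 if handled else 0
    return out


def dhcp_response_stage(pkt: Dict[str, Any], e: str, s: str, a: str,
                        p: str) -> Optional[Dict[str, Any]]:
    """Model of the response table.

    Returns the reply sent out of port p, or None when the priority-100
    flow does not match and the packet goes on to the next table.
    """
    if not (pkt.get("udp.src") == 68 and pkt.get("udp.dst") == 67
            and pkt.get("reg0[3]") == 1):
        return None
    out = dict(pkt)
    out["eth.dst"] = out["eth.src"]
    out["eth.src"] = e   # server MAC address
    out["ip4.dst"] = a   # address from the logical port's addresses column
    out["ip4.src"] = s   # server IPv4 address
    out["udp.src"] = 67
    out["udp.dst"] = 68
    out["outport"] = p
    out["flags.loopback"] = 1
    return out
```

A DHCPDISCOVER from a configured port is answered in place, while any other packet on the same UDP ports leaves `reg0[3]` at 0 and continues through the pipeline.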
697 | ||
302eda27 NS |
698 | <h3>Ingress Table 13: DNS Lookup</h3> |
699 | ||
700 | <p> | |
701 | This table looks up and resolves the DNS names to the corresponding | |
702 | configured IP address(es). | |
703 | </p> | |
704 | ||
705 | <ul> | |
706 | <li> | |
707 | <p> | |
708 | A priority-100 logical flow for each logical switch datapath | |
709 | if it is configured with DNS records, which matches the IPv4 and IPv6 | |
710 | packets with <code>udp.dst</code> = 53 and applies the action | |
711 | <code>dns_lookup</code> and advances the packet to the next table. | |
712 | </p> | |
713 | ||
714 | <pre> | |
715 | reg0[4] = dns_lookup(); next; | |
716 | </pre> | |
717 | ||
718 | <p> | |
719 | For valid DNS packets, this transforms the packet into a DNS | |
720 | reply if the DNS name can be resolved, and stores 1 into reg0[4]. | |
721 | For failed DNS resolution or other kinds of packets, it just stores | |
722 | 0 into reg0[4]. Either way, it continues to the next table. | |
723 | </p> | |
724 | </li> | |
725 | </ul> | |
726 | ||
727 | <h3>Ingress Table 14: DNS Responses</h3> | |
728 | ||
729 | <p> | |
730 | This table implements DNS responder for the DNS replies generated by | |
731 | the previous table. | |
732 | </p> | |
733 | ||
734 | <ul> | |
735 | <li> | |
736 | <p> | |
737 | A priority-100 logical flow for each logical switch datapath | |
738 | if it is configured with DNS records, which matches the IPv4 and IPv6 | |
739 | packets with <code>udp.dst == 53 && reg0[4] == 1</code> | |
740 | and responds back to the <code>inport</code> after applying these | |
741 | actions. If <code>reg0[4]</code> is set to 1, it means that the | |
742 | action <code>dns_lookup</code> was successful. | |
743 | </p> | |
744 | ||
745 | <pre> | |
746 | eth.dst <-> eth.src; | |
747 | ip4.src <-> ip4.dst; | |
748 | udp.dst = udp.src; | |
749 | udp.src = 53; | |
750 | outport = <var>P</var>; | |
751 | flags.loopback = 1; | |
752 | output; | |
753 | </pre> | |
754 | ||
755 | <p> | |
756 | (This terminates ingress packet processing; the packet does not go | |
757 | to the next ingress table.) | |
758 | </p> | |
759 | </li> | |
760 | </ul> | |
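<p>
  The interplay of the two DNS tables above can be sketched as a small
  Python model. This is only an illustration, not ovn-northd code: the
  packet is a plain dictionary, <code>DNS_RECORDS</code>, the
  <code>dns.query</code> and <code>dns.answer</code> keys, and the function
  names are hypothetical, and <var>P</var> is assumed to be the packet's
  <code>inport</code>.
</p>

```python
# Hypothetical model of ingress tables 13 and 14; names are illustrative.
DNS_RECORDS = {"vm1.example.org": "10.0.0.4"}  # assumed sample DNS record


def dns_lookup(pkt):
    """Table 13: return the value to store in reg0[4] (1 on success, else 0)."""
    if pkt.get("udp.dst") != 53:
        return 0
    answer = DNS_RECORDS.get(pkt.get("dns.query"))
    if answer is None:
        return 0
    pkt["dns.answer"] = answer  # the packet is turned into a DNS reply
    return 1


def dns_response(pkt):
    """Table 14: swap addresses and send the reply back out of the inport."""
    pkt["eth.dst"], pkt["eth.src"] = pkt["eth.src"], pkt["eth.dst"]
    pkt["ip4.src"], pkt["ip4.dst"] = pkt["ip4.dst"], pkt["ip4.src"]
    pkt["udp.dst"] = pkt["udp.src"]
    pkt["udp.src"] = 53
    pkt["outport"] = pkt["inport"]
    pkt["flags.loopback"] = 1
    return pkt


def process(pkt):
    """Run both tables; return the verdict and the (possibly modified) packet."""
    reg0_4 = dns_lookup(pkt)
    if pkt.get("udp.dst") == 53 and reg0_4 == 1:
        return "output", dns_response(pkt)
    return "next", pkt
```

<p>
  A packet whose name cannot be resolved leaves both tables unchanged and
  continues to the next ingress table.
</p>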
761 | ||
762 | <h3>Ingress Table 15: Destination Lookup</h3> | |
fa128126 HZ |
763 | |
764 | <p> | |
765 | This table implements switching behavior. It contains these logical | |
766 | flows: | |
767 | </p> | |
768 | ||
769 | <ul> | |
5cff6b99 BP |
770 | <li> |
771 | A priority-100 flow that outputs all packets with an Ethernet broadcast | |
772 | or multicast <code>eth.dst</code> to the <code>MC_FLOOD</code> | |
773 | multicast group, which <code>ovn-northd</code> populates with all | |
774 | enabled logical ports. | |
775 | </li> | |
776 | ||
777 | <li> | |
41a15b71 MS |
778 | <p> |
779 | One priority-50 flow that matches each known Ethernet address against | |
780 | <code>eth.dst</code> and outputs the packet to the single associated | |
781 | output port. | |
782 | </p> | |
783 | ||
784 | <p> | |
785 | For the Ethernet address on a logical switch port of type | |
786 | <code>router</code>, when that logical switch port's | |
787 | <ref column="addresses" table="Logical_Switch_Port" | |
788 | db="OVN_Northbound"/> column is set to <code>router</code> and | |
789 | the connected logical router port specifies a | |
06a26dd2 | 790 | <code>redirect-chassis</code>: |
41a15b71 | 791 | </p> |
06a26dd2 MS |
792 | |
793 | <ul> | |
794 | <li> | |
795 | The flow for the connected logical router port's Ethernet | |
796 | address is only programmed on the <code>redirect-chassis</code>. | |
797 | </li> | |
798 | ||
799 | <li> | |
800 | If the logical router has rules specified in | |
801 | <ref column="nat" table="Logical_Router" db="OVN_Northbound"/> with | |
802 | <ref column="external_mac" table="NAT" db="OVN_Northbound"/>, then | |
803 | those addresses are also used to populate the switch's destination | |
804 | lookup on the chassis where | |
805 | <ref column="logical_port" table="NAT" db="OVN_Northbound"/> is | |
806 | resident. | |
807 | </li> | |
808 | </ul> | |
5cff6b99 BP |
809 | </li> |
810 | ||
811 | <li> | |
812 | One priority-0 fallback flow that matches all packets and outputs them | |
813 | to the <code>MC_UNKNOWN</code> multicast group, which | |
814 | <code>ovn-northd</code> populates with all enabled logical ports that | |
815 | accept unknown destination packets. As a small optimization, if no | |
816 | logical ports accept unknown destination packets, | |
817 | <code>ovn-northd</code> omits this multicast group and logical flow. | |
818 | </li> | |
819 | </ul> | |
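<p>
  As a rough illustration of how the priorities in this table interact,
  the following Python sketch picks an output for a destination Ethernet
  address. It is a simplified model under stated assumptions: the function
  name and arguments are hypothetical, and the
  <code>redirect-chassis</code> and NAT <code>external_mac</code> cases are
  ignored.
</p>

```python
def destination_lookup(eth_dst, mac_to_port, unknown_ports):
    """Return the output port or multicast group for eth_dst, or None."""
    eth_dst = eth_dst.lower()
    # Priority 100: broadcast or multicast (group bit set in the first octet).
    if int(eth_dst.split(":")[0], 16) & 1:
        return "MC_FLOOD"
    # Priority 50: a known unicast Ethernet address maps to a single port.
    if eth_dst in mac_to_port:
        return mac_to_port[eth_dst]
    # Priority 0: present only if some port accepts unknown destinations.
    if unknown_ports:
        return "MC_UNKNOWN"
    return None  # no flow matches, so the packet is dropped
```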
820 | ||
7a15be69 GS |
821 | <h3>Egress Table 0: Pre-LB</h3> |
822 | ||
823 | <p> | |
824 | This table is similar to ingress table <code>Pre-LB</code>. It | |
825 | contains a priority-0 flow that simply moves traffic to the next table. | |
826 | If any load balancing rules exist for the datapath, a priority-100 flow | |
827 | is added with a match of <code>ip</code> and action of <code>reg0[0] = 1; | |
828 | next;</code> to act as a hint for table <code>Pre-stateful</code> to | |
829 | send IP packets to the connection tracker for packet de-fragmentation. | |
830 | </p> | |
831 | ||
832 | <h3>Egress Table 1: <code>to-lport</code> Pre-ACLs</h3> | |
78aab811 JP |
833 | |
834 | <p> | |
2c36d5a6 GS |
835 | This is similar to ingress table <code>Pre-ACLs</code> except for |
836 | <code>to-lport</code> traffic. | |
78aab811 JP |
837 | </p> |
838 | ||
7a15be69 | 839 | <h3>Egress Table 2: Pre-stateful</h3> |
facf8652 GS |
840 | |
841 | <p> | |
842 | This is similar to ingress table <code>Pre-stateful</code>. | |
843 | </p> | |
844 | ||
7a15be69 GS |
845 | <h3>Egress Table 3: LB</h3> |
846 | <p> | |
847 | This is similar to ingress table <code>LB</code>. | |
848 | </p> | |
849 | ||
850 | <h3>Egress Table 4: <code>to-lport</code> ACLs</h3> | |
5cff6b99 BP |
851 | |
852 | <p> | |
2c36d5a6 GS |
853 | This is similar to ingress table <code>ACLs</code> except for |
854 | <code>to-lport</code> ACLs. | |
685f4dfe NS |
855 | </p> |
856 | ||
1a03fc7d BS |
857 | <h3>Egress Table 5: <code>to-lport</code> QoS marking</h3> |
858 | ||
859 | <p> | |
860 | This is similar to ingress table <code>QoS marking</code> except for | |
861 | <code>to-lport</code> QoS rules. | |
862 | </p> | |
863 | ||
864 | <h3>Egress Table 6: Stateful</h3> | |
fa313a8c GS |
865 | |
866 | <p> | |
7a15be69 GS |
867 | This is similar to ingress table <code>Stateful</code> except that |
868 | there are no rules added for load balancing new connections. | |
fa313a8c GS |
869 | </p> |
870 | ||
281977f7 | 871 | <p> |
302eda27 | 872 | Also the following flows are added. |
281977f7 | 873 | </p> |
302eda27 NS |
874 | <ul> |
875 | <li> | |
876 | A priority-34000 logical flow is added for each logical port that has | |
877 | DHCPv4 options defined, to allow the DHCPv4 reply packet, and for each that has | |
878 | DHCPv6 options defined, to allow the DHCPv6 reply packet, from | |
879 | <code>Ingress Table 12: DHCP responses</code>. | |
880 | </li> | |
881 | ||
882 | <li> | |
883 | A priority-34000 logical flow is added for each logical switch datapath | |
884 | configured with DNS records with the match <code>udp.dst == 53</code> | |
885 | to allow the DNS reply packet from | |
886 | <code>Ingress Table 14: DNS responses</code>. | |
887 | </li> | |
888 | </ul> | |
281977f7 | 889 | |
1a03fc7d | 890 | <h3>Egress Table 7: Egress Port Security - IP</h3> |
685f4dfe NS |
891 | |
892 | <p> | |
2c36d5a6 GS |
893 | This is similar to the port security logic in table |
894 | <code>Ingress Port Security - IP</code> except that <code>outport</code>, | |
895 | <code>eth.dst</code>, <code>ip4.dst</code> and <code>ip6.dst</code> | |
896 | are checked instead of <code>inport</code>, <code>eth.src</code>, | |
897 | <code>ip4.src</code> and <code>ip6.src</code>. | |
5cff6b99 BP |
898 | </p> |
899 | ||
1a03fc7d | 900 | <h3>Egress Table 8: Egress Port Security - L2</h3> |
5cff6b99 BP |
901 | |
902 | <p> | |
2c36d5a6 GS |
903 | This is similar to the ingress port security logic in ingress table |
904 | <code>Admission Control and Ingress Port Security - L2</code>, | |
5cff6b99 BP |
905 | but with important differences. Most obviously, <code>outport</code> and |
906 | <code>eth.dst</code> are checked instead of <code>inport</code> and | |
907 | <code>eth.src</code>. Second, packets directed to broadcast or multicast | |
908 | <code>eth.dst</code> are always accepted instead of being subject to the | |
909 | port security rules; this is implemented through a priority-100 flow that | |
9975d7be | 910 | matches on <code>eth.mcast</code> with action <code>output;</code>. |
5cff6b99 BP |
911 | Finally, to ensure that even broadcast and multicast packets are not |
912 | delivered to disabled logical ports, a priority-150 flow for each | |
913 | disabled logical <code>outport</code> overrides the priority-100 flow | |
914 | with a <code>drop;</code> action. | |
915 | </p> | |
9975d7be BP |
916 | |
917 | <h2>Logical Router Datapaths</h2> | |
918 | ||
5412db30 J |
919 | <p> |
920 | Logical router datapaths will only exist for <ref table="Logical_Router" | |
921 | db="OVN_Northbound"/> rows in the <ref db="OVN_Northbound"/> database | |
922 | that do not have <ref column="enabled" table="Logical_Router" | |
923 | db="OVN_Northbound"/> set to <code>false</code>. | |
924 | </p> | |
925 | ||
9975d7be BP |
926 | <h3>Ingress Table 0: L2 Admission Control</h3> |
927 | ||
928 | <p> | |
929 | This table drops packets that the router shouldn't see at all based on | |
930 | their Ethernet headers. It contains the following flows: | |
931 | </p> | |
932 | ||
933 | <ul> | |
934 | <li> | |
935 | Priority-100 flows to drop packets with VLAN tags or multicast Ethernet | |
936 | source addresses. | |
937 | </li> | |
938 | ||
939 | <li> | |
41a15b71 MS |
940 | <p> |
941 | For each enabled router port <var>P</var> with Ethernet address | |
942 | <var>E</var>, a priority-50 flow that matches <code>inport == | |
943 | <var>P</var> && (eth.mcast || eth.dst == | |
944 | <var>E</var>)</code>, with action <code>next;</code>. | |
945 | </p> | |
946 | ||
947 | <p> | |
948 | For the gateway port on a distributed logical router (where | |
949 | one of the logical router ports specifies a | |
950 | <code>redirect-chassis</code>), the above flow matching | |
951 | <code>eth.dst == <var>E</var></code> is only programmed on | |
952 | the gateway port instance on the | |
953 | <code>redirect-chassis</code>. | |
954 | </p> | |
9975d7be | 955 | </li> |
06a26dd2 MS |
956 | |
957 | <li> | |
958 | <p> | |
959 | For each <code>dnat_and_snat</code> NAT rule on a distributed | |
960 | router that specifies an external Ethernet address <var>E</var>, | |
961 | a priority-50 flow that matches <code>inport == <var>GW</var> | |
962 | && eth.dst == <var>E</var></code>, where <var>GW</var> | |
963 | is the logical router gateway port, with action | |
964 | <code>next;</code>. | |
965 | </p> | |
966 | ||
967 | <p> | |
968 | This flow is only programmed on the gateway port instance on | |
969 | the chassis where the <code>logical_port</code> specified in | |
970 | the NAT rule resides. | |
971 | </p> | |
972 | </li> | |
9975d7be BP |
973 | </ul> |
974 | ||
975 | <p> | |
976 | Other packets are implicitly dropped. | |
977 | </p> | |
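<p>
  The admission decision described above can be sketched in Python as
  follows. This is a simplified, hypothetical model: it ignores the
  <code>redirect-chassis</code> restriction and the
  <code>dnat_and_snat</code> external Ethernet address flows, and the
  dictionary keys and function names are assumptions made for
  illustration.
</p>

```python
def is_mcast(mac):
    """True if the Ethernet address has the group (multicast) bit set."""
    return bool(int(mac.split(":")[0], 16) & 1)


def l2_admission(pkt, port_macs):
    """Return "next" if the router should see the packet, else "drop"."""
    # Priority 100: drop VLAN-tagged packets and multicast source addresses.
    if pkt.get("vlan.present") or is_mcast(pkt["eth.src"]):
        return "drop"
    # Priority 50: accept multicast or traffic addressed to the port's MAC.
    mac = port_macs.get(pkt["inport"])  # only enabled ports are listed
    if mac is not None and (is_mcast(pkt["eth.dst"]) or pkt["eth.dst"] == mac):
        return "next"
    # Other packets are implicitly dropped.
    return "drop"
```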
978 | ||
979 | <h3>Ingress Table 1: IP Input</h3> | |
980 | ||
981 | <p> | |
982 | This table is the core of the logical router datapath functionality. It | |
983 | contains the following flows to implement very basic IP host | |
984 | functionality. | |
985 | </p> | |
986 | ||
987 | <ul> | |
988 | <li> | |
989 | <p> | |
990 | L3 admission control: A priority-100 flow drops packets that match | |
991 | any of the following: | |
992 | </p> | |
993 | ||
994 | <ul> | |
995 | <li> | |
996 | <code>ip4.src[28..31] == 0xe</code> (multicast source) | |
997 | </li> | |
998 | <li> | |
999 | <code>ip4.src == 255.255.255.255</code> (broadcast source) | |
1000 | </li> | |
1001 | <li> | |
1002 | <code>ip4.src == 127.0.0.0/8 || ip4.dst == 127.0.0.0/8</code> | |
1003 | (localhost source or destination) | |
1004 | </li> | |
1005 | <li> | |
1006 | <code>ip4.src == 0.0.0.0/8 || ip4.dst == 0.0.0.0/8</code> (zero | |
1007 | network source or destination) | |
1008 | </li> | |
1009 | <li> | |
6fdb7cd6 | 1010 | <code>ip4.src</code> or <code>ip6.src</code> is any IP |
06a26dd2 MS |
1011 | address owned by the router, unless the packet was recirculated |
1012 | due to egress loopback as indicated by | |
1013 | <code>REGBIT_EGRESS_LOOPBACK</code>. | |
9975d7be BP |
1014 | </li> |
1015 | <li> | |
1016 | <code>ip4.src</code> is the broadcast address of any IP network | |
1017 | known to the router. | |
1018 | </li> | |
1019 | </ul> | |
1020 | </li> | |
1021 | ||
1022 | <li> | |
1023 | <p> | |
1024 | ICMP echo reply. These flows reply to ICMP echo requests received | |
e9bc5de1 | 1025 | for the router's IP address. Let <var>A</var> be an IP address |
6fdb7cd6 JP |
1026 | owned by a router port. Then, for each <var>A</var> that is |
1027 | an IPv4 address, a priority-90 flow matches on | |
1028 | <code>ip4.dst == <var>A</var></code> and | |
1029 | <code>icmp4.type == 8 && icmp4.code == 0</code> | |
1030 | (ICMP echo request). For each <var>A</var> that is an IPv6 | |
1031 | address, a priority-90 flow matches on | |
1032 | <code>ip6.dst == <var>A</var></code> and | |
1033 | <code>icmp6.type == 128 && icmp6.code == 0</code> | |
1034 | (ICMPv6 echo request). The port of the router that receives the | |
1035 | echo request does not matter. Also, the <code>ip.ttl</code> of | |
1036 | the echo request packet is not checked, so it complies with | |
1037 | RFC 1812, section 4.2.2.9. Flows for ICMPv4 echo requests use the | |
1038 | following actions: | |
9975d7be BP |
1039 | </p> |
1040 | ||
1041 | <pre> | |
4685e523 | 1042 | ip4.dst <-> ip4.src; |
47f3b59b | 1043 | ip.ttl = 255; |
9975d7be | 1044 | icmp4.type = 0; |
bf143492 | 1045 | flags.loopback = 1; |
6fdb7cd6 JP |
1046 | next; |
1047 | </pre> | |
1048 | ||
1049 | <p> | |
1050 | Flows for ICMPv6 echo requests use the following actions: | |
1051 | </p> | |
1052 | ||
1053 | <pre> | |
1054 | ip6.dst <-> ip6.src; | |
1055 | ip.ttl = 255; | |
1056 | icmp6.type = 129; | |
bf143492 | 1057 | flags.loopback = 1; |
9975d7be BP |
1058 | next; |
1059 | </pre> | |
9975d7be BP |
1060 | </li> |
1061 | ||
1062 | <li> | |
1063 | <p> | |
de297547 GS |
1064 | Reply to ARP requests. |
1065 | </p> | |
1066 | ||
1067 | <p> | |
1068 | These flows reply to ARP requests for the router's own IP address. | |
1069 | For each router port <var>P</var> that owns IP address <var>A</var> | |
1070 | and Ethernet address <var>E</var>, a priority-90 flow matches | |
1071 | <code>inport == <var>P</var> && arp.op == 1 && | |
1072 | arp.tpa == <var>A</var></code> (ARP request) with the following | |
1073 | actions: | |
1074 | </p> | |
1075 | ||
1076 | <pre> | |
1077 | eth.dst = eth.src; | |
1078 | eth.src = <var>E</var>; | |
1079 | arp.op = 2; /* ARP reply. */ | |
1080 | arp.tha = arp.sha; | |
1081 | arp.sha = <var>E</var>; | |
1082 | arp.tpa = arp.spa; | |
1083 | arp.spa = <var>A</var>; | |
1084 | outport = <var>P</var>; | |
bf143492 | 1085 | flags.loopback = 1; |
de297547 GS |
1086 | output; |
1087 | </pre> | |
41a15b71 MS |
1088 | |
1089 | <p> | |
1090 | For the gateway port on a distributed logical router (where | |
1091 | one of the logical router ports specifies a | |
1092 | <code>redirect-chassis</code>), the above flows are only | |
1093 | programmed on the gateway port instance on the | |
1094 | <code>redirect-chassis</code>. This behavior avoids generation | |
1095 | of multiple ARP responses from different chassis, and allows | |
1096 | upstream MAC learning to point to the | |
1097 | <code>redirect-chassis</code>. | |
1098 | </p> | |
de297547 GS |
1099 | </li> |
1100 | ||
1101 | <li> | |
1102 | <p> | |
1103 | These flows reply to ARP requests for the virtual IP addresses | |
cc4583aa GS |
1104 | configured in the router for DNAT or load balancing. For a |
1105 | configured DNAT IP address or a load balancer VIP <var>A</var>, | |
1106 | for each router port <var>P</var> with Ethernet | |
de297547 GS |
1107 | address <var>E</var>, a priority-90 flow matches |
1108 | <code>inport == <var>P</var> && arp.op == 1 && | |
1109 | arp.tpa == <var>A</var></code> (ARP request) | |
0bac7164 | 1110 | with the following actions: |
9975d7be BP |
1111 | </p> |
1112 | ||
1113 | <pre> | |
1114 | eth.dst = eth.src; | |
1115 | eth.src = <var>E</var>; | |
1116 | arp.op = 2; /* ARP reply. */ | |
1117 | arp.tha = arp.sha; | |
1118 | arp.sha = <var>E</var>; | |
1119 | arp.tpa = arp.spa; | |
1120 | arp.spa = <var>A</var>; | |
1121 | outport = <var>P</var>; | |
bf143492 | 1122 | flags.loopback = 1; |
9975d7be BP |
1123 | output; |
1124 | </pre> | |
06a26dd2 MS |
1125 | |
1126 | <p> | |
1127 | For the gateway port on a distributed logical router with NAT | |
1128 | (where one of the logical router ports specifies a | |
1129 | <code>redirect-chassis</code>): | |
1130 | </p> | |
1131 | ||
1132 | <ul> | |
1133 | <li> | |
1134 | If the corresponding NAT rule cannot be handled in a | |
1135 | distributed manner, then this flow is only programmed on | |
1136 | the gateway port instance on the | |
1137 | <code>redirect-chassis</code>. This behavior avoids | |
1138 | generation of multiple ARP responses from different chassis, | |
1139 | and allows upstream MAC learning to point to the | |
1140 | <code>redirect-chassis</code>. | |
1141 | </li> | |
1142 | ||
1143 | <li> | |
1144 | <p> | |
1145 | If the corresponding NAT rule can be handled in a distributed | |
1146 | manner, then this flow is only programmed on the gateway port | |
1147 | instance where the <code>logical_port</code> specified in the | |
1148 | NAT rule resides. | |
1149 | </p> | |
1150 | ||
1151 | <p> | |
1152 | Some of the actions are different for this case, using the | |
1153 | <code>external_mac</code> specified in the NAT rule rather | |
1154 | than the gateway port's Ethernet address <var>E</var>: | |
1155 | </p> | |
1156 | ||
1157 | <pre> | |
1158 | eth.src = <var>external_mac</var>; | |
1159 | arp.sha = <var>external_mac</var>; | |
1160 | </pre> | |
1161 | ||
1162 | <p> | |
1163 | This behavior avoids generation of multiple ARP responses | |
1164 | from different chassis, and allows upstream MAC learning to | |
1165 | point to the correct chassis. | |
1166 | </p> | |
1167 | </li> | |
1168 | </ul> | |
9975d7be BP |
1169 | </li> |
1170 | ||
0bac7164 | 1171 | <li> |
c34a87b6 | 1172 | ARP reply handling. This flow uses ARP replies to populate the |
0bac7164 BP |
1173 | logical router's ARP table. A priority-90 flow with match <code>arp.op |
1174 | == 2</code> has actions <code>put_arp(inport, arp.spa, | |
1175 | arp.sha);</code>. | |
1176 | </li> | |
1177 | ||
6fdb7cd6 JP |
1178 | <li> |
1179 | <p> | |
c34a87b6 JP |
1180 | Reply to IPv6 Neighbor Solicitations. These flows reply to |
1181 | Neighbor Solicitation requests for the router's own IPv6 | |
1182 | address and populate the logical router's mac binding table. | |
1183 | For each router port <var>P</var> that owns IPv6 address | |
1184 | <var>A</var>, solicited node address <var>S</var>, and | |
1185 | Ethernet address <var>E</var>, a priority-90 flow matches | |
1186 | <code>inport == <var>P</var> && nd_ns && | |
1187 | ip6.dst == {<var>A</var>, <var>S</var>} && nd.target | |
1188 | == <var>A</var></code> with the following actions: | |
6fdb7cd6 JP |
1189 | </p> |
1190 | ||
1191 | <pre> | |
c34a87b6 | 1192 | put_nd(inport, ip6.src, nd.sll); |
6fdb7cd6 JP |
1193 | nd_na { |
1194 | eth.src = <var>E</var>; | |
1195 | ip6.src = <var>A</var>; | |
1196 | nd.target = <var>A</var>; | |
1197 | nd.tll = <var>E</var>; | |
1198 | outport = inport; | |
bf143492 | 1199 | flags.loopback = 1; |
6fdb7cd6 JP |
1200 | output; |
1201 | }; | |
1202 | </pre> | |
41a15b71 MS |
1203 | |
1204 | <p> | |
1205 | For the gateway port on a distributed logical router (where | |
1206 | one of the logical router ports specifies a | |
1207 | <code>redirect-chassis</code>), the above flows replying to | |
1208 | IPv6 Neighbor Solicitations are only programmed on the | |
1209 | gateway port instance on the <code>redirect-chassis</code>. | |
1210 | This behavior avoids generation of multiple replies from | |
1211 | different chassis, and allows upstream MAC learning to point | |
1212 | to the <code>redirect-chassis</code>. | |
1213 | </p> | |
6fdb7cd6 JP |
1214 | </li> |
1215 | ||
c34a87b6 JP |
1216 | <li> |
1217 | IPv6 neighbor advertisement handling. This flow uses neighbor | |
1218 | advertisements to populate the logical router's mac binding | |
1219 | table. A priority-90 flow with match <code>nd_na</code> | |
1220 | has actions <code>put_nd(inport, nd.target, nd.tll);</code>. | |
1221 | </li> | |
1222 | ||
1223 | <li> | |
1224 | Handling of IPv6 neighbor solicitations for non-hosted addresses. | |
1225 | This flow uses neighbor solicitations to populate the logical | |
1226 | router's mac binding table (ones that were directed at the | |
1227 | logical router would have matched the priority-90 neighbor | |
1228 | solicitation flow already). A priority-80 flow with match | |
1229 | <code>nd_ns</code> has actions | |
1230 | <code>put_nd(inport, ip6.src, nd.sll);</code>. | |
1231 | </li> | |
1232 | ||
9975d7be BP |
1233 | <li> |
1234 | <p> | |
1235 | UDP port unreachable. Priority-80 flows generate ICMP port | |
1236 | unreachable messages in reply to UDP datagrams directed to the | |
1237 | router's IP address. The logical router doesn't accept any UDP | |
1238 | traffic so it always generates such a reply. | |
1239 | </p> | |
1240 | ||
1241 | <p> | |
1242 | These flows should not match IP fragments with nonzero offset. | |
1243 | </p> | |
1244 | ||
1245 | <p> | |
1246 | Details TBD. Not yet implemented. | |
1247 | </p> | |
1248 | </li> | |
1249 | ||
1250 | <li> | |
1251 | <p> | |
1252 | TCP reset. Priority-80 flows generate TCP reset messages in reply to | |
1253 | TCP datagrams directed to the router's IP address. The logical | |
1254 | router doesn't accept any TCP traffic so it always generates such a | |
1255 | reply. | |
1256 | </p> | |
1257 | ||
1258 | <p> | |
1259 | These flows should not match IP fragments with nonzero offset. | |
1260 | </p> | |
1261 | ||
1262 | <p> | |
1263 | Details TBD. Not yet implemented. | |
1264 | </p> | |
1265 | </li> | |
1266 | ||
1267 | <li> | |
1268 | <p> | |
1269 | Protocol unreachable. Priority-70 flows generate ICMP protocol | |
1270 | unreachable messages in reply to packets directed to the router's IP | |
1271 | address on IP protocols other than UDP, TCP, and ICMP. | |
1272 | </p> | |
1273 | ||
1274 | <p> | |
1275 | These flows should not match IP fragments with nonzero offset. | |
1276 | </p> | |
1277 | ||
1278 | <p> | |
1279 | Details TBD. Not yet implemented. | |
1280 | </p> | |
1281 | </li> | |
1282 | ||
1283 | <li> | |
1284 | Drop other IP traffic to this router. These flows drop any other | |
1285 | traffic destined to an IP address of this router that is not already | |
1286 | handled by one of the flows above, which amounts to ICMP (other than | |
1287 | echo requests) and fragments with nonzero offsets. For each IP address | |
1288 | <var>A</var> owned by the router, a priority-60 flow matches | |
4ef48e9d CSV |
1289 | <code>ip4.dst == <var>A</var></code> and drops the traffic. An |
1290 | exception is made and the above flow is not added if the router | |
1291 | port's own IP address is used to SNAT packets passing through that | |
1292 | router. | |
9975d7be BP |
1293 | </li> |
1294 | </ul> | |
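<p>
  The IPv4 part of the L3 admission control flow at the top of the list
  above can be approximated with Python's standard
  <code>ipaddress</code> module. This is a sketch under stated
  assumptions: the function name and parameters are hypothetical,
  <code>egress_loopback</code> stands in for
  <code>REGBIT_EGRESS_LOOPBACK</code>, and IPv6 is not modeled.
</p>

```python
import ipaddress


def l3_admission_drop(src, dst, router_ips, networks, egress_loopback=False):
    """Return True if the priority-100 L3 admission flow would drop the packet."""
    s = ipaddress.IPv4Address(src)
    d = ipaddress.IPv4Address(dst)
    loopback = ipaddress.IPv4Network("127.0.0.0/8")
    zero_net = ipaddress.IPv4Network("0.0.0.0/8")

    if int(s) >> 28 == 0xE:  # ip4.src[28..31] == 0xe (multicast source)
        return True
    if s == ipaddress.IPv4Address("255.255.255.255"):  # broadcast source
        return True
    if s in loopback or d in loopback:  # localhost source or destination
        return True
    if s in zero_net or d in zero_net:  # zero network source or destination
        return True
    owned = {ipaddress.IPv4Address(a) for a in router_ips}
    if s in owned and not egress_loopback:  # source owned by the router
        return True
    # Source is the broadcast address of a network known to the router.
    return any(
        s == ipaddress.IPv4Network(n).broadcast_address for n in networks
    )
```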
1295 | ||
1296 | <p> | |
1297 | The flows above handle all of the traffic that might be directed to the | |
1298 | router itself. The following flows (with lower priorities) handle the | |
1299 | remaining traffic, potentially for forwarding: | |
1300 | </p> | |
1301 | ||
1302 | <ul> | |
1303 | <li> | |
1304 | Drop Ethernet local broadcast. A priority-50 flow with match | |
1305 | <code>eth.bcast</code> drops traffic destined to the local Ethernet | |
1306 | broadcast address. By definition this traffic should not be forwarded. | |
1307 | </li> | |
1308 | ||
9975d7be BP |
1309 | <li> |
1310 | <p> | |
1311 | ICMP time exceeded. For each router port <var>P</var>, whose IP | |
1312 | address is <var>A</var>, a priority-40 flow with match <code>inport | |
47f3b59b | 1313 | == <var>P</var> && ip.ttl == {0, 1} && |
9975d7be BP |
1314 | !ip.later_frag</code> matches packets whose TTL has expired, with the |
1315 | following actions to send an ICMP time exceeded reply: | |
1316 | </p> | |
1317 | ||
1318 | <pre> | |
1319 | icmp4 { | |
1320 | icmp4.type = 11; /* Time exceeded. */ | |
1321 | icmp4.code = 0; /* TTL exceeded in transit. */ | |
1322 | ip4.dst = ip4.src; | |
1323 | ip4.src = <var>A</var>; | |
47f3b59b | 1324 | ip.ttl = 255; |
9975d7be BP |
1325 | next; |
1326 | }; | |
1327 | </pre> | |
1328 | ||
1329 | <p> | |
1330 | Not yet implemented. | |
1331 | </p> | |
1332 | </li> | |
1333 | ||
1334 | <li> | |
47f3b59b | 1335 | TTL discard. A priority-30 flow with match <code>ip.ttl == {0, |
9975d7be BP |
1336 | 1}</code> and actions <code>drop;</code> drops other packets whose TTL |
1337 | has expired, that should not receive an ICMP error reply (i.e. fragments | |
1338 | with nonzero offset). | |
1339 | </li> | |
1340 | ||
1341 | <li> | |
1342 | Next table. A priority-0 flow matches all packets that aren't already | |
cc4583aa GS |
1343 | handled and uses actions <code>next;</code> to feed them to the next |
1344 | table. | |
9975d7be BP |
1345 | </li> |
1346 | </ul> | |
1347 | ||
cc4583aa GS |
1348 | <h3>Ingress Table 2: DEFRAG</h3> |
1349 | ||
1350 | <p> | |
1351 | This table sends packets to the connection tracker for tracking and | |
1352 | defragmentation. It contains a priority-0 flow that simply moves traffic | |
1353 | to the next table. If load balancing rules with virtual IP addresses | |
1354 | (and ports) are configured in <code>OVN_Northbound</code> database for a | |
1355 | Gateway router, a priority-100 flow is added for each configured virtual | |
1356 | IP address <var>VIP</var> with a match <code>ip && | |
1357 | ip4.dst == <var>VIP</var></code> that sets an action | |
1358 | <code>ct_next;</code> to send IP packets to the connection tracker for | |
1359 | packet de-fragmentation and tracking before sending it to the next table. | |
1360 | </p> | |
1361 | ||
1362 | <h3>Ingress Table 3: UNSNAT</h3> | |
de297547 GS |
1363 | |
1364 | <p> | |
1365 | This is for the reverse traffic of already established connections, | |
1366 | i.e., SNAT has already been done in the egress pipeline and now the | |
1367 | packet has entered the ingress pipeline as part of a reply. It is | |
1368 | unSNATted here. | |
1369 | </p> | |
1370 | ||
06a26dd2 MS |
1371 | <p>Ingress Table 3: UNSNAT on Gateway Routers</p> |
1372 | ||
de297547 GS |
1373 | <ul> |
1374 | <li> | |
1375 | <p> | |
65d8810c GS |
1376 | If the Gateway router has been configured to force SNAT any |
1377 | previously DNATted packets to <var>B</var>, a priority-110 flow | |
1378 | matches <code>ip && ip4.dst == <var>B</var></code> with | |
1379 | an action <code>ct_snat; next;</code>. | |
1380 | </p> | |
1381 | ||
1382 | <p> | |
1383 | If the Gateway router has been configured to force SNAT any | |
1384 | previously load-balanced packets to <var>B</var>, a priority-100 flow | |
1385 | matches <code>ip && ip4.dst == <var>B</var></code> with | |
1386 | an action <code>ct_snat; next;</code>. | |
1387 | </p> | |
1388 | ||
1389 | <p> | |
1390 | For each NAT configuration in the OVN Northbound database that asks | |
de297547 | 1391 | to change the source IP address of a packet from <var>A</var> to |
65d8810c | 1392 | <var>B</var>, a priority-90 flow matches <code>ip && |
de297547 GS |
1393 | ip4.dst == <var>B</var></code> with an action |
1394 | <code>ct_snat; next;</code>. | |
1395 | </p> | |
1396 | ||
1397 | <p> | |
1398 | A priority-0 logical flow with match <code>1</code> has actions | |
1399 | <code>next;</code>. | |
1400 | </p> | |
1401 | </li> | |
1402 | </ul> | |
1403 | ||
06a26dd2 MS |
1404 | <p>Ingress Table 3: UNSNAT on Distributed Routers</p> |
1405 | ||
1406 | <ul> | |
1407 | <li> | |
1408 | <p> | |
1409 | For each configuration in the OVN Northbound database that asks | |
1410 | to change the source IP address of a packet from <var>A</var> to | |
1411 | <var>B</var>, a priority-100 flow matches <code>ip && | |
1412 | ip4.dst == <var>B</var> && inport == <var>GW</var></code>, | |
1413 | where <var>GW</var> is the logical router gateway port, with an | |
1414 | action <code>ct_snat; next;</code>. | |
1415 | </p> | |
1416 | ||
1417 | <p> | |
1418 | If the NAT rule cannot be handled in a distributed manner, then | |
1419 | the priority-100 flow above is only programmed on the | |
1420 | <code>redirect-chassis</code>. | |
1421 | </p> | |
1422 | ||
1423 | <p> | |
1424 | For each configuration in the OVN Northbound database that asks | |
1425 | to change the source IP address of a packet from <var>A</var> to | |
1426 | <var>B</var>, a priority-50 flow matches <code>ip && | |
1427 | ip4.dst == <var>B</var></code> with an action | |
1428 | <code>REGBIT_NAT_REDIRECT = 1; next;</code>. This flow is for | |
1429 | east/west traffic to a NAT destination IPv4 address. By | |
1430 | setting the <code>REGBIT_NAT_REDIRECT</code> flag, in the | |
1431 | ingress table <code>Gateway Redirect</code> this will trigger a | |
1432 | redirect to the instance of the gateway port on the | |
1433 | <code>redirect-chassis</code>. | |
1434 | </p> | |
1435 | ||
1436 | <p> | |
1437 | A priority-0 logical flow with match <code>1</code> has actions | |
1438 | <code>next;</code>. | |
1439 | </p> | |
1440 | </li> | |
1441 | </ul> | |
1442 | ||
cc4583aa | 1443 | <h3>Ingress Table 4: DNAT</h3> |
de297547 GS |
1444 | |
1445 | <p> | |
1446 | Packets enter the pipeline with a destination IP address that needs to | |
1447 | be DNATted from a virtual IP address to a real IP address. Packets | |
1448 | in the reverse direction need to be unDNATted. | |
1449 | </p> | |
06a26dd2 MS |
1450 | |
1451 | <p>Ingress Table 4: DNAT on Gateway Routers</p> | |
1452 | ||
de297547 GS |
1453 | <ul> |
1454 | <li> | |
cc4583aa GS |
1455 | For all the configured load balancing rules for Gateway router in |
1456 | <code>OVN_Northbound</code> database that includes an L4 port | |
1457 | <var>PORT</var> of protocol <var>P</var> and IPv4 address | |
1458 | <var>VIP</var>, a priority-120 flow that matches on | |
1459 | <code>ct.new && ip && ip4.dst == <var>VIP</var> | |
1460 | && <var>P</var> && <var>P</var>.dst == <var>PORT | |
1461 | </var></code> with an action of <code>ct_lb(<var>args</var>)</code>, | |
1462 | where <var>args</var> contains comma-separated IPv4 addresses (and | |
65d8810c GS |
1463 | optional port numbers) to load balance to. If the Gateway router |
1464 | is configured to force SNAT any load-balanced packets, the above | |
1465 | action will be replaced by <code>flags.force_snat_for_lb = 1; | |
1466 | ct_lb(<var>args</var>);</code>. | |
1467 | </li> | |
1468 | ||
1469 | <li> | |
1470 | For all the configured load balancing rules for Gateway router in | |
1471 | <code>OVN_Northbound</code> database that includes an L4 port | |
1472 | <var>PORT</var> of protocol <var>P</var> and IPv4 address | |
1473 | <var>VIP</var>, a priority-120 flow that matches on | |
1474 | <code>ct.est && ip && ip4.dst == <var>VIP</var> | |
1475 | && <var>P</var> && <var>P</var>.dst == <var>PORT | |
1476 | </var></code> with an action of <code>ct_dnat;</code>. | |
1477 | If the Gateway router is configured to force SNAT any load-balanced | |
1478 | packets, the above action will be replaced by | |
1479 | <code>flags.force_snat_for_lb = 1; ct_dnat;</code>. | |
cc4583aa | 1480 | </li> |
de297547 | 1481 | |
cc4583aa GS |
1482 | <li> |
1483 | For all the configured load balancing rules for Gateway router in | |
1484 | <code>OVN_Northbound</code> database that includes just an IP address | |
1485 | <var>VIP</var> to match on, a priority-110 flow that matches on | |
1486 | <code>ct.new && ip && ip4.dst == | |
1487 | <var>VIP</var></code> with an action of | |
1488 | <code>ct_lb(<var>args</var>)</code>, where <var>args</var> contains | |
65d8810c GS |
1489 | comma-separated IPv4 addresses. If the Gateway router | |
1490 | is configured to force SNAT any load-balanced packets, the above | |
1491 | action will be replaced by <code>flags.force_snat_for_lb = 1; | |
1492 | ct_lb(<var>args</var>);</code>. | |
1493 | </li> | |
1494 | ||
1495 | <li> | |
1496 | For all the configured load balancing rules for Gateway router in | |
1497 | <code>OVN_Northbound</code> database that includes just an IP address | |
1498 | <var>VIP</var> to match on, a priority-110 flow that matches on | |
1499 | <code>ct.est && ip && ip4.dst == | |
1500 | <var>VIP</var></code> with an action of <code>ct_dnat;</code>. | |
1501 | If the Gateway router is configured to force SNAT any load-balanced | |
1502 | packets, the above action will be replaced by | |
1503 | <code>flags.force_snat_for_lb = 1; ct_dnat;</code>. | |
cc4583aa | 1504 | </li> |
de297547 | 1505 | |
cc4583aa GS |
1506 | <li> |
1507 | For each configuration in the OVN Northbound database that asks | |
1508 | to change the destination IP address of a packet from <var>A</var> to | |
1509 | <var>B</var>, a priority-100 flow matches <code>ip && | |
1510 | ip4.dst == <var>A</var></code> with an action | |
65d8810c GS |
1511 | <code>flags.loopback = 1; ct_dnat(<var>B</var>);</code>. If the |
1512 | Gateway router is configured to force SNAT any DNATed packet, | |
1513 | the above action will be replaced by | |
1514 | <code>flags.force_snat_for_dnat = 1; flags.loopback = 1; | |
1515 | ct_dnat(<var>B</var>);</code>. | |
cc4583aa GS |
1516 | </li> |
1517 | ||
1518 | <li> | |
1519 | For all IP packets of a Gateway router, a priority-50 flow with an | |
1520 | action <code>flags.loopback = 1; ct_dnat;</code>. | |
1521 | </li> | |
1522 | ||
1523 | <li> | |
1524 | A priority-0 logical flow with match <code>1</code> has actions | |
1525 | <code>next;</code>. | |
de297547 GS |
1526 | </li> |
1527 | </ul> | |
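The Gateway router DNAT actions listed above differ only in an optional flag prefix. The following is a minimal Python sketch of how those action strings are composed. The helper names `gateway_dnat_action` and `gateway_lb_action` are hypothetical and are not part of ovn-northd; only the action text itself comes from the flows described above.

```python
def gateway_dnat_action(new_dst, force_snat=False):
    """Action text for the priority-100 flow that rewrites ip4.dst to new_dst."""
    # The force-SNAT flag, when configured, is set before the usual actions.
    prefix = "flags.force_snat_for_dnat = 1; " if force_snat else ""
    return f"{prefix}flags.loopback = 1; ct_dnat({new_dst});"


def gateway_lb_action(backends, established=False, force_snat=False):
    """Action text for the priority-110 load-balancing flows.

    New connections (ct.new) go through ct_lb with the backend list;
    established connections (ct.est) only need ct_dnat.
    """
    prefix = "flags.force_snat_for_lb = 1; " if force_snat else ""
    if established:
        return f"{prefix}ct_dnat;"
    return f"{prefix}ct_lb({','.join(backends)});"
```

For example, `gateway_dnat_action("10.0.0.5", force_snat=True)` yields the forced-SNAT variant of the priority-100 action.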
1528 | ||
06a26dd2 MS |
1529 | <p>Ingress Table 4: DNAT on Distributed Routers</p> |
1530 | ||
1531 | <p> | |
1532 | On distributed routers, the DNAT table only handles packets | |
1533 | with destination IP address that needs to be DNATted from a | |
1534 | virtual IP address to a real IP address. The unDNAT processing | |
1535 | in the reverse direction is handled in a separate table in the | |
1536 | egress pipeline. | |
1537 | </p> | |
1538 | ||
1539 | <ul> | |
1540 | <li> | |
1541 | <p> | |
1542 | For each configuration in the OVN Northbound database, that asks | |
1543 | to change the destination IP address of a packet from <var>A</var> to | |
1544 | <var>B</var>, a priority-100 flow matches <code>ip && | |
1545 | ip4.dst == <var>A</var> && inport == <var>GW</var></code>, | |
1546 | where <var>GW</var> is the logical router gateway port, with an | |
1547 | action <code>ct_dnat(<var>B</var>);</code>. | |
1548 | </p> | |
1549 | ||
1550 | <p> | |
1551 | If the NAT rule cannot be handled in a distributed manner, then | |
1552 | the priority-100 flow above is only programmed on the | |
1553 | <code>redirect-chassis</code>. | |
1554 | </p> | |
1555 | ||
1556 | <p> | |
1557 | For each configuration in the OVN Northbound database, that asks | |
1558 | to change the destination IP address of a packet from <var>A</var> to | |
1559 | <var>B</var>, a priority-50 flow matches <code>ip && | |
1560 | ip4.dst == <var>A</var></code> with an action | |
1561 | <code>REGBIT_NAT_REDIRECT = 1; next;</code>. This flow is for | |
1562 | east/west traffic to a NAT destination IPv4 address. By | |
1563 | setting the <code>REGBIT_NAT_REDIRECT</code> flag, in the | |
1564 | ingress table <code>Gateway Redirect</code> this will trigger a | |
1565 | redirect to the instance of the gateway port on the | |
1566 | <code>redirect-chassis</code>. | |
1567 | </p> | |
1568 | ||
1569 | <p> | |
1570 | A priority-0 logical flow with match <code>1</code> has actions | |
1571 | <code>next;</code>. | |
1572 | </p> | |
1573 | </li> | |
1574 | </ul> | |
1575 | ||
cc4583aa | 1576 | <h3>Ingress Table 5: IP Routing</h3> |
9975d7be BP |
1577 | |
1578 | <p> | |
6fdb7cd6 JP |
1579 | A packet that arrives at this table is an IP packet that should be |
1580 | routed to the address in <code>ip4.dst</code> or | |
1581 | <code>ip6.dst</code>. This table implements IP routing, setting | |
1582 | <code>reg0</code> (or <code>xxreg0</code> for IPv6) to the next-hop IP | |
1583 | address (leaving <code>ip4.dst</code> or <code>ip6.dst</code>, the | |
1584 | packet's final destination, unchanged) and advances to the next | |
1585 | table for ARP resolution. It also sets <code>reg1</code> (or | |
47021598 | 1586 | <code>xxreg1</code>) to the IP address owned by the selected router |
06a26dd2 MS |
1587 | port (ingress table <code>ARP Request</code> will generate an ARP |
1588 | request, if needed, with <code>reg0</code> as the target protocol | |
1589 | address and <code>reg1</code> as the source protocol address). | |
9975d7be BP |
1590 | </p> |
1591 | ||
1592 | <p> | |
1593 | This table contains the following logical flows: | |
1594 | </p> | |
1595 | ||
1596 | <ul> | |
06a26dd2 MS |
1597 | <li> |
1598 | <p> | |
1599 | For distributed logical routers where one of the logical router | |
1600 | ports specifies a <code>redirect-chassis</code>, a priority-300 | |
1601 | logical flow with match <code>REGBIT_NAT_REDIRECT == 1</code> has | |
1602 | actions <code>ip.ttl--; next;</code>. The <code>outport</code> | |
1603 | will be set later in the Gateway Redirect table. | |
1604 | </p> | |
1605 | </li> | |
1606 | ||
9975d7be BP |
1607 | <li> |
1608 | <p> | |
6fdb7cd6 | 1609 | IPv4 routing table. For each route to IPv4 network <var>N</var> with |
0bac7164 BP |
1610 | netmask <var>M</var>, on router port <var>P</var> with IP address |
1611 | <var>A</var> and Ethernet | |
1612 | address <var>E</var>, a logical flow with match <code>ip4.dst == | |
9975d7be BP |
1613 | <var>N</var>/<var>M</var></code>, whose priority is the number of |
1614 | 1-bits in <var>M</var>, has the following actions: | |
1615 | </p> | |
1616 | ||
1617 | <pre> | |
47f3b59b | 1618 | ip.ttl--; |
9975d7be | 1619 | reg0 = <var>G</var>; |
0bac7164 BP |
1620 | reg1 = <var>A</var>; |
1621 | eth.src = <var>E</var>; | |
1622 | outport = <var>P</var>; | |
bf143492 | 1623 | flags.loopback = 1; |
9975d7be BP |
1624 | next; |
1625 | </pre> | |
1626 | ||
1627 | <p> | |
47f3b59b | 1628 | (Ingress table 1 already verified that <code>ip.ttl--;</code> will |
9975d7be BP |
1629 | not yield a TTL exceeded error.) |
1630 | </p> | |
1631 | ||
1632 | <p> | |
28dc3fe9 SR |
1633 | If the route has a gateway, <var>G</var> is the gateway IP address. |
1634 | Otherwise, if the route is from a configured static route, <var>G</var> | |
1635 | is the next-hop IP address. Otherwise, <var>G</var> is <code>ip4.dst</code>. | |
9975d7be BP |
1636 | </p> |
1637 | </li> | |
6fdb7cd6 JP |
1638 | |
1639 | <li> | |
1640 | <p> | |
1641 | IPv6 routing table. For each route to IPv6 network | |
1642 | <var>N</var> with netmask <var>M</var>, on router port | |
1643 | <var>P</var> with IP address <var>A</var> and Ethernet address | |
1644 | <var>E</var>, a logical flow with match in CIDR notation | |
1645 | <code>ip6.dst == <var>N</var>/<var>M</var></code>, | |
1646 | whose priority is the integer value of <var>M</var>, has the | |
1647 | following actions: | |
1648 | </p> | |
1649 | ||
1650 | <pre> | |
1651 | ip.ttl--; | |
1652 | xxreg0 = <var>G</var>; | |
1653 | xxreg1 = <var>A</var>; | |
1654 | eth.src = <var>E</var>; | |
1655 | outport = <var>P</var>; | |
bf143492 | 1656 | flags.loopback = 1; |
6fdb7cd6 JP |
1657 | next; |
1658 | </pre> | |
1659 | ||
1660 | <p> | |
1661 | (Ingress table 1 already verified that <code>ip.ttl--;</code> will | |
1662 | not yield a TTL exceeded error.) | |
1663 | </p> | |
1664 | ||
1665 | <p> | |
1666 | If the route has a gateway, <var>G</var> is the gateway IP address. | |
1667 | Otherwise, if the route is from a configured static route, <var>G</var> | |
1668 | is the next-hop IP address. Otherwise, <var>G</var> is <code>ip6.dst</code>. | |
1669 | </p> | |
a63f7235 JP |
1670 | |
1671 | <p> | |
1672 | If the address <var>A</var> is in the link-local scope, the | |
1673 | route will be limited to sending on the ingress port. | |
1674 | </p> | |
6fdb7cd6 | 1675 | </li> |
9975d7be BP |
1676 | </ul> |
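Because the flow priority grows with the mask length, the most specific route wins, which is longest-prefix matching. The sketch below illustrates that rule in Python with the standard `ipaddress` module. The function names and the `(prefix, next_hop)` tuple layout are assumptions made for illustration, not ovn-northd data structures.

```python
import ipaddress


def route_priority(prefix):
    """Priority of the routing flow for a CIDR prefix.

    IPv4: number of 1-bits in the netmask. IPv6: integer prefix length.
    """
    net = ipaddress.ip_network(prefix, strict=False)
    if net.version == 4:
        return bin(int(net.netmask)).count("1")
    return net.prefixlen


def select_route(dst, routes):
    """Return the highest-priority (prefix, next_hop) matching dst, or None."""
    addr = ipaddress.ip_address(dst)
    # Membership tests across IP versions simply evaluate to False.
    matches = [r for r in routes
               if addr in ipaddress.ip_network(r[0], strict=False)]
    return max(matches, key=lambda r: route_priority(r[0]), default=None)
```

With routes for `10.0.0.0/8`, `10.1.0.0/16`, and `0.0.0.0/0`, a packet to `10.1.2.3` matches all three flows, and the `/16` flow is chosen because its priority (16) is the highest.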
1677 | ||
cc4583aa | 1678 | <h3>Ingress Table 6: ARP/ND Resolution</h3> |
9975d7be BP |
1679 | |
1680 | <p> | |
6fdb7cd6 JP |
1681 | Any packet that reaches this table is an IP packet whose next-hop |
1682 | IPv4 address is in <code>reg0</code> or IPv6 address is in | |
1683 | <code>xxreg0</code>. (<code>ip4.dst</code> or | |
1684 | <code>ip6.dst</code> contains the final destination.) This table | |
1685 | resolves the IP address in <code>reg0</code> (or | |
1686 | <code>xxreg0</code>) into an output port in <code>outport</code> | |
1687 | and an Ethernet address in <code>eth.dst</code>, using the | |
1688 | following flows: | |
9975d7be BP |
1689 | </p> |
1690 | ||
1691 | <ul> | |
06a26dd2 MS |
1692 | <li> |
1693 | <p> | |
1694 | For distributed logical routers where one of the logical router | |
1695 | ports specifies a <code>redirect-chassis</code>, a priority-200 | |
1696 | logical flow with match <code>REGBIT_NAT_REDIRECT == 1</code> has | |
1697 | actions <code>eth.dst = <var>E</var>; next;</code>, where | |
1698 | <var>E</var> is the ethernet address of the router's distributed | |
1699 | gateway port. | |
1700 | </p> | |
1701 | </li> | |
1702 | ||
9975d7be BP |
1703 | <li> |
1704 | <p> | |
0bac7164 BP |
1705 | Static MAC bindings. MAC bindings can be known statically based on |
1706 | data in the <code>OVN_Northbound</code> database. For router ports | |
1707 | connected to logical switches, MAC bindings can be known statically | |
1708 | from the <code>addresses</code> column in the | |
80f408f4 JP |
1709 | <code>Logical_Switch_Port</code> table. For router ports |
1710 | connected to other logical routers, MAC bindings can be known | |
4685e523 | 1711 | statically from the <code>mac</code> and <code>networks</code> |
80f408f4 | 1712 | columns in the <code>Logical_Router_Port</code> table. |
9975d7be BP |
1713 | </p> |
1714 | ||
0bac7164 | 1715 | <p> |
6fdb7cd6 JP |
1716 | For each IPv4 address <var>A</var> whose host is known to have |
1717 | Ethernet address <var>E</var> on router port <var>P</var>, a | |
1718 | priority-100 flow with match <code>outport == <var>P</var> | |
1719 | && reg0 == <var>A</var></code> has actions | |
1720 | <code>eth.dst = <var>E</var>; next;</code>. | |
1721 | </p> | |
1722 | ||
1723 | <p> | |
1724 | For each IPv6 address <var>A</var> whose host is known to have | |
1725 | Ethernet address <var>E</var> on router port <var>P</var>, a | |
1726 | priority-100 flow with match <code>outport == <var>P</var> | |
1727 | && xxreg0 == <var>A</var></code> has actions | |
1728 | <code>eth.dst = <var>E</var>; next;</code>. | |
1729 | </p> | |
1730 | ||
1731 | <p> | |
1732 | For each logical router port with an IPv4 address <var>A</var> and | |
1733 | a mac address of <var>E</var> that is reachable via a different | |
1734 | logical router port <var>P</var>, a priority-100 flow with | |
1735 | match <code>outport == <var>P</var> && reg0 == | |
0bac7164 BP |
1736 | <var>A</var></code> has actions <code>eth.dst = <var>E</var>; |
1737 | next;</code>. | |
1738 | </p> | |
509afdc3 GS |
1739 | |
1740 | <p> | |
6fdb7cd6 | 1741 | For each logical router port with an IPv6 address <var>A</var> and |
509afdc3 GS |
1742 | a mac address of <var>E</var> that is reachable via a different |
1743 | logical router port <var>P</var>, a priority-100 flow with | |
6fdb7cd6 | 1744 | match <code>outport == <var>P</var> && xxreg0 == |
509afdc3 GS |
1745 | <var>A</var></code> has actions <code>eth.dst = <var>E</var>; |
1746 | next;</code>. | |
1747 | </p> | |
0bac7164 BP |
1748 | </li> |
1749 | ||
1750 | <li> | |
1751 | <p> | |
c34a87b6 JP |
1752 | Dynamic MAC bindings. These flows resolve MAC-to-IP bindings |
1753 | that have become known dynamically through ARP or neighbor | |
06a26dd2 MS |
1754 | discovery. (The ingress table <code>ARP Request</code> will |
1755 | issue an ARP or neighbor solicitation request for cases where | |
1756 | the binding is not yet known.) | |
0bac7164 | 1757 | </p> |
9975d7be BP |
1758 | |
1759 | <p> | |
c34a87b6 | 1760 | A priority-0 logical flow with match <code>ip4</code> has actions |
0bac7164 | 1761 | <code>get_arp(outport, reg0); next;</code>. |
9975d7be | 1762 | </p> |
c34a87b6 JP |
1763 | |
1764 | <p> | |
1765 | A priority-0 logical flow with match <code>ip6</code> has actions | |
1766 | <code>get_nd(outport, xxreg0); next;</code>. | |
1767 | </p> | |
9975d7be | 1768 | </li> |
0bac7164 BP |
1769 | </ul> |
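The resolution order in this table can be sketched as a two-level lookup: the priority-100 static bindings are consulted first, and the priority-0 dynamic lookup acts as the fallback. The Python below is an illustrative model only. The dictionaries, the function name, and the use of an all-zero MAC to mean "not resolved" are assumptions, chosen to agree with the `eth.dst == 00:00:00:00:00:00` match used by the ARP Request table.

```python
UNRESOLVED_MAC = "00:00:00:00:00:00"  # assumed marker for "binding not known"


def resolve_eth_dst(outport, next_hop, static_bindings, dynamic_bindings):
    """Return eth.dst for the (outport, next_hop) pair.

    static_bindings models the priority-100 flows; dynamic_bindings models
    the priority-0 get_arp/get_nd lookup.
    """
    key = (outport, next_hop)
    if key in static_bindings:
        return static_bindings[key]
    return dynamic_bindings.get(key, UNRESOLVED_MAC)
```

A packet whose next hop is in neither map leaves this table with the all-zero destination and is handled by the unknown-MAC flow in the ARP Request table.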
1770 | ||
41a15b71 MS |
1771 | <h3>Ingress Table 7: Gateway Redirect</h3> |
1772 | ||
1773 | <p> | |
1774 | For distributed logical routers where one of the logical router | |
1775 | ports specifies a <code>redirect-chassis</code>, this table redirects | |
1776 | certain packets to the distributed gateway port instance on the | |
1777 | <code>redirect-chassis</code>. This table has the following flows: | |
1778 | </p> | |
1779 | ||
1780 | <ul> | |
06a26dd2 MS |
1781 | <li> |
1782 | A priority-200 logical flow with match | |
1783 | <code>REGBIT_NAT_REDIRECT == 1</code> has actions | |
1784 | <code>outport = <var>CR</var>; next;</code>, where <var>CR</var> | |
1785 | is the <code>chassisredirect</code> port representing the instance | |
1786 | of the logical router distributed gateway port on the | |
1787 | <code>redirect-chassis</code>. | |
1788 | </li> | |
1789 | ||
41a15b71 MS |
1790 | <li> |
1791 | A priority-150 logical flow with match | |
1792 | <code>outport == <var>GW</var> && | |
1793 | eth.dst == 00:00:00:00:00:00</code> has actions | |
1794 | <code>outport = <var>CR</var>; next;</code>, where | |
1795 | <var>GW</var> is the logical router distributed gateway | |
1796 | port and <var>CR</var> is the <code>chassisredirect</code> | |
1797 | port representing the instance of the logical router | |
1798 | distributed gateway port on the | |
1799 | <code>redirect-chassis</code>. | |
1800 | </li> | |
1801 | ||
06a26dd2 MS |
1802 | <li> |
1803 | For each NAT rule in the OVN Northbound database that can | |
1804 | be handled in a distributed manner, a priority-100 logical | |
1805 | flow with match <code>ip4.src == <var>B</var> && | |
1806 | outport == <var>GW</var></code>, where <var>GW</var> is | |
1807 | the logical router distributed gateway port, with actions | |
1808 | <code>next;</code>. | |
1809 | </li> | |
1810 | ||
41a15b71 MS |
1811 | <li> |
1812 | A priority-50 logical flow with match | |
1813 | <code>outport == <var>GW</var></code> has actions | |
1814 | <code>outport = <var>CR</var>; next;</code>, where | |
1815 | <var>GW</var> is the logical router distributed gateway | |
1816 | port and <var>CR</var> is the <code>chassisredirect</code> | |
1817 | port representing the instance of the logical router | |
1818 | distributed gateway port on the | |
1819 | <code>redirect-chassis</code>. | |
1820 | </li> | |
1821 | ||
1822 | <li> | |
1823 | A priority-0 logical flow with match <code>1</code> has actions | |
1824 | <code>next;</code>. | |
1825 | </li> | |
1826 | </ul> | |
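Since only the highest-priority matching flow applies, the five flows above behave like an ordered chain of checks. The Python sketch below models that ordering. The packet dictionary keys, the function name, and the set of distributed NAT source addresses are hypothetical stand-ins for the logical fields and database contents.

```python
def gateway_redirect(pkt, gw, cr, distributed_nat_srcs):
    """Return (priority, outport) for the first matching flow, highest first."""
    if pkt.get("nat_redirect"):
        return 200, cr                 # REGBIT_NAT_REDIRECT == 1
    if pkt["outport"] == gw and pkt["eth_dst"] == "00:00:00:00:00:00":
        return 150, cr                 # MAC unresolved: redirect for ARP
    if pkt["outport"] == gw and pkt["ip4_src"] in distributed_nat_srcs:
        return 100, pkt["outport"]     # distributed NAT: stay local
    if pkt["outport"] == gw:
        return 50, cr                  # everything else leaving via GW
    return 0, pkt["outport"]           # default: next;
```

The priority-100 case is the only one that lets traffic leave through the gateway port without being redirected to the `chassisredirect` port.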
1827 | ||
1828 | <h3>Ingress Table 8: ARP Request</h3> | |
0bac7164 BP |
1829 | |
1830 | <p> | |
1831 | In the common case where the Ethernet destination has been resolved, this | |
1832 | table outputs the packet. Otherwise, it composes and sends an ARP | |
1833 | request. It holds the following flows: | |
1834 | </p> | |
9975d7be | 1835 | |
0bac7164 | 1836 | <ul> |
9975d7be BP |
1837 | <li> |
1838 | <p> | |
0bac7164 BP |
1839 | Unknown MAC address. A priority-100 flow with match <code>eth.dst == |
1840 | 00:00:00:00:00:00</code> has the following actions: | |
9975d7be BP |
1841 | </p> |
1842 | ||
1843 | <pre> | |
1844 | arp { | |
1845 | eth.dst = ff:ff:ff:ff:ff:ff; | |
0bac7164 | 1846 | arp.spa = reg1; |
47021598 | 1847 | arp.tpa = reg0; |
9975d7be | 1848 | arp.op = 1; /* ARP request. */ |
9975d7be BP |
1849 | output; |
1850 | }; | |
1851 | </pre> | |
1852 | ||
1853 | <p> | |
06a26dd2 MS |
1854 | (Ingress table <code>IP Routing</code> initialized <code>reg1</code> |
1855 | with the IP address owned by <code>outport</code> and | |
1856 | <code>reg0</code> with the next-hop IP address.) | |
9975d7be BP |
1857 | </p> |
1858 | ||
1859 | <p> | |
0bac7164 | 1860 | The IP packet that triggers the ARP request is dropped. |
9975d7be BP |
1861 | </p> |
1862 | </li> | |
0bac7164 BP |
1863 | |
1864 | <li> | |
1865 | Known MAC address. A priority-0 flow with match <code>1</code> has | |
1866 | actions <code>output;</code>. | |
1867 | </li> | |
9975d7be BP |
1868 | </ul> |
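The two flows in this table amount to a single branch on `eth.dst`. As a rough Python illustration, with a hypothetical function name and a plain dictionary standing in for the generated ARP packet:

```python
def arp_request_table(eth_dst, reg0, reg1):
    """Model the ARP Request table for one packet.

    Returns ("arp", fields) when the MAC is unknown, where fields describes
    the ARP request that is sent instead of the original packet, or
    ("output", None) when the MAC is already resolved.
    """
    if eth_dst == "00:00:00:00:00:00":
        fields = {
            "eth.dst": "ff:ff:ff:ff:ff:ff",  # broadcast
            "arp.spa": reg1,                 # IP address owned by outport
            "arp.tpa": reg0,                 # next-hop IP address
            "arp.op": 1,                     # ARP request
        }
        return "arp", fields
    return "output", None
```

Note that in the `"arp"` case the triggering IP packet itself is not forwarded.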
1869 | ||
06a26dd2 MS |
1870 | <h3>Egress Table 0: UNDNAT</h3> |
1871 | ||
1872 | <p> | |
1873 | This table handles reverse traffic of already established connections, | |
1874 | i.e., DNAT has already been done in the ingress pipeline and now the | |
1875 | packet has entered the egress pipeline as part of a reply. For | |
1876 | NAT on a distributed router, it is unDNATted here. For Gateway | |
1877 | routers, the unDNAT processing is carried out in the ingress DNAT | |
1878 | table. | |
1879 | </p> | |
1880 | ||
1881 | <ul> | |
1882 | <li> | |
1883 | <p> | |
1884 | For each configuration in the OVN Northbound database that asks | |
1885 | to change the destination IP address of a packet from an IP | |
1886 | address of <var>A</var> to <var>B</var>, a priority-100 flow | |
1887 | matches <code>ip && ip4.src == <var>B</var> | |
1888 | && outport == <var>GW</var></code>, where <var>GW</var> | |
1889 | is the logical router gateway port, with an action | |
1890 | <code>ct_dnat;</code>. | |
1891 | </p> | |
1892 | ||
1893 | <p> | |
1894 | If the NAT rule cannot be handled in a distributed manner, then | |
1895 | the priority-100 flow above is only programmed on the | |
1896 | <code>redirect-chassis</code>. | |
1897 | </p> | |
1898 | ||
1899 | <p> | |
1900 | If the NAT rule can be handled in a distributed manner, then | |
1901 | there is an additional action | |
1902 | <code>eth.src = <var>EA</var>;</code>, where <var>EA</var> | |
1903 | is the ethernet address associated with the IP address | |
1904 | <var>A</var> in the NAT rule. This allows upstream MAC | |
1905 | learning to point to the correct chassis. | |
1906 | </p> | |
1907 | </li> | |
1908 | ||
1909 | <li> | |
1910 | A priority-0 logical flow with match <code>1</code> has actions | |
1911 | <code>next;</code>. | |
1912 | </li> | |
1913 | </ul> | |
1914 | ||
1915 | <h3>Egress Table 1: SNAT</h3> | |
de297547 GS |
1916 | |
1917 | <p> | |
1918 | Packets that are configured to be SNATed get their source IP address | |
1919 | changed based on the configuration in the OVN Northbound database. | |
1920 | </p> | |
06a26dd2 MS |
1921 | |
1922 | <p>Egress Table 1: SNAT on Gateway Routers</p> | |
1923 | ||
de297547 GS |
1924 | <ul> |
1925 | <li> | |
1926 | <p> | |
65d8810c GS |
1927 | If the Gateway router in the OVN Northbound database has been |
1928 | configured to force SNAT a packet (that has been previously DNATted) | |
1929 | to <var>B</var>, a priority-100 flow matches | |
1930 | <code>flags.force_snat_for_dnat == 1 && ip</code> with an | |
1931 | action <code>ct_snat(<var>B</var>);</code>. | |
1932 | </p> | |
1933 | <p> | |
1934 | If the Gateway router in the OVN Northbound database has been | |
1935 | configured to force SNAT a packet (that has been previously | |
1936 | load-balanced) to <var>B</var>, a priority-100 flow matches | |
1937 | <code>flags.force_snat_for_lb == 1 && ip</code> with an | |
1938 | action <code>ct_snat(<var>B</var>);</code>. | |
1939 | </p> | |
1940 | <p> | |
de297547 GS |
1941 | For each configuration in the OVN Northbound database, that asks |
1942 | to change the source IP address of a packet from an IP address of | |
1943 | <var>A</var> or to change the source IP address of a packet that | |
1944 | belongs to network <var>A</var> to <var>B</var>, a flow matches | |
1945 | <code>ip && ip4.src == <var>A</var></code> with an action | |
1946 | <code>ct_snat(<var>B</var>);</code>. The priority of the flow | |
1947 | is calculated based on the mask of <var>A</var>, with matches | |
1948 | having larger masks getting higher priorities. | |
1949 | </p> | |
1950 | <p> | |
1951 | A priority-0 logical flow with match <code>1</code> has actions | |
1952 | <code>next;</code>. | |
1953 | </p> | |
1954 | </li> | |
1955 | </ul> | |
1956 | ||
06a26dd2 MS |
1957 | <p>Egress Table 1: SNAT on Distributed Routers</p> |
1958 | ||
1959 | <ul> | |
1960 | <li> | |
1961 | <p> | |
1962 | For each configuration in the OVN Northbound database, that asks | |
1963 | to change the source IP address of a packet from an IP address of | |
1964 | <var>A</var> or to change the source IP address of a packet that | |
1965 | belongs to network <var>A</var> to <var>B</var>, a flow matches | |
1966 | <code>ip && ip4.src == <var>A</var> && | |
1967 | outport == <var>GW</var></code>, where <var>GW</var> is the | |
1968 | logical router gateway port, with an action | |
1969 | <code>ct_snat(<var>B</var>);</code>. The priority of the flow | |
1970 | is calculated based on the mask of <var>A</var>, with matches | |
1971 | having larger masks getting higher priorities. | |
1972 | </p> | |
1973 | ||
1974 | <p> | |
1975 | If the NAT rule cannot be handled in a distributed manner, then | |
1976 | the flow above is only programmed on the | |
1977 | <code>redirect-chassis</code>. | |
1978 | </p> | |
1979 | ||
1980 | <p> | |
1981 | If the NAT rule can be handled in a distributed manner, then | |
1982 | there is an additional action | |
1983 | <code>eth.src = <var>EA</var>;</code>, where <var>EA</var> | |
1984 | is the ethernet address associated with the IP address | |
1985 | <var>A</var> in the NAT rule. This allows upstream MAC | |
1986 | learning to point to the correct chassis. | |
1987 | </p> | |
1988 | </li> | |
1989 | ||
1990 | <li> | |
1991 | A priority-0 logical flow with match <code>1</code> has actions | |
1992 | <code>next;</code>. | |
1993 | </li> | |
1994 | </ul> | |
1995 | ||
1996 | <h3>Egress Table 2: Egress Loopback</h3> | |
1997 | ||
1998 | <p> | |
1999 | This table applies to distributed logical routers where one of the | |
2000 | logical router ports specifies a <code>redirect-chassis</code>. | |
2001 | </p> | |
2002 | ||
2003 | <p> | |
2004 | Earlier in the ingress pipeline, some east-west traffic was | |
2005 | redirected to the <code>chassisredirect</code> port, based on | |
2006 | flows in the <code>UNSNAT</code> and <code>DNAT</code> ingress | |
2007 | tables setting the <code>REGBIT_NAT_REDIRECT</code> flag, which | |
2008 | then triggered a match to a flow in the | |
2009 | <code>Gateway Redirect</code> ingress table. The intention was | |
2010 | not to actually send traffic out the distributed gateway port | |
2011 | instance on the <code>redirect-chassis</code>. This traffic was | |
2012 | sent to the distributed gateway port instance in order for DNAT | |
2013 | and/or SNAT processing to be applied. | |
2014 | </p> | |
2015 | ||
2016 | <p> | |
2017 | While UNDNAT and SNAT processing have already occurred by this | |
2018 | point, this traffic needs to be forced through egress loopback on | |
2019 | this distributed gateway port instance, in order for UNSNAT and | |
2020 | DNAT processing to be applied, and also for IP routing and ARP | |
2021 | resolution after all of the NAT processing, so that the packet can | |
2022 | be forwarded to the destination. | |
2023 | </p> | |
2024 | ||
2025 | <p> | |
2026 | This table has the following flows: | |
2027 | </p> | |
2028 | ||
2029 | <ul> | |
2030 | <li> | |
2031 | <p> | |
2032 | For each NAT rule in the OVN Northbound database on a | |
2033 | distributed router, a priority-100 logical flow with match | |
2034 | <code>ip4.dst == <var>E</var> && | |
2035 | outport == <var>GW</var></code>, where <var>E</var> is the | |
2036 | external IP address specified in the NAT rule, and <var>GW</var> | |
2037 | is the logical router distributed gateway port, with the | |
2038 | following actions: | |
2039 | </p> | |
2040 | ||
2041 | <pre> | |
2042 | clone { | |
2043 | ct_clear; | |
2044 | inport = outport; | |
2045 | outport = ""; | |
2046 | flags = 0; | |
2047 | flags.loopback = 1; | |
2048 | reg0 = 0; | |
2049 | reg1 = 0; | |
2050 | ... | |
2051 | reg9 = 0; | |
2052 | REGBIT_EGRESS_LOOPBACK = 1; | |
2053 | next(pipeline=ingress, table=0); | |
2054 | }; | |
2055 | </pre> | |
2056 | ||
2057 | <p> | |
2058 | <code>flags.loopback</code> is set since in_port is unchanged | |
2059 | and the packet may return back to that port after NAT processing. | |
2060 | <code>REGBIT_EGRESS_LOOPBACK</code> is set to indicate that | |
2061 | egress loopback has occurred, in order to skip the source IP | |
2062 | address check against the router address. | |
2063 | </p> | |
2064 | </li> | |
2065 | ||
2066 | <li> | |
2067 | A priority-0 logical flow with match <code>1</code> has actions | |
2068 | <code>next;</code>. | |
2069 | </li> | |
2070 | </ul> | |
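The effect of the `clone` action above on packet metadata can be sketched in Python as follows. This is an illustrative model under stated assumptions: the dictionary layout, the key names, and the separate `egress_loopback` key (standing in for the `REGBIT_EGRESS_LOOPBACK` register bit) are hypothetical, and resubmission to ingress table 0 is left to the caller.

```python
def egress_loopback_clone(pkt):
    """Return the metadata of the cloned packet that re-enters ingress table 0.

    The original packet is left untouched, mirroring clone { ... } semantics.
    """
    clone = dict(pkt)
    clone["ct_state"] = None           # ct_clear;
    clone["inport"] = pkt["outport"]   # inport = outport;
    clone["outport"] = ""              # outport = "";
    clone["flags"] = {"loopback": 1}   # flags = 0; flags.loopback = 1;
    clone["regs"] = [0] * 10           # reg0 = 0; ... reg9 = 0;
    clone["egress_loopback"] = 1       # REGBIT_EGRESS_LOOPBACK = 1;
    return clone
```

Because every register and flag is reset, the clone carries no state from its first pass through the router other than the loopback markers.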
2071 | ||
2072 | <h3>Egress Table 3: Delivery</h3> | |
9975d7be BP |
2073 | |
2074 | <p> | |
2075 | Packets that reach this table are ready for delivery. It contains | |
2076 | priority-100 logical flows that match packets on each enabled logical | |
2077 | router port, with action <code>output;</code>. | |
2078 | </p> | |
2079 | ||
1af530bc | 2080 | </manpage> |