1 ..
2 Licensed under the Apache License, Version 2.0 (the "License"); you may
3 not use this file except in compliance with the License. You may obtain
4 a copy of the License at
5
6 http://www.apache.org/licenses/LICENSE-2.0
7
8 Unless required by applicable law or agreed to in writing, software
9 distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
10 WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
11 License for the specific language governing permissions and limitations
12 under the License.
13
14 Convention for heading levels in Open vSwitch documentation:
15
16 ======= Heading 0 (reserved for the title in a document)
17 ------- Heading 1
18 ~~~~~~~ Heading 2
19 +++++++ Heading 3
20 ''''''' Heading 4
21
22 Avoid deeper levels because they do not render well.
23
24 =======================================
25 Open vSwitch Datapath Development Guide
26 =======================================
27
28 The Open vSwitch kernel module allows flexible userspace control over
29 flow-level packet processing on selected network devices. It can be used to
30 implement a plain Ethernet switch, network device bonding, VLAN processing,
31 network access control, flow-based network control, and so on.
32
33 The kernel module implements multiple "datapaths" (analogous to bridges), each
34 of which can have multiple "vports" (analogous to ports within a bridge). Each
35 datapath also has associated with it a "flow table" that userspace populates
36 with "flows" that map from keys based on packet headers and metadata to sets of
37 actions. The most common action forwards the packet to another vport; other
38 actions are also implemented.
39
40 When a packet arrives on a vport, the kernel module processes it by extracting
41 its flow key and looking it up in the flow table. If there is a matching flow,
42 it executes the associated actions. If there is no match, it queues the packet
43 to userspace for processing (as part of its processing, userspace will likely
44 set up a flow to handle further packets of the same type entirely in-kernel).
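
The lookup-then-upcall behavior described above can be sketched in a few lines. This is only an illustrative model, not the kernel module's code: the dictionary-based flow table, the ``extract_key`` helper, and the action strings are all hypothetical.

```python
# Hypothetical model of the datapath fast path: the flow table is a dict
# that maps an immutable flow key to a list of actions.

def extract_key(packet):
    """Build a (made-up, simplified) flow key from packet metadata and headers."""
    return (packet["in_port"], packet["eth_dst"])

def process_packet(flow_table, packet, upcall_queue):
    """Return the actions for a matching flow, or queue the packet to userspace."""
    key = extract_key(packet)
    actions = flow_table.get(key)
    if actions is None:
        # Miss: hand both the packet and the parsed key to userspace.
        upcall_queue.append((packet, key))
        return None
    return actions

flow_table = {(1, "00:02:e3:0f:80:a4"): ["output:2"]}
upcalls = []

hit = process_packet(flow_table, {"in_port": 1, "eth_dst": "00:02:e3:0f:80:a4"}, upcalls)
miss = process_packet(flow_table, {"in_port": 3, "eth_dst": "ff:ff:ff:ff:ff:ff"}, upcalls)
```

In this sketch userspace would drain ``upcalls`` and, typically, add an entry to ``flow_table`` so that later packets with the same key are handled without another upcall.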
45
46 Flow Key Compatibility
47 ----------------------
48
49 Network protocols evolve over time. New protocols become important and
50 existing protocols lose their prominence. For the Open vSwitch kernel module
51 to remain relevant, it must be possible for newer versions to parse additional
52 protocols as part of the flow key. It might even be desirable, someday, to
53 drop support for parsing protocols that have become obsolete. Therefore, the
54 Netlink interface to Open vSwitch is designed to allow carefully written
55 userspace applications to work with any version of the flow key, past or
56 future.
57
58 To support this forward and backward compatibility, whenever the kernel module
59 passes a packet to userspace, it also passes along the flow key that it parsed
60 from the packet. Userspace then extracts its own notion of a flow key from the
61 packet and compares it against the kernel-provided version:
62
63 - If userspace's notion of the flow key for the packet matches the kernel's,
64 then nothing special is necessary.
65
66 - If the kernel's flow key includes more fields than the userspace version of
67 the flow key, for example if the kernel decoded IPv6 headers but userspace
68 stopped at the Ethernet type (because it does not understand IPv6), then
69 again nothing special is necessary. Userspace can still set up a flow in the
70 usual way, as long as it uses the kernel-provided flow key to do it.
71
72 - If the userspace flow key includes more fields than the kernel's, for example
73 if userspace decoded an IPv6 header but the kernel stopped at the Ethernet
74 type, then userspace can forward the packet manually, without setting up a
75 flow in the kernel. This case is bad for performance because every packet
76 that the kernel considers part of the flow must go to userspace, but the
77 forwarding behavior is correct. (If userspace can determine that the values
78 of the extra fields would not affect forwarding behavior, then it could set
79 up a flow anyway.)
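
As a rough illustration of the three cases above, a userspace program might decide what to do with a missed packet as follows. The sketch assumes flow keys are modeled as dictionaries from attribute name to value; the function name and return strings are hypothetical, and the parenthetical optimization in the last case is deliberately left out.

```python
def plan_for_miss(kernel_key, userspace_key):
    """Decide how userspace should handle a packet the kernel passed up.

    Both keys are dicts mapping attribute names to values (an assumption
    made for this sketch, not the Netlink wire format).
    """
    extra_in_userspace = set(userspace_key) - set(kernel_key)
    if extra_in_userspace:
        # Userspace parsed deeper than the kernel: a kernel flow could not
        # distinguish these packets, so forward this one by hand.
        return "forward_manually"
    # Keys agree, or the kernel parsed deeper: install a flow, always using
    # the kernel-provided key.
    return "install_with_kernel_key"

same = plan_for_miss({"eth": "...", "eth_type": 0x86DD},
                     {"eth": "...", "eth_type": 0x86DD})
kernel_deeper = plan_for_miss({"eth": "...", "eth_type": 0x86DD, "ipv6": "..."},
                              {"eth": "...", "eth_type": 0x86DD})
userspace_deeper = plan_for_miss({"eth": "...", "eth_type": 0x86DD},
                                 {"eth": "...", "eth_type": 0x86DD, "ipv6": "..."})
```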
80
81 How flow keys evolve over time is important to making this work, so
82 the following sections go into detail.
83
84 Flow Key Format
85 ---------------
86
87 A flow key is passed over a Netlink socket as a sequence of Netlink attributes.
88 Some attributes represent packet metadata, defined as any information about a
89 packet that cannot be extracted from the packet itself, e.g. the vport on which
90 the packet was received. Most attributes, however, are extracted from headers
91 within the packet, e.g. source and destination addresses from Ethernet, IP, or
92 TCP headers.
93
94 The ``<linux/openvswitch.h>`` header file defines the exact format of the flow
95 key attributes. For informal explanatory purposes here, we write them as
96 comma-separated strings, with parentheses indicating arguments and nesting.
97 For example, the following could represent a flow key corresponding to a TCP
98 packet that arrived on vport 1::
99
100 in_port(1), eth(src=e0:91:f5:21:d0:b2, dst=00:02:e3:0f:80:a4),
101 eth_type(0x0800), ipv4(src=172.16.0.20, dst=172.18.0.52, proto=6, tos=0,
102 frag=no), tcp(src=49163, dst=80)
103
104 Often we ellipsize arguments not important to the discussion, e.g.::
105
106 in_port(1), eth(...), eth_type(0x0800), ipv4(...), tcp(...)
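
For readers who want to produce this informal notation from their own data, a minimal formatter might look like the following. The list-of-pairs representation is an assumption made for this sketch; it is unrelated to the actual attribute layout in ``<linux/openvswitch.h>``.

```python
def format_attr(name, value):
    """Render one attribute as name(args), using k=v pairs for dict values."""
    if isinstance(value, dict):
        inner = ", ".join(f"{k}={v}" for k, v in value.items())
    else:
        inner = str(value)
    return f"{name}({inner})"

def format_flow_key(attrs):
    """Render a list of (name, value) pairs as a comma-separated string."""
    return ", ".join(format_attr(name, value) for name, value in attrs)

key = [
    ("in_port", 1),
    ("eth", {"src": "e0:91:f5:21:d0:b2", "dst": "00:02:e3:0f:80:a4"}),
    ("eth_type", "0x0800"),
    ("tcp", {"src": 49163, "dst": 80}),
]
text = format_flow_key(key)
```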
107
108 Wildcarded Flow Key Format
109 --------------------------
110
111 A wildcarded flow is described with two sequences of Netlink attributes passed
112 over the Netlink socket: a flow key, exactly as described above, and an
113 optional corresponding flow mask.
114
115 A wildcarded flow can represent a group of exact match flows. Each ``1`` bit
116 in the mask specifies an exact match with the corresponding bit in the flow key.
117 A ``0`` bit specifies a don't-care bit, which will match either a ``1`` or
118 ``0`` bit of an incoming packet. Using a wildcarded flow can improve the flow
119 setup rate by reducing the number of new flows that need to be processed by
120 the userspace program.
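
The mask semantics can be illustrated with plain integers. The sketch below treats a single 32-bit IPv4 destination field as the whole key, which is a simplification made only for this example.

```python
import ipaddress

def masked_match(packet_bits, key_bits, mask_bits):
    """Return True if the packet agrees with the key on every 1 bit of the mask."""
    return (packet_bits & mask_bits) == (key_bits & mask_bits)

def ip(text):
    """Convert a dotted-quad IPv4 address to its 32-bit integer value."""
    return int(ipaddress.IPv4Address(text))

key = ip("172.18.0.52")
mask = 0xFFFF0000  # exact match on the first 16 bits, don't care on the rest

same_prefix = masked_match(ip("172.18.9.9"), key, mask)
other_prefix = masked_match(ip("172.19.0.52"), key, mask)
```

With this mask, one wildcarded flow covers every destination in 172.18.0.0/16, where exact-match flows would need a separate setup per address.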
121
122 Support for the mask Netlink attribute is optional for both the kernel and user
123 space program. The kernel can ignore the mask attribute, installing an exact
124 match flow, or reduce the number of don't-care bits in the kernel to fewer than
125 what was specified by the user space program. In this case, variations in bits
126 that the kernel does not implement will simply result in additional flow
127 setups. The kernel module will also work with user space programs that neither
128 support nor supply flow mask attributes.
129
130 Since the kernel may ignore or modify wildcard bits, it can be difficult for
131 the userspace program to know exactly what matches are installed. There are two
132 possible approaches: reactively install flows as they miss the kernel flow
133 table (and therefore not attempt to determine wildcard changes at all) or use
134 the kernel's response messages to determine the installed wildcards.
135
136 When interacting with userspace, the kernel should maintain the match portion
137 of the key exactly as originally installed. This provides a handle to
138 identify the flow for all future operations. However, when reporting the mask
139 of an installed flow, the mask should include any restrictions imposed by the
140 kernel.
141
142 The behavior when using overlapping wildcarded flows is undefined. It is the
143 responsibility of the user space program to ensure that any incoming packet can
144 match at most one flow, wildcarded or not. The current implementation performs
145 best-effort detection of overlapping wildcarded flows and may reject some but
146 not all of them. However, this behavior may change in future versions.
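
One way a userspace program could check its own flows for overlap before installing them is sketched below. It is not the kernel's best-effort detection; it again models the key and mask as single integers, and the function name is hypothetical. Two masked flows can match a common packet exactly when their keys agree on every bit that both masks care about.

```python
def flows_overlap(key_a, mask_a, key_b, mask_b):
    """Return True if some packet could match both masked flows."""
    common = mask_a & mask_b  # bits that both flows match exactly
    return (key_a & common) == (key_b & common)

# 172.18.0.0/16 and 172.18.5.0/24: the second is contained in the first.
contained = flows_overlap(0xAC120000, 0xFFFF0000, 0xAC120500, 0xFFFFFF00)

# 172.18.0.0/16 and 172.19.0.0/16: disjoint prefixes.
disjoint = flows_overlap(0xAC120000, 0xFFFF0000, 0xAC130000, 0xFFFF0000)
```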
147
148 Unique Flow Identifiers
149 -----------------------
150
151 An alternative to using the original match portion of a key as the handle for
152 flow identification is a unique flow identifier, or "UFID". UFIDs are optional
153 for both the kernel and user space program.
154
155 User space programs that support UFID are expected to provide it during flow
156 setup in addition to the flow, then refer to the flow using the UFID for all
157 future operations. The kernel is not required to index flows by the original
158 flow key if a UFID is specified.
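
The difference between the two kinds of handle can be shown with a toy table. Everything here is hypothetical (class name, methods, the byte string used as a UFID); the point is only that a flow installed with a UFID is indexed and later addressed by that UFID rather than by its key.

```python
class FlowTable:
    """Toy flow table that indexes each flow by UFID if given, else by key."""

    def __init__(self):
        self._flows = {}

    def install(self, key, actions, ufid=None):
        # The handle is what all later operations must use to name the flow.
        handle = ufid if ufid is not None else key
        self._flows[handle] = (key, actions)
        return handle

    def get(self, handle):
        return self._flows.get(handle)

    def delete(self, handle):
        return self._flows.pop(handle, None)

table = FlowTable()
key = ("in_port(1)", "eth_type(0x0800)")
ufid = bytes(range(16))  # stand-in for a 128-bit identifier

handle = table.install(key, ["output:2"], ufid=ufid)
by_ufid = table.get(ufid)
by_key = table.get(key)  # not indexed by key when a UFID was supplied
```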
159
160 Basic Rule for Evolving Flow Keys
161 ---------------------------------
162
163 Some care is needed to really maintain forward and backward compatibility for
164 applications that follow the rules listed under "Flow Key Compatibility" above.
165
166 The basic rule is obvious:
167
168 New network protocol support must only supplement existing flow key
169 attributes. It must not change the meaning of already defined flow key
170 attributes.
171
172 This rule does have less-obvious consequences, so it is worth working through a
173 few examples. Suppose, for example, that the kernel module did not already
174 implement VLAN parsing. Instead, it just interpreted the 802.1Q TPID
175 (``0x8100``) as the Ethertype then stopped parsing the packet. The flow key
176 for any packet with an 802.1Q header would look essentially like this, ignoring
177 metadata::
178
179 eth(...), eth_type(0x8100)
180
181 Naively, to add VLAN support, it makes sense to add a new "vlan" flow key
182 attribute to contain the VLAN tag, then continue to decode the encapsulated
183 headers beyond the VLAN tag using the existing field definitions. With this
184 change, a TCP packet in VLAN 10 would have a flow key much like this::
185
186 eth(...), vlan(vid=10, pcp=0), eth_type(0x0800), ip(proto=6, ...), tcp(...)
187
188 But this change would negatively affect a userspace application that has not
189 been updated to understand the new "vlan" flow key attribute. The application
190 could, following the flow compatibility rules above, ignore the "vlan"
191 attribute that it does not understand and therefore assume that the flow
192 contained IP packets. This is a bad assumption (the flow only contains IP
193 packets if one parses and skips over the 802.1Q header) and it could cause the
194 application's behavior to change across kernel versions even though it follows
195 the compatibility rules.
196
197 The solution is to use a set of nested attributes. This is, for example, why
198 802.1Q support uses nested attributes. A TCP packet in VLAN 10 is actually
199 expressed as::
200
201 eth(...), eth_type(0x8100), vlan(vid=10, pcp=0), encap(eth_type(0x0800),
202 ip(proto=6, ...), tcp(...))
203
204 Notice how the ``eth_type``, ``ip``, and ``tcp`` flow key attributes are nested
205 inside the ``encap`` attribute. Thus, an application that does not understand
206 the ``vlan`` key will not see any of those attributes and therefore will not
207 misinterpret them. (Also, the outer ``eth_type`` is still ``0x8100``, not
208 changed to ``0x0800``.)
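
The effect of nesting on an application that has not been updated can be demonstrated with a small model. The flow keys are represented as lists of (name, value) pairs, and ``KNOWN`` is the hypothetical set of attributes such an application understands; it ignores everything else at the top level, as the compatibility rules allow.

```python
KNOWN = {"eth", "eth_type", "ip", "tcp"}  # attributes the old application knows

def visible_attrs(flow_key, known=KNOWN):
    """Return the top-level attributes the application understands."""
    return [(name, value) for name, value in flow_key if name in known]

def looks_like_ip(flow_key):
    """Would this application conclude that the flow carries IP packets?"""
    return ("eth_type", 0x0800) in visible_attrs(flow_key)

# The naive, flat layout: unknown "vlan" is skipped, the rest is misread.
flat = [("eth", "..."), ("vlan", {"vid": 10, "pcp": 0}),
        ("eth_type", 0x0800), ("ip", {"proto": 6}), ("tcp", "...")]

# The nested layout: inner headers are hidden inside the unknown "encap".
nested = [("eth", "..."), ("eth_type", 0x8100), ("vlan", {"vid": 10, "pcp": 0}),
          ("encap", [("eth_type", 0x0800), ("ip", {"proto": 6}), ("tcp", "...")])]

flat_misread = looks_like_ip(flat)
nested_misread = looks_like_ip(nested)
```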
209
210 Handling Malformed Packets
211 --------------------------
212
213 Don't drop packets in the kernel for malformed protocol headers, bad checksums,
214 etc. This would prevent userspace from implementing a simple Ethernet switch
215 that forwards every packet.
216
217 Instead, in such a case, include an attribute with "empty" content. It doesn't
218 matter if the empty content could be valid protocol values, as long as those
219 values are rarely seen in practice, because userspace can always arrange for
220 all packets with those values to be sent to userspace and handle them individually.
221
222 For example, consider a packet that contains an IP header that indicates
223 protocol 6 for TCP, but which is truncated just after the IP header, so that
224 the TCP header is missing. The flow key for this packet would include a ``tcp``
225 attribute with all-zero ``src`` and ``dst``, like this::
226
227 eth(...), eth_type(0x0800), ip(proto=6, ...), tcp(src=0, dst=0)
228
229 As another example, consider a packet with an Ethernet type of ``0x8100``,
230 indicating that a VLAN TCI should follow, but which is truncated just after the
231 Ethernet type. The flow key for this packet would include an all-zero-bits
232 ``vlan`` and an empty ``encap`` attribute, like this::
233
234 eth(...), eth_type(0x8100), vlan(0), encap()
235
236 Unlike a TCP packet with source and destination ports 0, an all-zero-bits VLAN
237 TCI is not that rare, so the CFI bit (aka ``VLAN_TAG_PRESENT`` inside the kernel)
238 is ordinarily set in a vlan attribute expressly to allow this situation to be
239 distinguished. Thus, the flow key in this second example unambiguously
240 indicates a missing or malformed VLAN TCI.
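
A userspace program could interpret the value carried in a ``vlan`` attribute along these lines. The sketch relies on the standard 802.1Q TCI layout (3-bit PCP, 1-bit CFI at ``0x1000``, 12-bit VID); the function name is hypothetical, and a nonzero value with the CFI bit clear is not described above, so the sketch simply rejects it.

```python
VLAN_CFI = 0x1000  # bit 12 of the TCI, set by the kernel when a tag is present

def decode_vlan_attr(tci):
    """Decode the 16-bit value of a vlan attribute.

    Returns None for the all-zero value that marks a missing or malformed
    TCI, otherwise a dict with the VLAN ID and priority.
    """
    if tci == 0:
        return None
    if not tci & VLAN_CFI:
        raise ValueError("nonzero TCI without the CFI bit: not covered by this sketch")
    return {"vid": tci & 0x0FFF, "pcp": (tci >> 13) & 0x7}

truncated = decode_vlan_attr(0x0000)
vlan_zero = decode_vlan_attr(0x1000)   # a real tag with VID 0 and PCP 0
vlan_ten = decode_vlan_attr(0xB00A)    # VID 10, PCP 5
```

The first two calls show why the bit matters: without it, a genuine tag with VID 0 and PCP 0 would be indistinguishable from a truncated packet.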
241
242 Other Rules
243 -----------
244
245 The other rules for flow keys are much less subtle:
246
247 - Duplicate attributes are not allowed at a given nesting level.
248
249 - Ordering of attributes is not significant.
250
251 - When the kernel sends a given flow key to userspace, it always composes it
252 the same way. This allows userspace to hash and compare entire flow keys
253 that it may not be able to fully interpret.
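
Because the kernel composes a given flow key the same way every time, userspace can treat the raw attribute bytes as an opaque identifier. A minimal sketch, using made-up byte strings in place of real Netlink payloads:

```python
from collections import Counter

# Upcall counts per flow, keyed by the raw bytes of the kernel's flow key.
upcall_counts = Counter()

def note_upcall(raw_flow_key):
    """Count an upcall for a flow without parsing its key at all."""
    upcall_counts[raw_flow_key] += 1
    return upcall_counts[raw_flow_key]

# Stand-ins for serialized flow keys; the contents are arbitrary.
key_a = bytes([0x08, 0x00, 0x03, 0x00, 0x01, 0x00, 0x00, 0x00])
key_b = bytes([0x08, 0x00, 0x03, 0x00, 0x02, 0x00, 0x00, 0x00])

first = note_upcall(key_a)
second = note_upcall(bytes(key_a))  # an equal key received later
other = note_upcall(key_b)
```

Byte-for-byte comparison is only safe for keys produced by the kernel. Keys composed by other programs may order attributes differently, since ordering is not significant.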
254
255 Coding Rules
256 ------------
257
258 Implement the headers and code for compatibility with older kernels in the
259 ``linux/compat/`` directory. All public functions should be exported using the
260 ``EXPORT_SYMBOL`` macro. A public function that replaces a same-named kernel
261 function should be prefixed with ``rpl_``. Otherwise, the function should be
262 prefixed with ``ovs_``. In the special case where it is not possible to follow
263 this rule (e.g., the ``pskb_expand_head()`` function), the function name must
264 be added to ``linux/compat/build-aux/export-check-allowlist``; otherwise, the
265 compilation check ``check-export-symbol`` will fail.