Commit | Line | Data |
---|---|---|
7f907848 SF |
1 | .. |
2 | Licensed under the Apache License, Version 2.0 (the "License"); you may | |
3 | not use this file except in compliance with the License. You may obtain | |
4 | a copy of the License at | |
5 | ||
6 | http://www.apache.org/licenses/LICENSE-2.0 | |
7 | ||
8 | Unless required by applicable law or agreed to in writing, software | |
9 | distributed under the License is distributed on an "AS IS" BASIS, WITHOUT | |
10 | WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the | |
11 | License for the specific language governing permissions and limitations | |
12 | under the License. | |
13 | ||
14 | Convention for heading levels in Open vSwitch documentation: | |
15 | ||
16 | ======= Heading 0 (reserved for the title in a document) | |
17 | ------- Heading 1 | |
18 | ~~~~~~~ Heading 2 | |
19 | +++++++ Heading 3 | |
20 | ''''''' Heading 4 | |
21 | ||
22 | Avoid deeper levels because they do not render well. | |
23 | ||
24 | ================================================ | |
25 | Porting Open vSwitch to New Software or Hardware | |
26 | ================================================ | |
27 | ||
28 | Open vSwitch (OVS) is intended to be easily ported to new software and hardware | |
29 | platforms. This document describes the types of changes that are most likely | |
30 | to be necessary in porting OVS to Unix-like platforms. (Porting OVS to other | |
31 | kinds of platforms is likely to be more difficult.) | |
32 | ||
33 | Vocabulary | |
34 | ---------- | |
35 | ||
36 | For historical reasons, different words are used for essentially the same | |
37 | concept in different areas of the Open vSwitch source tree. Here is a | |
38 | concordance, indexed by the area of the source tree: | |
39 | ||
40 | :: | |
41 | ||
42 |     datapath/       vport           --- | |
43 |     vswitchd/       iface           port | |
44 |     ofproto/        port            bundle | |
45 |     ofproto/bond.c  slave           bond | |
46 |     lib/lacp.c      slave           lacp | |
47 |     lib/netdev.c    netdev          --- | |
48 |     database        Interface       Port | |
49 | ||
50 | Open vSwitch Architectural Overview | |
51 | ----------------------------------- | |
52 | ||
53 | The following diagram shows the very high-level architecture of Open vSwitch | |
54 | from a porter's perspective. | |
55 | ||
56 | :: | |
57 | ||
58 |     +-------------------+ | |
59 |     |    ovs-vswitchd   |<-->ovsdb-server | |
60 |     +-------------------+ | |
61 |     |      ofproto      |<-->OpenFlow controllers | |
62 |     +--------+-+--------+ | |
63 |     | netdev | | ofproto| | |
64 |     +--------+ |provider| | |
65 |     | netdev | +--------+ | |
66 |     |provider| | |
67 |     +--------+ | |
68 | ||
69 | Some of the components are generic. Modulo bugs or inadequacies, these | |
70 | components should not need to be modified as part of a port: | |
71 | ||
72 | ovs-vswitchd | |
73 | The main Open vSwitch userspace program, in vswitchd/. It reads the desired | |
74 | Open vSwitch configuration from the ovsdb-server program over an IPC channel | |
75 | and passes this configuration down to the "ofproto" library. It also passes | |
76 | certain status and statistical information from ofproto back into the | |
77 | database. | |
78 | ||
79 | ofproto | |
80 | The Open vSwitch library, in ofproto/, that implements an OpenFlow switch. | |
81 | It talks to OpenFlow controllers over the network and to switch hardware or | |
82 | software through an "ofproto provider", explained further below. | |
83 | ||
84 | netdev | |
85 | The Open vSwitch library, in lib/netdev.c, that abstracts interacting with | |
86 | network devices, that is, Ethernet interfaces. The netdev library is a thin | |
87 | layer over "netdev provider" code, explained further below. | |
88 | ||
89 | The other components may need attention during a port. You will almost | |
90 | certainly have to implement a "netdev provider". Depending on the type of port | |
91 | you are doing and the desired performance, you may also have to implement an | |
92 | "ofproto provider" or a lower-level component called a "dpif" provider. | |
93 | ||
94 | The following sections talk about these components in more detail. | |
95 | ||
96 | Writing a netdev Provider | |
97 | ------------------------- | |
98 | ||
99 | A "netdev provider" implements an operating-system- and hardware-specific | |
100 | interface to "network devices", e.g. eth0 on Linux. Open vSwitch must be able | |
101 | to open each port on a switch as a netdev, so you will need to implement a | |
102 | "netdev provider" that works with your switch hardware and software. | |
103 | ||
104 | ``struct netdev_class``, in ``lib/netdev-provider.h``, defines the interfaces | |
105 | required to implement a netdev. That structure contains many function | |
106 | pointers, each of which has a comment that is meant to describe its behavior in | |
107 | detail. If the requirements are unclear, report this as a bug. | |
108 | ||
109 | The netdev interface can be divided into a few rough categories: | |
110 | ||
111 | - Functions required to properly implement OpenFlow features. For example, | |
112 | OpenFlow requires the ability to report the Ethernet hardware address of a | |
113 | port. These functions must be implemented for minimally correct operation. | |
114 | ||
115 | - Functions required to implement optional Open vSwitch features. For example, | |
116 | the Open vSwitch support for in-band control requires netdev support for | |
117 | inspecting the TCP/IP stack's ARP table. These functions must be implemented | |
118 | if the corresponding OVS features are to work, but may be omitted initially. | |
119 | ||
120 | - Functions needed in some implementations but not in others. For example, | |
121 | most kinds of ports (see below) do not need functionality to receive packets | |
122 | from a network device. | |
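The three categories above can be pictured as a table of function pointers in which a provider fills in only what it supports. The following is a minimal sketch, not the real ``struct netdev_class`` from ``lib/netdev-provider.h``: the ``toy_`` names, the two members, their signatures, and the use of ``EOPNOTSUPP`` for an omitted function are all assumptions made for illustration.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_ETH_ADDR_LEN 6

/* Hypothetical, simplified stand-in for a provider class: a table of
 * function pointers.  A NULL pointer means "not implemented". */
struct toy_netdev_class {
    const char *type;
    /* Needed for minimally correct OpenFlow operation. */
    int (*get_etheraddr)(const char *name, uint8_t mac[TOY_ETH_ADDR_LEN]);
    /* Needed only by implementations that receive packets. */
    int (*rx_recv)(const char *name, void *buf, size_t size);
};

/* Example provider callback that reports a fixed, made-up address. */
static int
toy_get_etheraddr(const char *name, uint8_t mac[TOY_ETH_ADDR_LEN])
{
    static const uint8_t fixed[TOY_ETH_ADDR_LEN] =
        { 0x02, 0x00, 0x00, 0x00, 0x00, 0x01 };

    (void) name;                /* The toy device ignores its name. */
    memcpy(mac, fixed, sizeof fixed);
    return 0;
}

/* A minimal provider: implements the required function, omits receive. */
static const struct toy_netdev_class toy_class = {
    .type = "toy",
    .get_etheraddr = toy_get_etheraddr,
    .rx_recv = NULL,
};

/* Generic-layer wrappers: dispatch to the provider, or report that the
 * operation is unsupported when the provider omitted it. */
static int
toy_netdev_get_etheraddr(const struct toy_netdev_class *cls,
                         const char *name, uint8_t mac[TOY_ETH_ADDR_LEN])
{
    return cls->get_etheraddr ? cls->get_etheraddr(name, mac) : EOPNOTSUPP;
}

static int
toy_netdev_rx_recv(const struct toy_netdev_class *cls,
                   const char *name, void *buf, size_t size)
{
    return cls->rx_recv ? cls->rx_recv(name, buf, size) : EOPNOTSUPP;
}
```

With this layout, a caller that asks ``toy_class`` for its Ethernet address gets an answer, while a receive request returns ``EOPNOTSUPP``. Consult the comments in ``lib/netdev-provider.h`` for the actual members and their required behavior.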
123 | ||
124 | The existing netdev implementations may serve as useful examples during a port: | |
125 | ||
126 | - lib/netdev-linux.c implements netdev functionality for Linux network devices, | |
127 | using Linux kernel calls. It may be a good place to start for full-featured | |
128 | netdev implementations. | |
129 | ||
130 | - lib/netdev-vport.c provides support for "virtual ports" implemented by the | |
131 | Open vSwitch datapath module for the Linux kernel. This may serve as a model | |
132 | for minimal netdev implementations. | |
133 | ||
134 | - lib/netdev-dummy.c is a fake netdev implementation useful only for testing. | |
135 | ||
136 | .. _porting strategies: | |
137 | ||
138 | Porting Strategies | |
139 | ------------------ | |
140 | ||
141 | After a netdev provider has been implemented for a system's network devices, | |
142 | you may choose among three basic porting strategies. | |
143 | ||
144 | The lowest-effort strategy is to use the "userspace switch" implementation | |
145 | built into Open vSwitch. This ought to work, without writing any more code, as | |
146 | long as the netdev provider that you implemented supports receiving packets. | |
147 | It yields poor performance, however, because every packet passes through the | |
7c9afefd | 148 | ovs-vswitchd process. Refer to :doc:`/intro/install/userspace` for instructions |
795752a3 | 149 | on how to configure a userspace switch. |
7f907848 SF |
150 | |
151 | If the userspace switch is not the right choice for your port, then you will | |
152 | have to write more code. You may implement either an "ofproto provider" or a | |
153 | "dpif provider". Which you should choose depends on a few different factors: | |
154 | ||
155 | * Only an ofproto provider can take full advantage of hardware with built-in | |
156 | support for wildcards (e.g. an ACL table or a TCAM). | |
157 | ||
158 | * A dpif provider can take advantage of the Open vSwitch built-in | |
159 | implementations of bonding, LACP, 802.1ag, 802.1Q VLANs, and other features. | |
160 | An ofproto provider has to provide its own implementations, if the hardware | |
161 | can support them at all. | |
162 | ||
163 | * A dpif provider is usually easier to implement, but most appropriate for | |
164 | software switching. It "explodes" wildcard rules into exact-match entries | |
165 | (with an optional wildcard mask). This allows fast hash lookups in software, | |
166 | but makes inefficient use of TCAMs in hardware that supports wildcarding. | |
167 | ||
168 | The following sections describe how to implement each kind of port. | |
169 | ||
170 | ofproto Providers | |
171 | ----------------- | |
172 | ||
173 | An "ofproto provider" is what ofproto uses to directly monitor and control an | |
174 | OpenFlow-capable switch. ``struct ofproto_class``, in | |
175 | ``ofproto/ofproto-provider.h``, defines the interfaces to implement an ofproto | |
176 | provider for new hardware or software. That structure contains many function | |
177 | pointers, each of which has a comment that is meant to describe its behavior in | |
178 | detail. If the requirements are unclear, report this as a bug. | |
179 | ||
180 | The ofproto provider interface is preliminary. Let us know if it seems | |
181 | unsuitable for your purpose. We will try to improve it. | |
182 | ||
183 | Writing a dpif Provider | |
184 | ----------------------- | |
185 | ||
186 | Open vSwitch has a built-in ofproto provider named "ofproto-dpif", which is | |
187 | built on top of a library for manipulating datapaths, called "dpif". A | |
188 | "datapath" is a simple flow table, one that is only required to support | |
189 | exact-match flows, that is, flows without wildcards. When a packet arrives on | |
190 | a network device, the datapath looks for it in this table. If there is a | |
191 | match, then it performs the associated actions. If there is no match, the | |
192 | datapath passes the packet up to ofproto-dpif, which maintains the full | |
193 | OpenFlow flow table. If the packet matches in this flow table, then | |
194 | ofproto-dpif executes its actions and inserts a new entry into the dpif flow | |
195 | table. (Otherwise, ofproto-dpif passes the packet up to ofproto to send the | |
196 | packet to the OpenFlow controller, if one is configured.) | |
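The lookup, miss, and install sequence described above might be sketched as follows. This is a toy model, not OVS code: the ``toy_`` types, the two-field key, the linear scan (a real datapath would more likely use a hash table), the hard-coded "OpenFlow table", and the use of port 0 to mean "no rule matched" are all assumptions made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, tiny flow key; real flow keys have many more fields. */
struct toy_key {
    uint32_t in_port;
    uint16_t eth_type;
};

struct toy_entry {
    struct toy_key key;
    uint32_t out_port;          /* The "action": output to this port. */
};

#define TOY_MAX_FLOWS 16

/* Stand-in for the datapath's exact-match flow table. */
struct toy_datapath {
    struct toy_entry flows[TOY_MAX_FLOWS];
    size_t n_flows;
    unsigned int n_misses;      /* Packets that went to the slow path. */
};

/* Fast path: exact-match lookup in the datapath table. */
static const struct toy_entry *
toy_dp_lookup(const struct toy_datapath *dp, const struct toy_key *key)
{
    for (size_t i = 0; i < dp->n_flows; i++) {
        const struct toy_key *k = &dp->flows[i].key;

        if (k->in_port == key->in_port && k->eth_type == key->eth_type) {
            return &dp->flows[i];
        }
    }
    return NULL;
}

/* Slow-path stand-in for the full OpenFlow table: IPv4 goes to port 2,
 * everything else has no matching rule. */
static bool
toy_openflow_lookup(const struct toy_key *key, uint32_t *out_port)
{
    if (key->eth_type == 0x0800) {
        *out_port = 2;
        return true;
    }
    return false;
}

/* Processes one packet.  Returns the output port, or 0 if no rule
 * matched (the point at which a controller might be consulted). */
static uint32_t
toy_process_packet(struct toy_datapath *dp, const struct toy_key *key)
{
    const struct toy_entry *e = toy_dp_lookup(dp, key);
    uint32_t out_port;

    if (e) {
        return e->out_port;     /* Datapath hit: no slow-path work. */
    }

    dp->n_misses++;             /* Datapath miss: go to the slow path. */
    if (!toy_openflow_lookup(key, &out_port)) {
        return 0;
    }
    if (dp->n_flows < TOY_MAX_FLOWS) {
        /* Install an exact-match entry so later packets hit above. */
        dp->flows[dp->n_flows].key = *key;
        dp->flows[dp->n_flows].out_port = out_port;
        dp->n_flows++;
    }
    return out_port;
}
```

In this sketch only the first packet of a flow increments ``n_misses``. Later packets with the same key are handled entirely by ``toy_dp_lookup``.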
197 | ||
198 | When calculating the dpif flow, ofproto-dpif generates an exact-match flow that | |
199 | describes the missed packet. It makes an effort to figure out what fields can | |
200 | be wildcarded based on the switch's configuration and OpenFlow flow table. The | |
201 | dpif is free to ignore the suggested wildcards and only support the exact-match | |
202 | entry. However, if the dpif supports wildcarding, then it can use the masks to | |
203 | match multiple flows with fewer entries and potentially significantly reduce | |
204 | the number of flow misses handled by ofproto-dpif. | |
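To illustrate how a mask lets one entry cover many exact-match flows, the sketch below compares only the bits that the mask marks as significant. The ``toy_`` structures and the two-field key are hypothetical and far simpler than what a real dpif provider handles.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical two-field key, used for both packet keys and masks. */
struct toy_fields {
    uint32_t in_port;
    uint32_t ip_dst;
};

/* A cached flow: 'key' is the exact-match key of the missed packet and
 * 'mask' has a 1-bit for every bit that must match. */
struct toy_masked_flow {
    struct toy_fields key;
    struct toy_fields mask;
};

/* Returns true if 'pkt' agrees with 'flow->key' on every bit that is
 * set in 'flow->mask'.  Bits cleared in the mask are wildcarded. */
static bool
toy_masked_match(const struct toy_masked_flow *flow,
                 const struct toy_fields *pkt)
{
    return !((flow->key.in_port ^ pkt->in_port) & flow->mask.in_port)
        && !((flow->key.ip_dst ^ pkt->ip_dst) & flow->mask.ip_dst);
}
```

For example, an entry whose ``ip_dst`` mask is ``0xffffff00`` matches every destination in the same /24 as the packet that caused the miss, so those packets never reach the slow path. A dpif that ignores the mask would need one entry per destination address instead.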
205 | ||
206 | The "dpif" library in turn delegates much of its functionality to a "dpif | |
207 | provider". The following diagram shows how dpif providers fit into the Open | |
208 | vSwitch architecture: | |
209 | ||
210 | :: | |
211 | ||
212 | ||
213 |                       Architecture | |
214 | ||
215 |                _ | |
216 |               |   +-------------------+ | |
217 |               |   |    ovs-vswitchd   |<-->ovsdb-server | |
218 |               |   +-------------------+ | |
219 |               |   |      ofproto      |<-->OpenFlow controllers | |
220 |               |   +--------+-+--------+  _ | |
221 |               |   | netdev | |ofproto-|   | | |
222 |     userspace |   +--------+ |  dpif  |   | | |
223 |               |   | netdev | +--------+   | | |
224 |               |   |provider| |  dpif  |   | | |
225 |               |   +---||---+ +--------+   | | |
226 |               |       ||     |  dpif  |   | implementation of | |
227 |               |       ||     |provider|   | ofproto provider | |
228 |               |_      ||     +---||---+   | | |
229 |                       ||         ||       | | |
230 |                _  +---||-----+---||---+   | | |
231 |               |   |          |datapath|   | | |
232 |        kernel |   |          +--------+  _| | |
233 |               |   |                   | | |
234 |               |_  +--------||---------+ | |
235 |                            || | |
236 |                         physical | |
237 |                           NIC | |
238 | ||
239 | ``struct dpif_class``, in ``lib/dpif-provider.h``, defines the interfaces | |
240 | required to implement a dpif provider for new hardware or software. That | |
241 | structure contains many function pointers, each of which has a comment that is | |
242 | meant to describe its behavior in detail. If the requirements are unclear, | |
243 | report this as a bug. | |
244 | ||
245 | There are two existing dpif implementations that may serve as useful examples | |
246 | during a port: | |
247 | ||
248 | * lib/dpif-netlink.c is a Linux-specific dpif implementation that talks to an | |
249 | Open vSwitch-specific kernel module (whose sources are in the "datapath" | |
250 | directory). The kernel module performs all of the switching work, passing | |
251 | packets that do not match any flow table entry up to userspace. This dpif | |
252 | implementation is essentially a wrapper around calls into the kernel module. | |
253 | ||
254 | * lib/dpif-netdev.c is a generic dpif implementation that performs all | |
255 | switching internally. This is how the Open vSwitch userspace switch is | |
256 | implemented. | |
257 | ||
258 | Miscellaneous Notes | |
259 | ------------------- | |
260 | ||
261 | Open vSwitch source code uses ``uint16_t``, ``uint32_t``, and ``uint64_t`` as | |
262 | fixed-width types in host byte order, and ``ovs_be16``, ``ovs_be32``, and | |
263 | ``ovs_be64`` as fixed-width types in network byte order. Each of the latter is | |
264 | equivalent to the corresponding one of the former, but the difference in name makes the | |
265 | intended use obvious. | |
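The naming convention can be seen in a small sketch. The ``toy_be16`` typedef below is a local stand-in so that the example compiles on its own; in the OVS tree the corresponding type is ``ovs_be16``. The struct and helper names are hypothetical.

```c
#include <arpa/inet.h>
#include <stdint.h>

/* Local stand-in for a 16-bit network-byte-order type, defined here only
 * so that the example is self-contained. */
typedef uint16_t toy_be16;

struct toy_tcp_ports {
    toy_be16 src;               /* Network byte order, as on the wire. */
    toy_be16 dst;
};

/* Returns the destination port in host byte order. */
static uint16_t
toy_dst_port(const struct toy_tcp_ports *ports)
{
    return ntohs(ports->dst);
}

/* Stores a host-byte-order port number in network byte order. */
static void
toy_set_dst_port(struct toy_tcp_ports *ports, uint16_t port)
{
    ports->dst = htons(port);
}
```

Because the member type says "big-endian", a reader can tell at a glance that assigning a plain ``uint16_t`` to ``dst`` without ``htons()`` is probably a bug.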
266 | ||
267 | The default "fail-mode" for Open vSwitch bridges is "standalone", meaning that, | |
268 | when the OpenFlow controllers cannot be contacted, Open vSwitch acts as a | |
269 | regular MAC-learning switch. This works well in virtualization environments | |
270 | where there is normally just one uplink (either a single physical interface or | |
271 | a bond). In a more general environment, it can create loops. So, if you are | |
272 | porting to a general-purpose switch platform, you should consider changing the | |
273 | default "fail-mode" to "secure", which does not behave this way. See | |
274 | documentation for the "fail-mode" column in the Bridge table in | |
275 | ovs-vswitchd.conf.db(5) for more information. | |
276 | ||
277 | ``lib/entropy.c`` assumes that it can obtain high-quality random number seeds | |
278 | at startup by reading from /dev/urandom. You will need to modify it if this is | |
279 | not true on your platform. | |
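As a point of reference when adapting this to another platform, reading a seed from /dev/urandom can be sketched as below. This is an illustrative helper with a hypothetical name, not the code in ``lib/entropy.c``. A port to a platform without /dev/urandom would substitute that platform's own source of random bytes.

```c
#include <stddef.h>
#include <stdio.h>

/* Fills 'buf' with 'n' random bytes read from /dev/urandom.
 * Returns 0 on success, -1 if the device cannot be opened or read. */
static int
toy_get_entropy(void *buf, size_t n)
{
    FILE *f = fopen("/dev/urandom", "rb");
    size_t n_read;

    if (!f) {
        return -1;
    }
    n_read = fread(buf, 1, n, f);
    fclose(f);
    return n_read == n ? 0 : -1;
}
```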
280 | ||
281 | ``vswitchd/system-stats.c`` only knows how to obtain some statistics on Linux. | |
282 | Optionally you may implement them for your platform as well. | |
283 | ||
284 | Why OVS Does Not Support Hybrid Providers | |
285 | ----------------------------------------- | |
286 | ||
287 | The `porting strategies`_ section above describes the "ofproto provider" and | |
288 | "dpif provider" porting strategies. Only an ofproto provider can take | |
289 | advantage of hardware TCAM support, and only a dpif provider can take advantage | |
290 | of the OVS built-in implementations of various features. It is therefore | |
291 | tempting to suggest a hybrid approach that shares the advantages of both | |
292 | strategies. | |
293 | ||
294 | However, Open vSwitch does not support a hybrid approach. Doing so may be | |
295 | possible, with a significant amount of extra development work, but it does not | |
296 | yet seem worthwhile, for the reasons explained below. | |
297 | ||
298 | First, user surprise is likely when a switch supports a feature only with a | |
299 | high performance penalty. For example, one user questioned why adding a | |
300 | particular OpenFlow action to a flow caused a 1,058x slowdown on a hardware | |
301 | OpenFlow implementation [1]_. The action required the flow to be implemented in | |
302 | software. | |
303 | ||
304 | Given that implementing a flow in software on the slow management CPU of a | |
305 | hardware switch causes a major slowdown, software-implemented flows would only | |
306 | make sense for very low-volume traffic. But many of the features built into | |
307 | the OVS software switch implementation would need to apply to every flow to be | |
308 | useful. There is no value, for example, in applying bonding or 802.1Q VLAN | |
309 | support only to low-volume traffic. | |
310 | ||
311 | Besides supporting features of OpenFlow actions, a hybrid approach could also | |
312 | support forms of matching not supported by particular switching hardware, by | |
313 | sending all packets that might match a rule to software. But again this can | |
314 | cause an unacceptable slowdown by forcing bulk traffic through software in the | |
315 | hardware switch's slow management CPU. Consider, for example, a hardware | |
316 | switch that can match on the IPv6 Ethernet type but not on fields in IPv6 | |
317 | headers. An OpenFlow table that matched on the IPv6 Ethernet type would | |
318 | perform well, but adding a rule that matched only UDPv6 would force every IPv6 | |
319 | packet to software, slowing down not just UDPv6 but all IPv6 processing. | |
320 | ||
321 | .. [1] Aaron Rosen, "Modify packet fields extremely slow", | |
322 | openflow-discuss mailing list, June 26, 2011, archived at | |
323 | https://mailman.stanford.edu/pipermail/openflow-discuss/2011-June/002386.html. | |
324 | ||
325 | Questions | |
326 | --------- | |
327 | ||
328 | Direct porting questions to dev@openvswitch.org. We will try to use questions | |
329 | to improve this porting guide. |