How to Port Open vSwitch to New Software or Hardware
====================================================

Open vSwitch (OVS) is intended to be easily ported to new software and
hardware platforms. This document describes the types of changes that
are most likely to be necessary in porting OVS to Unix-like platforms.
(Porting OVS to other kinds of platforms is likely to be more
difficult.)


Vocabulary
----------

For historical reasons, different words are used for essentially the
same concept in different areas of the Open vSwitch source tree. Here
is a concordance, indexed by the area of the source tree:

        datapath/       vport           ---
        vswitchd/       iface           port
        ofproto/        port            bundle
        ofproto/bond.c  slave           bond
        lib/lacp.c      slave           lacp
        lib/netdev.c    netdev          ---
        database        Interface       Port


Open vSwitch Architectural Overview
-----------------------------------

The following diagram shows the very high-level architecture of Open
vSwitch from a porter's perspective.

                   +-------------------+
                   |    ovs-vswitchd   |<-->ovsdb-server
                   +-------------------+
                   |      ofproto      |<-->OpenFlow controllers
                   +--------+-+--------+
                   | netdev | | ofproto|
                   +--------+ |provider|
                   | netdev | +--------+
                   |provider|
                   +--------+

Some of the components are generic. Modulo bugs or inadequacies,
these components should not need to be modified as part of a port:

    - "ovs-vswitchd" is the main Open vSwitch userspace program, in
      vswitchd/. It reads the desired Open vSwitch configuration from
      the ovsdb-server program over an IPC channel and passes this
      configuration down to the "ofproto" library. It also passes
      certain status and statistical information from ofproto back
      into the database.

    - "ofproto" is the Open vSwitch library, in ofproto/, that
      implements an OpenFlow switch. It talks to OpenFlow controllers
      over the network and to switch hardware or software through an
      "ofproto provider", explained further below.

    - "netdev" is the Open vSwitch library, in lib/netdev.c, that
      abstracts interacting with network devices, that is, Ethernet
      interfaces. The netdev library is a thin layer over "netdev
      provider" code, explained further below.

The other components may need attention during a port. You will
almost certainly have to implement a "netdev provider". Depending on
the type of port you are doing and the desired performance, you may
also have to implement an "ofproto provider" or a lower-level
component called a "dpif" provider.

The following sections talk about these components in more detail.


Writing a netdev Provider
-------------------------

A "netdev provider" implements an interface to "network devices",
e.g. eth0 on Linux, that is specific to an operating system and its
hardware. Open vSwitch must be able to open each port on a switch as
a netdev, so you will need to implement a "netdev provider" that works
with your switch hardware and software.

struct netdev_class, in lib/netdev-provider.h, defines the interfaces
required to implement a netdev. That structure contains many function
pointers, each of which has a comment that is meant to describe its
behavior in detail. If the requirements are unclear, please report
this as a bug.
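As a rough illustration of the "structure of function pointers" pattern
described above, here is a minimal sketch. Every name in it
(sketch_netdev_class, toy_open, and so on) is hypothetical and chosen for
this example only; the real struct netdev_class has many more members with
different signatures, so treat lib/netdev-provider.h as the authority.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-in for a provider class.
 * This is NOT the real struct netdev_class. */
struct sketch_netdev;

struct sketch_netdev_class {
    const char *type;                   /* Provider type name. */
    int (*open)(struct sketch_netdev *, const char *name);
    int (*get_etheraddr)(const struct sketch_netdev *, uint8_t mac[6]);
};

struct sketch_netdev {
    const struct sketch_netdev_class *class_;
    char name[16];
};

/* A toy provider for an imaginary platform. */
static int
toy_open(struct sketch_netdev *dev, const char *name)
{
    if (strlen(name) >= sizeof dev->name) {
        return ENAMETOOLONG;
    }
    strcpy(dev->name, name);
    return 0;
}

static int
toy_get_etheraddr(const struct sketch_netdev *dev, uint8_t mac[6])
{
    static const uint8_t fixed[6] = { 0x02, 0, 0, 0, 0, 0x01 };

    (void) dev;                         /* Every toy device shares one MAC. */
    memcpy(mac, fixed, sizeof fixed);
    return 0;
}

static const struct sketch_netdev_class toy_class = {
    "toy", toy_open, toy_get_etheraddr
};

/* Generic layer: callers use these wrappers and never call a
 * provider function directly. */
static int
sketch_netdev_open(const struct sketch_netdev_class *class_,
                   const char *name, struct sketch_netdev *dev)
{
    dev->class_ = class_;
    return class_->open(dev, name);
}

static int
sketch_netdev_get_etheraddr(const struct sketch_netdev *dev, uint8_t mac[6])
{
    return dev->class_->get_etheraddr(dev, mac);
}
```

The point of the indirection is that the generic layer stays unchanged
during a port; only a new class table and its functions are written.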

The netdev interface can be divided into a few rough categories:

    * Functions required to properly implement OpenFlow features. For
      example, OpenFlow requires the ability to report the Ethernet
      hardware address of a port. These functions must be implemented
      for minimally correct operation.

    * Functions required to implement optional Open vSwitch features.
      For example, the Open vSwitch support for in-band control
      requires netdev support for inspecting the TCP/IP stack's ARP
      table. These functions must be implemented if the corresponding
      OVS features are to work, but may be omitted initially.

    * Functions needed in some implementations but not in others. For
      example, most kinds of ports (see below) do not need
      functionality to receive packets from a network device.
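One common way to let a provider omit an optional function is to leave its
pointer null and have the generic wrapper report "not supported". The
sketch below illustrates that idea under stated assumptions: the structure
and function names are hypothetical, and the comments in
lib/netdev-provider.h say which real members may actually be null.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical provider table with one required and one optional
 * member.  This is NOT the real struct netdev_class. */
struct sketch_class {
    int (*get_mtu)(int *mtu);           /* Required in this sketch. */
    int (*arp_lookup)(unsigned int ip,  /* Optional: may be NULL. */
                      unsigned char mac[6]);
};

static int
fixed_get_mtu(int *mtu)
{
    *mtu = 1500;
    return 0;
}

/* A minimal provider that omits the optional function. */
static const struct sketch_class minimal_class = { fixed_get_mtu, NULL };

/* Generic wrapper for a required function: always dispatches. */
static int
sketch_get_mtu(const struct sketch_class *class_, int *mtu)
{
    return class_->get_mtu(mtu);
}

/* Generic wrapper for an optional function: returns EOPNOTSUPP when
 * the provider leaves it unimplemented, so that the feature depending
 * on it can be disabled instead of crashing. */
static int
sketch_arp_lookup(const struct sketch_class *class_, unsigned int ip,
                  unsigned char mac[6])
{
    return class_->arp_lookup ? class_->arp_lookup(ip, mac) : EOPNOTSUPP;
}
```

With this arrangement a first port can start from the required functions
alone and fill in optional ones as the corresponding features are needed.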

The existing netdev implementations may serve as useful examples
during a port:

    * lib/netdev-linux.c implements netdev functionality for Linux
      network devices, using Linux kernel calls. It may be a good
      place to start for full-featured netdev implementations.

    * lib/netdev-vport.c provides support for "virtual ports"
      implemented by the Open vSwitch datapath module for the Linux
      kernel. This may serve as a model for minimal netdev
      implementations.

    * lib/netdev-dummy.c is a fake netdev implementation useful only
      for testing.


Porting Strategies
------------------

After a netdev provider has been implemented for a system's network
devices, you may choose among three basic porting strategies.

The lowest-effort strategy is to use the "userspace switch"
implementation built into Open vSwitch. This ought to work, without
writing any more code, as long as the netdev provider that you
implemented supports receiving packets. It yields poor performance,
however, because every packet passes through the ovs-vswitchd process.
See INSTALL.userspace for instructions on how to configure a userspace
switch.

If the userspace switch is not the right choice for your port, then
you will have to write more code. You may implement either an
"ofproto provider" or a "dpif provider". Which you should choose
depends on a few different factors:

    * Only an ofproto provider can take full advantage of hardware
      with built-in support for wildcards (e.g. an ACL table or a
      TCAM).

    * A dpif provider can take advantage of the Open vSwitch built-in
      implementations of bonding, LACP, 802.1ag, 802.1Q VLANs, and
      other features. An ofproto provider has to provide its own
      implementations, if the hardware can support them at all.

    * A dpif provider is usually easier to implement, but it is most
      appropriate for software switching. It "explodes" wildcard
      rules into exact-match entries (with an optional wildcard mask).
      This allows fast hash lookups in software, but makes
      inefficient use of TCAMs in hardware that supports wildcarding.

The following sections describe how to implement each kind of port.


ofproto Providers
-----------------

An "ofproto provider" is what ofproto uses to directly monitor and
control an OpenFlow-capable switch. struct ofproto_class, in
ofproto/ofproto-provider.h, defines the interfaces to implement an
ofproto provider for new hardware or software. That structure contains
many function pointers, each of which has a comment that is meant to
describe its behavior in detail. If the requirements are unclear,
please report this as a bug.

The ofproto provider interface is preliminary. Please let us know if
it seems unsuitable for your purpose. We will try to improve it.


Writing a dpif Provider
-----------------------

Open vSwitch has a built-in ofproto provider named "ofproto-dpif",
which is built on top of a library for manipulating datapaths, called
"dpif". A "datapath" is a simple flow table, one that is only required
to support exact-match flows, that is, flows without wildcards. When a
packet arrives on a network device, the datapath looks for it in this
table. If there is a match, then it performs the associated actions.
If there is no match, the datapath passes the packet up to ofproto-dpif,
which maintains the full OpenFlow flow table. If the packet matches in
this flow table, then ofproto-dpif executes its actions and inserts a
new entry into the dpif flow table. (Otherwise, ofproto-dpif passes the
packet up to ofproto to send the packet to the OpenFlow controller, if
one is configured.)
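The hit/miss cycle described above can be sketched as a small exact-match
cache in front of a slower wildcarded table. This is only a toy model: the
two-field key, the one-slot-per-hash table, and all names are assumptions
made for the example and do not reflect the actual dpif or ofproto-dpif
data structures.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical two-field flow key; real flow keys have many fields. */
struct key { uint16_t in_port; uint32_t ip_dst; };

/* Toy "OpenFlow" rule: matches ip_dst under a mask, any in_port.
 * 'ip_dst' must already be masked. */
struct rule { uint32_t ip_dst, mask; int out_port; };

/* Exact-match datapath entry. */
struct entry { struct key key; int out_port; bool used; };

#define DP_SLOTS 64
struct datapath { struct entry slots[DP_SLOTS]; unsigned int n_misses; };

static size_t
slot_of(const struct key *k)
{
    return (k->in_port * 31u + k->ip_dst) % DP_SLOTS;
}

/* Slow path: linear search of the wildcarded rule table.
 * Returns the output port, or -1 if no rule matches. */
static int
slow_path_lookup(const struct rule *rules, size_t n, const struct key *k)
{
    for (size_t i = 0; i < n; i++) {
        if ((k->ip_dst & rules[i].mask) == rules[i].ip_dst) {
            return rules[i].out_port;
        }
    }
    return -1;
}

/* Fast path with miss handling.  Returns the output port, or -1 if
 * the packet would have to be sent to the controller. */
static int
process_packet(struct datapath *dp, const struct rule *rules, size_t n,
               const struct key *k)
{
    struct entry *e = &dp->slots[slot_of(k)];
    if (e->used && e->key.in_port == k->in_port
        && e->key.ip_dst == k->ip_dst) {
        return e->out_port;             /* Datapath hit. */
    }

    dp->n_misses++;                     /* Datapath miss: consult the */
    int out_port = slow_path_lookup(rules, n, k);   /* full table.    */
    if (out_port >= 0) {
        e->key = *k;                    /* Install an exact-match entry, */
        e->out_port = out_port;         /* evicting any colliding one.   */
        e->used = true;
    }
    return out_port;
}
```

In this model only the first packet of a flow pays for the slow lookup;
later packets with the same key are handled by the cache alone.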

When calculating the dpif flow, ofproto-dpif generates an exact-match
flow that describes the missed packet. It makes an effort to figure out
what fields can be wildcarded based on the switch's configuration and
OpenFlow flow table. The dpif is free to ignore the suggested wildcards
and only support the exact-match entry. However, if the dpif supports
wildcarding, then it can use the masks to match multiple flows with
fewer entries and potentially significantly reduce the number of flow
misses handled by ofproto-dpif.
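To make the effect of a wildcard mask concrete, the following sketch
compares a single exact-match entry with a single masked entry against the
same packets. The two-field flow and the mask convention (a 1-bit means
"must match") are assumptions made for this example; consult
lib/dpif-provider.h for how masks are actually passed to a dpif.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flow key with two fields. */
struct flow { uint32_t ip_dst; uint16_t tp_dst; };

/* Datapath entry: a key plus a mask.  In this sketch a 1-bit in the
 * mask means the corresponding key bit must match and a 0-bit means
 * "don't care".  An all-ones mask gives a plain exact-match entry. */
struct masked_entry { struct flow key, mask; };

static bool
entry_matches(const struct masked_entry *e, const struct flow *pkt)
{
    return ((pkt->ip_dst ^ e->key.ip_dst) & e->mask.ip_dst) == 0
        && ((pkt->tp_dst ^ e->key.tp_dst) & e->mask.tp_dst) == 0;
}

/* Counts how many of the 'n' packets would miss a datapath that
 * holds only entry 'e'. */
static size_t
count_misses(const struct masked_entry *e, const struct flow *pkts,
             size_t n)
{
    size_t misses = 0;
    for (size_t i = 0; i < n; i++) {
        misses += !entry_matches(e, &pkts[i]);
    }
    return misses;
}
```

For three packets to the same address on ports 80, 443, and 8080, an entry
for port 80 with an all-ones mask misses two of them, while the same entry
with the port field wildcarded misses none.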

The "dpif" library in turn delegates much of its functionality to a
"dpif provider". The following diagram shows how dpif providers fit
into the Open vSwitch architecture:

                _
               |   +-------------------+
               |   |    ovs-vswitchd   |<-->ovsdb-server
               |   +-------------------+
               |   |      ofproto      |<-->OpenFlow controllers
               |   +--------+-+--------+  _
               |   | netdev | |ofproto-|   |
     userspace |   +--------+ |  dpif  |   |
               |   | netdev | +--------+   |
               |   |provider| |  dpif  |   |
               |   +---||---+ +--------+   |
               |       ||     |  dpif  |   | implementation of
               |       ||     |provider|   | ofproto provider
               |_      ||     +---||---+   |
                       ||         ||       |
                _  +---||-----+---||---+   |
               |   |          |datapath|   |
        kernel |   |          +--------+  _|
               |   |                   |
               |_  +--------||---------+
                            ||
                         physical
                           NIC

struct dpif_class, in lib/dpif-provider.h, defines the interfaces
required to implement a dpif provider for new hardware or software.
That structure contains many function pointers, each of which has a
comment that is meant to describe its behavior in detail. If the
requirements are unclear, please report this as a bug.

There are two existing dpif implementations that may serve as
useful examples during a port:

    * lib/dpif-netlink.c is a Linux-specific dpif implementation that
      talks to an Open vSwitch-specific kernel module (whose sources
      are in the "datapath" directory). The kernel module performs
      all of the switching work, passing packets that do not match any
      flow table entry up to userspace. This dpif implementation is
      essentially a wrapper around calls into the kernel module.

    * lib/dpif-netdev.c is a generic dpif implementation that performs
      all switching internally. This is how the Open vSwitch
      userspace switch is implemented.


Miscellaneous Notes
-------------------

Open vSwitch source code uses uint16_t, uint32_t, and uint64_t as
fixed-width types in host byte order, and ovs_be16, ovs_be32, and
ovs_be64 as fixed-width types in network byte order. Each of the
latter is equivalent to one of the former, but the difference in
name makes the intended use obvious.
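A short sketch of this naming convention follows. The typedefs here are
simplified stand-ins written for the example (the definitions in the Open
vSwitch tree may carry extra annotations for static checkers), and the
header structure and helper names are hypothetical.

```c
#include <arpa/inet.h>
#include <stdint.h>

/* Simplified stand-ins: same representation as the host-order types,
 * but the name documents that the value is in network byte order. */
typedef uint16_t ovs_be16;
typedef uint32_t ovs_be32;

/* Hypothetical header whose fields are stored in network byte order. */
struct sketch_udp_header {
    ovs_be16 src_port;
    ovs_be16 dst_port;
};

/* Returns the destination port in host byte order. */
static uint16_t
sketch_udp_dst_port(const struct sketch_udp_header *udp)
{
    return ntohs(udp->dst_port);
}

/* Stores host-order 'port' into the header in network byte order. */
static void
sketch_udp_set_dst_port(struct sketch_udp_header *udp, uint16_t port)
{
    udp->dst_port = htons(port);
}
```

Because the field is declared as ovs_be16, a reader can see at a glance
that assigning a plain host-order integer to it without htons() would be a
bug on little-endian hosts.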

The default "fail-mode" for Open vSwitch bridges is "standalone",
meaning that, when the OpenFlow controllers cannot be contacted, Open
vSwitch acts as a regular MAC-learning switch. This works well in
virtualization environments where there is normally just one uplink
(either a single physical interface or a bond). In a more general
environment, it can create loops. So, if you are porting to a
general-purpose switch platform, you should consider changing the
default "fail-mode" to "secure", which does not behave this way. See
documentation for the "fail-mode" column in the Bridge table in
ovs-vswitchd.conf.db(5) for more information.

lib/entropy.c assumes that it can obtain high-quality random number
seeds at startup by reading from /dev/urandom. You will need to
modify it if this is not true on your platform.
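The kind of code that needs adapting looks roughly like the sketch below,
which reads seed bytes from a named file. It is an illustration written
for this guide, not a copy of lib/entropy.c; the function name and its
error conventions are assumptions.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Fills 'buf' with 'n' bytes read from 'path' (e.g. "/dev/urandom").
 * Returns 0 on success, otherwise a positive errno value.  A port to
 * a platform without such a device would substitute its own source
 * of randomness here. */
static int
sketch_get_entropy(const char *path, void *buf, size_t n)
{
    FILE *stream = fopen(path, "rb");
    if (!stream) {
        return errno ? errno : ENOENT;
    }

    size_t got = fread(buf, 1, n, stream);
    fclose(stream);
    return got == n ? 0 : EIO;          /* Treat a short read as an error. */
}
```

A caller would invoke it once at startup, for example
sketch_get_entropy("/dev/urandom", seed, sizeof seed), and refuse to
continue with a predictable seed if it fails.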

vswitchd/system-stats.c only knows how to obtain some statistics on
Linux. Optionally you may implement them for your platform as well.


Why OVS Does Not Support Hybrid Providers
-----------------------------------------

The "Porting Strategies" section above describes the "ofproto
provider" and "dpif provider" porting strategies. Only an ofproto
provider can take advantage of hardware TCAM support, and only a dpif
provider can take advantage of the OVS built-in implementations of
various features. It is therefore tempting to suggest a hybrid
approach that shares the advantages of both strategies.

However, Open vSwitch does not support a hybrid approach. Doing so
may be possible, with a significant amount of extra development work,
but it does not yet seem worthwhile, for the reasons explained below.

First, user surprise is likely when a switch supports a feature only
with a high performance penalty. For example, one user questioned why
adding a particular OpenFlow action to a flow caused a 1,058x slowdown
on a hardware OpenFlow implementation [1]. The action required the
flow to be implemented in software.

Given that implementing a flow in software on the slow management CPU
of a hardware switch causes a major slowdown, software-implemented
flows would only make sense for very low-volume traffic. But many of
the features built into the OVS software switch implementation would
need to apply to every flow to be useful. There is no value, for
example, in applying bonding or 802.1Q VLAN support only to low-volume
traffic.

Besides supporting features of OpenFlow actions, a hybrid approach
could also support forms of matching not supported by particular
switching hardware, by sending all packets that might match a rule to
software. But again this can cause an unacceptable slowdown by
forcing bulk traffic through software in the hardware switch's slow
management CPU. Consider, for example, a hardware switch that can
match on the IPv6 Ethernet type but not on fields in IPv6 headers. An
OpenFlow table that matched on the IPv6 Ethernet type would perform
well, but adding a rule that matched only UDPv6 would force every IPv6
packet to software, slowing down not just UDPv6 but all IPv6
processing.
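The IPv6 example above can be modeled in a few lines. The sketch assumes a
hypothetical hybrid design in which hardware sees only the Ethernet type of
IPv6 packets; the structures and function names are invented for the
illustration and correspond to nothing in the Open vSwitch tree.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_TYPE_IPV6 0x86dd

struct hybrid_pkt { uint16_t eth_type; uint8_t ip_proto; };

/* Rule: always matches on eth_type; matches on ip_proto only if
 * 'match_proto' is set. */
struct hybrid_rule { uint16_t eth_type; bool match_proto; uint8_t ip_proto; };

/* Assumed hardware limitation, as in the example above: it cannot
 * look inside IPv6 headers, so it cannot evaluate ip_proto for IPv6. */
static bool
hw_can_evaluate(const struct hybrid_rule *r)
{
    return !(r->eth_type == ETH_TYPE_IPV6 && r->match_proto);
}

/* In this hypothetical hybrid, a packet must be sent to software if
 * any rule that might match it, judging only by the fields hardware
 * can see, depends on a field the hardware cannot evaluate. */
static bool
punted_to_software(const struct hybrid_rule *rules, size_t n,
                   const struct hybrid_pkt *p)
{
    for (size_t i = 0; i < n; i++) {
        if (rules[i].eth_type == p->eth_type
            && !hw_can_evaluate(&rules[i])) {
            return true;
        }
    }
    return false;
}
```

With only an IPv6 Ethernet type rule installed, a TCPv6 packet stays in
hardware. Once a UDPv6 rule is added, the same TCPv6 packet is sent to
software too, because hardware cannot tell it apart from UDPv6.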

[1] Aaron Rosen, "Modify packet fields extremely slow",
    openflow-discuss mailing list, June 26, 2011, archived at
    https://mailman.stanford.edu/pipermail/openflow-discuss/2011-June/002386.html.


Questions
---------

Please direct porting questions to dev@openvswitch.org. We will try
to use questions to improve this porting guide.