Commit | Line | Data |
---|---|---|
1 | How to Port Open vSwitch to New Software or Hardware | |
2 | ==================================================== | |
3 | ||
4 | Open vSwitch (OVS) is intended to be easily ported to new software and | |
5 | hardware platforms. This document describes the types of changes that | |
6 | are most likely to be necessary in porting OVS to Unix-like platforms. | |
7 | (Porting OVS to other kinds of platforms is likely to be more | |
8 | difficult.) | |
9 | ||
10 | ||
11 | Vocabulary | |
12 | ---------- | |
13 | ||
14 | For historical reasons, different words are used for essentially the | |
15 | same concept in different areas of the Open vSwitch source tree. Here | |
16 | is a concordance, indexed by the area of the source tree: | |
17 | ||
18 |         datapath/       vport           --- | |
19 |         vswitchd/       iface           port | |
20 |         ofproto/        port            bundle | |
21 |         lib/bond.c      slave           bond | |
22 |         lib/lacp.c      slave           lacp | |
23 |         lib/netdev.c    netdev          --- | |
24 |         database        Interface       Port | |
25 | ||
26 | ||
27 | Open vSwitch Architectural Overview | |
28 | ----------------------------------- | |
29 | ||
30 | The following diagram shows the very high-level architecture of Open | |
31 | vSwitch from a porter's perspective. | |
32 | ||
33 |                 +-------------------+ | |
34 |                 |    ovs-vswitchd   |<-->ovsdb-server | |
35 |                 +-------------------+ | |
36 |                 |      ofproto      |<-->OpenFlow controllers | |
37 |                 +--------+-+--------+ | |
38 |                 | netdev | | ofproto| | |
39 |                 +--------+ |provider| | |
40 |                 | netdev | +--------+ | |
41 |                 |provider| | |
42 |                 +--------+ | |
43 | ||
44 | Some of the components are generic. Modulo bugs or inadequacies, | |
45 | these components should not need to be modified as part of a port: | |
46 | ||
47 | - "ovs-vswitchd" is the main Open vSwitch userspace program, in | |
48 | vswitchd/. It reads the desired Open vSwitch configuration from | |
49 | the ovsdb-server program over an IPC channel and passes this | |
50 | configuration down to the "ofproto" library. It also passes | |
51 | certain status and statistical information from ofproto back | |
52 | into the database. | |
53 | ||
54 | - "ofproto" is the Open vSwitch library, in ofproto/, that | |
55 | implements an OpenFlow switch. It talks to OpenFlow controllers | |
56 | over the network and to switch hardware or software through an | |
57 | "ofproto provider", explained further below. | |
58 | ||
59 | - "netdev" is the Open vSwitch library, in lib/netdev.c, that | |
60 | abstracts interacting with network devices, that is, Ethernet | |
61 | interfaces. The netdev library is a thin layer over "netdev | |
62 | provider" code, explained further below. | |
63 | ||
64 | The other components may need attention during a port. You will | |
65 | almost certainly have to implement a "netdev provider". Depending on | |
66 | the type of port you are doing and the desired performance, you may | |
67 | also have to implement an "ofproto provider" or a lower-level | |
68 | component called a "dpif" provider. | |
69 | ||
70 | The following sections talk about these components in more detail. | |
71 | ||
72 | ||
73 | Writing a netdev Provider | |
74 | ------------------------- | |
75 | ||
76 | A "netdev provider" implements an operating-system-specific and | |
77 | hardware-specific interface to "network devices", e.g. eth0 on Linux. | |
78 | Open vSwitch must be able to open each port on a switch as a netdev, | |
79 | so you will need to implement a "netdev provider" that works with | |
80 | your switch hardware and software. | |
81 | ||
82 | struct netdev_class, in lib/netdev-provider.h, defines the interfaces | |
83 | required to implement a netdev. That structure contains many function | |
84 | pointers, each of which has a comment that is meant to describe its | |
85 | behavior in detail. If the requirements are unclear, please report | |
86 | this as a bug. | |
87 | ||
88 | The netdev interface can be divided into a few rough categories: | |
89 | ||
90 | * Functions required to properly implement OpenFlow features. For | |
91 | example, OpenFlow requires the ability to report the Ethernet | |
92 | hardware address of a port. These functions must be implemented | |
93 | for minimally correct operation. | |
94 | ||
95 | * Functions required to implement optional Open vSwitch features. | |
96 | For example, the Open vSwitch support for in-band control | |
97 | requires netdev support for inspecting the TCP/IP stack's ARP | |
98 | table. These functions must be implemented if the corresponding | |
99 | OVS features are to work, but may be omitted initially. | |
100 | ||
101 | * Functions needed in some implementations but not in others. For | |
102 | example, most kinds of ports (see below) do not need | |
103 | functionality to receive packets from a network device. | |
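
The three categories above can be illustrated with a table of function
pointers in which only some members are mandatory.  The following is a
minimal sketch, not the real interface: every name beginning with
"toy_" or "minimal_" is hypothetical, the signatures are invented for
illustration, and returning EOPNOTSUPP for a missing optional function
is an assumption of this sketch.  Consult the comments in
lib/netdev-provider.h for the actual requirements.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct toy_netdev;

/* Hypothetical, heavily simplified provider class.  The real
 * struct netdev_class has many more members and other signatures. */
struct toy_netdev_class {
    const char *type;

    /* Category 1: required for minimally correct operation. */
    int (*get_etheraddr)(const struct toy_netdev *, uint8_t mac[6]);

    /* Category 2: optional feature; NULL if unsupported. */
    int (*arp_lookup)(const struct toy_netdev *, uint32_t ip,
                      uint8_t mac[6]);

    /* Category 3: needed by only some implementations; may be NULL. */
    int (*recv)(struct toy_netdev *, void *buffer, size_t size);
};

struct toy_netdev {
    const struct toy_netdev_class *cls;
    uint8_t mac[6];
};

/* Generic layer: thin wrappers that dispatch to the provider. */
int toy_netdev_get_etheraddr(const struct toy_netdev *dev, uint8_t mac[6])
{
    assert(dev->cls->get_etheraddr != NULL);    /* Mandatory member. */
    return dev->cls->get_etheraddr(dev, mac);
}

int toy_netdev_arp_lookup(const struct toy_netdev *dev, uint32_t ip,
                          uint8_t mac[6])
{
    return (dev->cls->arp_lookup
            ? dev->cls->arp_lookup(dev, ip, mac)
            : EOPNOTSUPP);
}

int toy_netdev_recv(struct toy_netdev *dev, void *buffer, size_t size)
{
    return dev->cls->recv ? dev->cls->recv(dev, buffer, size) : EOPNOTSUPP;
}

/* A minimal provider that implements only the mandatory function. */
static int minimal_get_etheraddr(const struct toy_netdev *dev,
                                 uint8_t mac[6])
{
    memcpy(mac, dev->mac, 6);
    return 0;
}

const struct toy_netdev_class toy_minimal_class = {
    .type = "minimal",
    .get_etheraddr = minimal_get_etheraddr,
    /* arp_lookup and recv are left NULL. */
};
```

With this layout a port can start from the mandatory members and fill
in the optional ones later, since callers see a well-defined error
instead of a crash for anything left unimplemented.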
104 | ||
105 | The existing netdev implementations may serve as useful examples | |
106 | during a port: | |
107 | ||
108 | * lib/netdev-linux.c implements netdev functionality for Linux | |
109 | network devices, using Linux kernel calls. It may be a good | |
110 | place to start for full-featured netdev implementations. | |
111 | ||
112 | * lib/netdev-vport.c provides support for "virtual ports" | |
113 | implemented by the Open vSwitch datapath module for the Linux | |
114 | kernel. This may serve as a model for minimal netdev | |
115 | implementations. | |
116 | ||
117 | * lib/netdev-dummy.c is a fake netdev implementation useful only | |
118 | for testing. | |
119 | ||
120 | ||
121 | Porting Strategies | |
122 | ------------------ | |
123 | ||
124 | After a netdev provider has been implemented for a system's network | |
125 | devices, you may choose among three basic porting strategies. | |
126 | ||
127 | The lowest-effort strategy is to use the "userspace switch" | |
128 | implementation built into Open vSwitch. This ought to work, without | |
129 | writing any more code, as long as the netdev provider that you | |
130 | implemented supports receiving packets. It yields poor performance, | |
131 | however, because every packet passes through the ovs-vswitchd process. | |
132 | See INSTALL.userspace for instructions on how to configure a userspace | |
133 | switch. | |
134 | ||
135 | If the userspace switch is not the right choice for your port, then | |
136 | you will have to write more code. You may implement either an | |
137 | "ofproto provider" or a "dpif provider". Which you should choose | |
138 | depends on a few different factors: | |
139 | ||
140 | * Only an ofproto provider can take full advantage of hardware | |
141 | with built-in support for wildcards (e.g. an ACL table or a | |
142 | TCAM). | |
143 | ||
144 | * A dpif provider can take advantage of the Open vSwitch built-in | |
145 | implementations of bonding, LACP, 802.1ag, 802.1Q VLANs, and | |
146 | other features. An ofproto provider has to provide its own | |
147 | implementations, if the hardware can support them at all. | |
148 | ||
149 | * A dpif provider is usually easier to implement, but it is most | |
150 | appropriate for software switching. It "explodes" wildcard | |
151 | rules into exact-match entries. This allows fast hash lookups | |
152 | in software, but makes inefficient use of TCAMs in hardware | |
153 | that support wildcarding. | |
154 | ||
155 | The following sections describe how to implement each kind of port. | |
156 | ||
157 | ||
158 | ofproto Providers | |
159 | ----------------- | |
160 | ||
161 | An "ofproto provider" is what ofproto uses to directly monitor and | |
162 | control an OpenFlow-capable switch. struct ofproto_class, in | |
163 | ofproto/ofproto-provider.h, defines the interfaces to implement an | |
164 | ofproto provider for new hardware or software. That structure contains | |
165 | many function pointers, each of which has a comment that is meant to | |
166 | describe its behavior in detail. If the requirements are unclear, | |
167 | please report this as a bug. | |
168 | ||
169 | The ofproto provider interface is preliminary. Please let us know if | |
170 | it seems unsuitable for your purpose. We will try to improve it. | |
171 | ||
172 | ||
173 | Writing a dpif Provider | |
174 | ----------------------- | |
175 | ||
176 | Open vSwitch has a built-in ofproto provider named "ofproto-dpif", | |
177 | which is built on top of a library for manipulating datapaths, called | |
178 | "dpif". A "datapath" is a simple flow table, one that supports only | |
179 | exact-match flows, that is, flows without wildcards. When a packet | |
180 | arrives on a network device, the datapath looks for it in this | |
181 | exact-match table. If there is a match, then it performs the | |
182 | associated actions. If there is no match, the datapath passes the | |
183 | packet up to ofproto-dpif, which maintains an OpenFlow flow table | |
184 | (that supports wildcards). If the packet matches in this flow table, | |
185 | then ofproto-dpif executes its actions and inserts a new exact-match | |
186 | entry into the dpif flow table. (Otherwise, ofproto-dpif passes the | |
187 | packet up to ofproto to send the packet to the OpenFlow controller, if | |
188 | one is configured.) | |
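
The lookup sequence described above (exact-match table first, then the
wildcard table, then the controller) can be sketched as follows.  This
is an illustration under stated assumptions, not the ofproto-dpif
implementation: all "toy_" names are hypothetical, the flow key has
only three fields, the wildcard table is a linear list in priority
order, and the exact-match table is a direct-mapped array in which a
colliding entry simply overwrites the old one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_CACHE_SIZE 64
#define TOY_ACTION_TO_CONTROLLER (-1)

/* Toy flow key.  Real keys carry many more fields. */
struct toy_key { uint16_t in_port; uint16_t eth_type; uint32_t ip_dst; };

/* Wildcard rule: only bits set in 'mask' take part in the match. */
struct toy_rule { struct toy_key value; struct toy_key mask; int action; };

/* Exact-match entry, as held by the "datapath". */
struct toy_flow { struct toy_key key; int action; bool used; };

struct toy_datapath {
    struct toy_flow flows[TOY_CACHE_SIZE];
    unsigned long hits, misses;
};

static bool key_equal(const struct toy_key *a, const struct toy_key *b)
{
    return a->in_port == b->in_port && a->eth_type == b->eth_type
           && a->ip_dst == b->ip_dst;
}

static bool rule_matches(const struct toy_rule *r, const struct toy_key *k)
{
    return !((k->in_port ^ r->value.in_port) & r->mask.in_port)
           && !((k->eth_type ^ r->value.eth_type) & r->mask.eth_type)
           && !((k->ip_dst ^ r->value.ip_dst) & r->mask.ip_dst);
}

static size_t key_hash(const struct toy_key *k)
{
    uint32_t h = k->in_port * 31u + k->eth_type;
    h = h * 31u + k->ip_dst;
    return h % TOY_CACHE_SIZE;
}

/* Returns the action for 'key'.  A hit is served from the exact-match
 * table.  A miss consults the wildcard rules and, if one matches,
 * installs a new exact-match entry so later packets take the fast path. */
int toy_process(struct toy_datapath *dp, const struct toy_rule *rules,
                size_t n_rules, const struct toy_key *key)
{
    struct toy_flow *slot = &dp->flows[key_hash(key)];
    size_t i;

    assert(rules != NULL || n_rules == 0);
    if (slot->used && key_equal(&slot->key, key)) {
        dp->hits++;
        return slot->action;
    }

    dp->misses++;
    for (i = 0; i < n_rules; i++) {
        if (rule_matches(&rules[i], key)) {
            slot->key = *key;
            slot->action = rules[i].action;
            slot->used = true;
            return rules[i].action;
        }
    }
    return TOY_ACTION_TO_CONTROLLER;    /* No rule matched. */
}
```

Note how a single wildcard rule ends up as one exact-match entry per
distinct key seen in traffic.  This is the "explosion" mentioned under
"Porting Strategies": cheap for hash lookups in software, wasteful for
a TCAM that could have held the one wildcard rule directly.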
189 | ||
190 | The "dpif" library in turn delegates much of its functionality to a | |
191 | "dpif provider". The following diagram shows how dpif providers fit | |
192 | into the Open vSwitch architecture: | |
193 | ||
194 |             _ | |
195 |            |   +-------------------+ | |
196 |            |   |    ovs-vswitchd   |<-->ovsdb-server | |
197 |            |   +-------------------+ | |
198 |            |   |      ofproto      |<-->OpenFlow controllers | |
199 |            |   +--------+-+--------+  _ | |
200 |            |   | netdev | |ofproto-|   | | |
201 |  userspace |   +--------+ |  dpif  |   | | |
202 |            |   | netdev | +--------+   | | |
203 |            |   |provider| |  dpif  |   | | |
204 |            |   +---||---+ +--------+   | | |
205 |            |       ||     |  dpif  |   | implementation of | |
206 |            |       ||     |provider|   | ofproto provider | |
207 |            |_      ||     +---||---+   | | |
208 |                    ||         ||       | | |
209 |             _  +---||-----+---||---+   | | |
210 |            |   |          |datapath|   | | |
211 |     kernel |   |          +--------+  _| | |
212 |            |   |                   | | |
213 |            |_  +--------||---------+ | |
214 |                         || | |
215 |                      physical | |
216 |                         NIC | |
217 | ||
218 | struct dpif_class, in lib/dpif-provider.h, defines the interfaces | |
219 | required to implement a dpif provider for new hardware or software. | |
220 | That structure contains many function pointers, each of which has a | |
221 | comment that is meant to describe its behavior in detail. If the | |
222 | requirements are unclear, please report this as a bug. | |
223 | ||
224 | There are two existing dpif implementations that may serve as | |
225 | useful examples during a port: | |
226 | ||
227 | * lib/dpif-linux.c is a Linux-specific dpif implementation that | |
228 | talks to an Open vSwitch-specific kernel module (whose sources | |
229 | are in the "datapath" directory). The kernel module performs | |
230 | all of the switching work, passing packets that do not match any | |
231 | flow table entry up to userspace. This dpif implementation is | |
232 | essentially a wrapper around calls into the kernel module. | |
233 | ||
234 | * lib/dpif-netdev.c is a generic dpif implementation that performs | |
235 | all switching internally. This is how the Open vSwitch | |
236 | userspace switch is implemented. | |
237 | ||
238 | ||
239 | Miscellaneous Notes | |
240 | ------------------- | |
241 | ||
242 | Open vSwitch source code uses uint16_t, uint32_t, and uint64_t as | |
243 | fixed-width types in host byte order, and ovs_be16, ovs_be32, and | |
244 | ovs_be64 as fixed-width types in network byte order. Each of the | |
245 | latter is equivalent to the corresponding one of the former, but the | |
246 | difference in name makes the intended use obvious. | |
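
As a small illustration of the naming convention, the sketch below
keeps on-the-wire fields in ovs_be16 and converts at the boundary with
the standard htons() and ntohs() functions.  The typedefs here are
plain stand-ins written for this example so that it compiles on its
own (the Open vSwitch tree supplies its own definitions), and the
"toy_" names are hypothetical.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in typedefs for this sketch only. */
typedef uint16_t ovs_be16;
typedef uint32_t ovs_be32;

/* Hypothetical header fragment: fields are stored in network byte
 * order, and the type names say so. */
struct toy_l4_ports {
    ovs_be16 src;
    ovs_be16 dst;
};

/* Returns the destination port in host byte order. */
uint16_t toy_get_dst_port(const struct toy_l4_ports *ports)
{
    assert(ports != NULL);
    return ntohs(ports->dst);
}

/* Stores 'port', given in host byte order, into the header. */
void toy_set_dst_port(struct toy_l4_ports *ports, uint16_t port)
{
    assert(ports != NULL);
    ports->dst = htons(port);
}
```

Because the two families of types are interchangeable to the compiler
in this sketch, the names are the only guard against forgetting a
conversion, which is why keeping them accurate matters when porting.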
247 | ||
248 | lib/entropy.c assumes that it can obtain high-quality random number | |
249 | seeds at startup by reading from /dev/urandom. You will need to | |
250 | modify it if this is not true on your platform. | |
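
The following sketch shows the general shape of such a seed reader and
where a platform-specific replacement would go.  It is not the code in
lib/entropy.c: the function names are hypothetical, and the convention
of returning 0 on success or a positive errno value on failure is an
assumption made for this example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Fills 'buffer' with 'n' bytes read from 'path'.  Returns 0 on
 * success, otherwise a positive errno value. */
int toy_read_entropy(const char *path, void *buffer, size_t n)
{
    FILE *stream;
    size_t n_read;

    assert(path != NULL && buffer != NULL);
    stream = fopen(path, "rb");
    if (!stream) {
        return errno ? errno : EIO;
    }
    n_read = fread(buffer, 1, n, stream);
    fclose(stream);
    return n_read == n ? 0 : EIO;
}

/* Obtains 'n' seed bytes.  On a platform without /dev/urandom, this is
 * the function to change so that it calls the native entropy source. */
int toy_get_seed(void *buffer, size_t n)
{
    return toy_read_entropy("/dev/urandom", buffer, n);
}
```

Keeping the device path in one place makes the porting change small:
only toy_get_seed() in this sketch needs to know where random bytes
come from.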
251 | ||
252 | vswitchd/system-stats.c only knows how to obtain some statistics on | |
253 | Linux. You may optionally implement them for your platform as well. | |
254 | ||
255 | ||
256 | Why OVS Does Not Support Hybrid Providers | |
257 | ----------------------------------------- | |
258 | ||
259 | The "Porting Strategies" section above describes the "ofproto | |
260 | provider" and "dpif provider" porting strategies. Only an ofproto | |
261 | provider can take advantage of hardware TCAM support, and only a dpif | |
262 | provider can take advantage of the OVS built-in implementations of | |
263 | various features. It is therefore tempting to suggest a hybrid | |
264 | approach that shares the advantages of both strategies. | |
265 | ||
266 | However, Open vSwitch does not support a hybrid approach. Doing so | |
267 | may be possible, with a significant amount of extra development work, | |
268 | but it does not yet seem worthwhile, for the reasons explained below. | |
269 | ||
270 | First, user surprise is likely when a switch supports a feature only | |
271 | with a high performance penalty. For example, one user questioned why | |
272 | adding a particular OpenFlow action to a flow caused a 1,058x slowdown | |
273 | on a hardware OpenFlow implementation [1]. The action required the | |
274 | flow to be implemented in software. | |
275 | ||
276 | Given that implementing a flow in software on the slow management CPU | |
277 | of a hardware switch causes a major slowdown, software-implemented | |
278 | flows would only make sense for very low-volume traffic. But many of | |
279 | the features built into the OVS software switch implementation would | |
280 | need to apply to every flow to be useful. There is no value, for | |
281 | example, in applying bonding or 802.1Q VLAN support only to low-volume | |
282 | traffic. | |
283 | ||
284 | Besides supporting features of OpenFlow actions, a hybrid approach | |
285 | could also support forms of matching not supported by particular | |
286 | switching hardware, by sending all packets that might match a rule to | |
287 | software. But again this can cause an unacceptable slowdown by | |
288 | forcing bulk traffic through software in the hardware switch's slow | |
289 | management CPU. Consider, for example, a hardware switch that can | |
290 | match on the IPv6 Ethernet type but not on fields in IPv6 headers. An | |
291 | OpenFlow table that matched on the IPv6 Ethernet type would perform | |
292 | well, but adding a rule that matched only UDPv6 would force every IPv6 | |
293 | packet to software, slowing down not just UDPv6 but all IPv6 | |
294 | processing. | |
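
The IPv6 example can be made concrete with a short sketch.  It models
a hypothetical switch whose hardware classifies only on Ethernet type:
as soon as one rule for an Ethernet type needs a field the hardware
cannot examine, every packet of that Ethernet type has to be sent to
software, because the hardware cannot tell which packets the rule
would match.  All names and the rule layout are invented for this
illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_ETH_TYPE_IP   0x0800
#define TOY_ETH_TYPE_IPV6 0x86dd
#define TOY_IP_PROTO_UDP  17

/* Toy rule: always matches on Ethernet type, optionally on the
 * IP protocol as well. */
struct toy_match {
    uint16_t eth_type;
    bool match_ip_proto;
    uint8_t ip_proto;
};

/* The hypothetical hardware can evaluate a rule only if the rule
 * looks at nothing beyond the Ethernet type. */
bool toy_rule_fits_hardware(const struct toy_match *rule)
{
    return !rule->match_ip_proto;
}

/* Returns true if packets with 'eth_type' must be sent to software,
 * that is, if any rule that such a packet might match cannot be
 * evaluated in hardware. */
bool toy_needs_software(const struct toy_match *rules, size_t n_rules,
                        uint16_t eth_type)
{
    size_t i;

    assert(rules != NULL || n_rules == 0);
    for (i = 0; i < n_rules; i++) {
        if (rules[i].eth_type == eth_type
            && !toy_rule_fits_hardware(&rules[i])) {
            return true;
        }
    }
    return false;
}
```

With only an IPv6 Ethernet type rule installed, toy_needs_software()
reports false for IPv6.  Adding a UDPv6 rule flips the answer to true
for all IPv6 traffic, not just UDPv6, while IPv4 is unaffected.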
295 | ||
296 | [1] Aaron Rosen, "Modify packet fields extremely slow", | |
297 | openflow-discuss mailing list, June 26, 2011, archived at | |
298 | https://mailman.stanford.edu/pipermail/openflow-discuss/2011-June/002386.html. | |
299 | ||
300 | ||
301 | Questions | |
302 | --------- | |
303 | ||
304 | Please direct porting questions to dev@openvswitch.org. We will try | |
305 | to use questions to improve this porting guide. |