How to Port Open vSwitch to New Software or Hardware
====================================================

Open vSwitch (OVS) is intended to be easily ported to new software and
hardware platforms. This document describes the types of changes that
are most likely to be necessary in porting OVS to Unix-like platforms.
(Porting OVS to other kinds of platforms is likely to be more
difficult.)


Vocabulary
----------

For historical reasons, different words are used for essentially the
same concept in different areas of the Open vSwitch source tree. Here
is a concordance, indexed by the area of the source tree:

    datapath/       vport           ---
    vswitchd/       iface           port
    ofproto/        port            bundle
    ofproto/bond.c  slave           bond
    lib/lacp.c      slave           lacp
    lib/netdev.c    netdev          ---
    database        Interface       Port


Open vSwitch Architectural Overview
-----------------------------------

The following diagram shows the very high-level architecture of Open
vSwitch from a porter's perspective.

    +-------------------+
    |    ovs-vswitchd   |<-->ovsdb-server
    +-------------------+
    |      ofproto      |<-->OpenFlow controllers
    +--------+-+--------+
    | netdev | | ofproto|
    +--------+ |provider|
    | netdev | +--------+
    |provider|
    +--------+

Some of the components are generic. Modulo bugs or inadequacies,
these components should not need to be modified as part of a port:

  - "ovs-vswitchd" is the main Open vSwitch userspace program, in
    vswitchd/. It reads the desired Open vSwitch configuration from
    the ovsdb-server program over an IPC channel and passes this
    configuration down to the "ofproto" library. It also passes
    certain status and statistical information from ofproto back
    into the database.

  - "ofproto" is the Open vSwitch library, in ofproto/, that
    implements an OpenFlow switch. It talks to OpenFlow controllers
    over the network and to switch hardware or software through an
    "ofproto provider", explained further below.

  - "netdev" is the Open vSwitch library, in lib/netdev.c, that
    abstracts interacting with network devices, that is, Ethernet
    interfaces. The netdev library is a thin layer over "netdev
    provider" code, explained further below.

The other components may need attention during a port. You will
almost certainly have to implement a "netdev provider". Depending on
the type of port you are doing and the desired performance, you may
also have to implement an "ofproto provider" or a lower-level
component called a "dpif" provider.

The following sections talk about these components in more detail.


Writing a netdev Provider
-------------------------

A "netdev provider" implements an operating system and hardware
specific interface to "network devices", e.g. eth0 on Linux. Open
vSwitch must be able to open each port on a switch as a netdev, so you
will need to implement a "netdev provider" that works with your switch
hardware and software.

struct netdev_class, in lib/netdev-provider.h, defines the interfaces
required to implement a netdev. That structure contains many function
pointers, each of which has a comment that is meant to describe its
behavior in detail. If the requirements are unclear, please report
this as a bug.

The netdev interface can be divided into a few rough categories:

  * Functions required to properly implement OpenFlow features. For
    example, OpenFlow requires the ability to report the Ethernet
    hardware address of a port. These functions must be implemented
    for minimally correct operation.

  * Functions required to implement optional Open vSwitch features.
    For example, the Open vSwitch support for in-band control
    requires netdev support for inspecting the TCP/IP stack's ARP
    table. These functions must be implemented if the corresponding
    OVS features are to work, but may be omitted initially.

  * Functions needed in some implementations but not in others. For
    example, most kinds of ports (see below) do not need
    functionality to receive packets from a network device.

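As a rough illustration of the categories above, the sketch below
models a provider as a table of function pointers in which an optional
operation is simply left NULL. Every name in it (`sketch_netdev_class`,
`toy_get_etheraddr`, `sketch_arp_lookup`) is hypothetical, and the
signatures are far simpler than those of the real struct netdev_class;
lib/netdev-provider.h remains the authoritative reference.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a provider class.  The real
 * struct netdev_class has many more members and different signatures. */
struct sketch_netdev_class {
    const char *type;
    /* Required: report the Ethernet address of the device. */
    int (*get_etheraddr)(const char *name, uint8_t mac[6]);
    /* Optional: look up an IP address in the ARP table.  NULL means
     * that this provider does not support the operation. */
    int (*arp_lookup)(const char *name, uint32_t ip, uint8_t mac[6]);
};

/* Toy implementation that reports one fixed address for any device. */
static int
toy_get_etheraddr(const char *name, uint8_t mac[6])
{
    static const uint8_t fixed[6] = { 0x02, 0x00, 0x00, 0x00, 0x00, 0x01 };

    (void) name;
    memcpy(mac, fixed, sizeof fixed);
    return 0;
}

/* A minimal provider: required function present, optional one omitted. */
static const struct sketch_netdev_class toy_class = {
    .type = "toy",
    .get_etheraddr = toy_get_etheraddr,
    .arp_lookup = NULL,
};

/* Generic wrapper: calls the optional function if the provider supplies
 * it, otherwise reports that the operation is unsupported. */
static int
sketch_arp_lookup(const struct sketch_netdev_class *nc,
                  const char *name, uint32_t ip, uint8_t mac[6])
{
    assert(nc != NULL && nc->get_etheraddr != NULL);
    return nc->arp_lookup ? nc->arp_lookup(name, ip, mac) : EOPNOTSUPP;
}
```

With this layout a first port can fill in only the required members,
and features that depend on a missing optional member fail cleanly
with an error code instead of crashing.
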
The existing netdev implementations may serve as useful examples
during a port:

  * lib/netdev-linux.c implements netdev functionality for Linux
    network devices, using Linux kernel calls. It may be a good
    place to start for full-featured netdev implementations.

  * lib/netdev-vport.c provides support for "virtual ports"
    implemented by the Open vSwitch datapath module for the Linux
    kernel. This may serve as a model for minimal netdev
    implementations.

  * lib/netdev-dummy.c is a fake netdev implementation useful only
    for testing.


Porting Strategies
------------------

After a netdev provider has been implemented for a system's network
devices, you may choose among three basic porting strategies.

The lowest-effort strategy is to use the "userspace switch"
implementation built into Open vSwitch. This ought to work, without
writing any more code, as long as the netdev provider that you
implemented supports receiving packets. It yields poor performance,
however, because every packet passes through the ovs-vswitchd process.
See [INSTALL.userspace.md] for instructions on how to configure a
userspace switch.

If the userspace switch is not the right choice for your port, then
you will have to write more code. You may implement either an
"ofproto provider" or a "dpif provider". Which you should choose
depends on a few different factors:

  * Only an ofproto provider can take full advantage of hardware
    with built-in support for wildcards (e.g. an ACL table or a
    TCAM).

  * A dpif provider can take advantage of the Open vSwitch built-in
    implementations of bonding, LACP, 802.1ag, 802.1Q VLANs, and
    other features. An ofproto provider has to provide its own
    implementations, if the hardware can support them at all.

  * A dpif provider is usually easier to implement, but it is most
    appropriate for software switching. It "explodes" wildcard
    rules into exact-match entries (with an optional wildcard mask).
    This allows fast hash lookups in software, but makes
    inefficient use of TCAMs in hardware that support wildcarding.

The following sections describe how to implement each kind of port.


ofproto Providers
-----------------

An "ofproto provider" is what ofproto uses to directly monitor and
control an OpenFlow-capable switch. struct ofproto_class, in
ofproto/ofproto-provider.h, defines the interfaces to implement an
ofproto provider for new hardware or software. That structure contains
many function pointers, each of which has a comment that is meant to
describe its behavior in detail. If the requirements are unclear,
please report this as a bug.

The ofproto provider interface is preliminary. Please let us know if
it seems unsuitable for your purpose. We will try to improve it.


Writing a dpif Provider
-----------------------

Open vSwitch has a built-in ofproto provider named "ofproto-dpif",
which is built on top of a library for manipulating datapaths, called
"dpif". A "datapath" is a simple flow table, one that is only required
to support exact-match flows, that is, flows without wildcards. When a
packet arrives on a network device, the datapath looks for it in this
table. If there is a match, then it performs the associated actions.
If there is no match, the datapath passes the packet up to ofproto-dpif,
which maintains the full OpenFlow flow table. If the packet matches in
this flow table, then ofproto-dpif executes its actions and inserts a
new entry into the dpif flow table. (Otherwise, ofproto-dpif passes the
packet up to ofproto to send the packet to the OpenFlow controller, if
one is configured.)

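The miss-handling loop described above can be sketched as a small
exact-match cache backed by a slower lookup. This is a minimal sketch,
not OVS code: the `sketch_` names, the two-field flow key, the
fixed-size table, and the stand-in slow path are all assumptions made
for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical two-field flow key; real flow keys carry many more fields. */
struct sketch_key {
    uint32_t in_port;
    uint32_t ip_dst;
};

struct sketch_entry {
    bool used;
    struct sketch_key key;
    uint32_t out_port;          /* The "action": forward to this port. */
};

#define SKETCH_SLOTS 64

struct sketch_datapath {
    struct sketch_entry slots[SKETCH_SLOTS];
    unsigned int n_misses;      /* Packets that went to the slow path. */
};

static size_t
sketch_hash(const struct sketch_key *key)
{
    return (key->in_port * 31u + key->ip_dst) % SKETCH_SLOTS;
}

static bool
sketch_key_equal(const struct sketch_key *a, const struct sketch_key *b)
{
    return a->in_port == b->in_port && a->ip_dst == b->ip_dst;
}

/* Stand-in for the full flow table held by the upper layer: send
 * 10.0.0.0/8 to port 2 and everything else to port 1. */
static uint32_t
sketch_slow_path(const struct sketch_key *key)
{
    return (key->ip_dst >> 24) == 10 ? 2 : 1;
}

/* Fast path first.  On a miss, consult the slow path and cache the
 * result as an exact-match entry, overwriting any colliding entry. */
static uint32_t
sketch_process(struct sketch_datapath *dp, const struct sketch_key *key)
{
    struct sketch_entry *e;

    assert(dp != NULL && key != NULL);
    e = &dp->slots[sketch_hash(key)];
    if (e->used && sketch_key_equal(&e->key, key)) {
        return e->out_port;
    }
    dp->n_misses++;
    e->used = true;
    e->key = *key;
    e->out_port = sketch_slow_path(key);
    return e->out_port;
}
```

Only the first packet of a flow pays for the slow path; later packets
with the same key are served from the cached entry, which is the
property the datapath design relies on.
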
When calculating the dpif flow, ofproto-dpif generates an exact-match
flow that describes the missed packet. It makes an effort to figure out
what fields can be wildcarded based on the switch's configuration and
OpenFlow flow table. The dpif is free to ignore the suggested wildcards
and only support the exact-match entry. However, if the dpif supports
wildcarding, then it can use the masks to match multiple flows with
fewer entries and potentially significantly reduce the number of flow
misses handled by ofproto-dpif.

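One way to picture the suggested wildcards is as a per-entry bit mask
that is applied to a packet's key before comparison. The sketch below
is an illustration under stated assumptions (hypothetical `sketch_`
names, a single 32-bit field), not the dpif interface itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical datapath entry for one 32-bit field.  A mask bit of 1
 * means "must match"; a mask bit of 0 means "wildcarded". */
struct sketch_masked_flow {
    uint32_t value;
    uint32_t mask;
};

/* Builds an entry from a missed packet's field and a suggested mask.
 * A datapath without wildcard support would pass UINT32_MAX as the
 * mask, which yields a plain exact-match entry. */
static struct sketch_masked_flow
sketch_make_flow(uint32_t packet_field, uint32_t mask)
{
    struct sketch_masked_flow flow = { packet_field & mask, mask };

    return flow;
}

/* Returns true if 'packet_field' matches 'flow' under its mask. */
static bool
sketch_flow_matches(const struct sketch_masked_flow *flow,
                    uint32_t packet_field)
{
    assert(flow != NULL);
    return (packet_field & flow->mask) == flow->value;
}
```

With a mask of 0xff000000, a single entry created from a packet to
10.0.0.1 also matches a later packet to 10.0.0.2, so that second packet
does not cause a miss. With the all-ones mask, each destination needs
its own entry.
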
The "dpif" library in turn delegates much of its functionality to a
"dpif provider". The following diagram shows how dpif providers fit
into the Open vSwitch architecture:

                _
               |   +-------------------+
               |   |    ovs-vswitchd   |<-->ovsdb-server
               |   +-------------------+
               |   |      ofproto      |<-->OpenFlow controllers
               |   +--------+-+--------+  _
               |   | netdev | |ofproto-|   |
     userspace |   +--------+ |  dpif  |   |
               |   | netdev | +--------+   |
               |   |provider| |  dpif  |   |
               |   +---||---+ +--------+   |
               |       ||     |  dpif  |   | implementation of
               |       ||     |provider|   | ofproto provider
               |_      ||     +---||---+   |
                       ||         ||       |
                _  +---||-----+---||---+   |
               |   |          |datapath|   |
        kernel |   |          +--------+  _|
               |   |                   |
               |_  +--------||---------+
                            ||
                         physical
                           NIC

struct dpif_class, in lib/dpif-provider.h, defines the interfaces
required to implement a dpif provider for new hardware or software.
That structure contains many function pointers, each of which has a
comment that is meant to describe its behavior in detail. If the
requirements are unclear, please report this as a bug.

There are two existing dpif implementations that may serve as
useful examples during a port:

  * lib/dpif-netlink.c is a Linux-specific dpif implementation that
    talks to an Open vSwitch-specific kernel module (whose sources
    are in the "datapath" directory). The kernel module performs
    all of the switching work, passing packets that do not match any
    flow table entry up to userspace. This dpif implementation is
    essentially a wrapper around calls into the kernel module.

  * lib/dpif-netdev.c is a generic dpif implementation that performs
    all switching internally. This is how the Open vSwitch
    userspace switch is implemented.


Miscellaneous Notes
-------------------

Open vSwitch source code uses uint16_t, uint32_t, and uint64_t as
fixed-width types in host byte order, and ovs_be16, ovs_be32, and
ovs_be64 as fixed-width types in network byte order. Each of the
latter is equivalent to one of the former, but the difference in
name makes the intended use obvious.

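The naming convention can be illustrated with a short sketch. The
`sketch_be16` and `sketch_be32` typedefs and the `sketch_udp_ports`
structure below are hypothetical stand-ins written for this example;
the conversions use the standard htons() and ntohs() functions from
<arpa/inet.h>.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for ovs_be16 and ovs_be32: the same
 * representation as the host-order types, but the name records that
 * the value is stored in network byte order. */
typedef uint16_t sketch_be16;
typedef uint32_t sketch_be32;

/* Hypothetical header fragment as it appears on the wire. */
struct sketch_udp_ports {
    sketch_be16 src;
    sketch_be16 dst;
};

/* Stores host-order port numbers into a wire-format header. */
static void
sketch_set_ports(struct sketch_udp_ports *ports, uint16_t src, uint16_t dst)
{
    assert(ports != NULL);
    ports->src = htons(src);
    ports->dst = htons(dst);
}

/* Returns the destination port in host byte order. */
static uint16_t
sketch_dst_port(const struct sketch_udp_ports *ports)
{
    assert(ports != NULL);
    return ntohs(ports->dst);
}
```

Because the two families of types are interchangeable to the compiler,
the names are the only cue a reader has that a conversion is needed
when a value crosses between a packet header and ordinary arithmetic.
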
The default "fail-mode" for Open vSwitch bridges is "standalone",
meaning that, when the OpenFlow controllers cannot be contacted, Open
vSwitch acts as a regular MAC-learning switch. This works well in
virtualization environments where there is normally just one uplink
(either a single physical interface or a bond). In a more general
environment, it can create loops. So, if you are porting to a
general-purpose switch platform, you should consider changing the
default "fail-mode" to "secure", which does not behave this way. See
documentation for the "fail-mode" column in the Bridge table in
ovs-vswitchd.conf.db(5) for more information.

lib/entropy.c assumes that it can obtain high-quality random number
seeds at startup by reading from /dev/urandom. You will need to
modify it if this is not true on your platform.

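For orientation, reading a seed from a device file amounts to something
like the sketch below. The function name `sketch_read_entropy` and its
return convention are hypothetical and are not taken from
lib/entropy.c; a port to a platform without /dev/urandom would replace
the body with whatever randomness source that platform offers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: fills 'buffer' with 'n' bytes read from 'path'.
 * Returns 0 on success, -1 if the file cannot be opened or does not
 * supply enough bytes. */
static int
sketch_read_entropy(const char *path, void *buffer, size_t n)
{
    FILE *stream;
    size_t n_read;

    assert(path != NULL && buffer != NULL);
    stream = fopen(path, "rb");
    if (!stream) {
        return -1;
    }
    n_read = fread(buffer, 1, n, stream);
    fclose(stream);
    return n_read == n ? 0 : -1;
}
```

Checking the return value at startup makes a missing or unreadable
randomness source an explicit error rather than a silently weak seed.
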
vswitchd/system-stats.c only knows how to obtain some statistics on
Linux. Optionally you may implement them for your platform as well.


Why OVS Does Not Support Hybrid Providers
-----------------------------------------

The "Porting Strategies" section above describes the "ofproto
provider" and "dpif provider" porting strategies. Only an ofproto
provider can take advantage of hardware TCAM support, and only a dpif
provider can take advantage of the OVS built-in implementations of
various features. It is therefore tempting to suggest a hybrid
approach that shares the advantages of both strategies.

However, Open vSwitch does not support a hybrid approach. Doing so
may be possible, with a significant amount of extra development work,
but it does not yet seem worthwhile, for the reasons explained below.

First, user surprise is likely when a switch supports a feature only
with a high performance penalty. For example, one user questioned why
adding a particular OpenFlow action to a flow caused a 1,058x slowdown
on a hardware OpenFlow implementation [1]. The action required the
flow to be implemented in software.

Given that implementing a flow in software on the slow management CPU
of a hardware switch causes a major slowdown, software-implemented
flows would only make sense for very low-volume traffic. But many of
the features built into the OVS software switch implementation would
need to apply to every flow to be useful. There is no value, for
example, in applying bonding or 802.1Q VLAN support only to low-volume
traffic.

Besides supporting features of OpenFlow actions, a hybrid approach
could also support forms of matching not supported by particular
switching hardware, by sending all packets that might match a rule to
software. But again this can cause an unacceptable slowdown by
forcing bulk traffic through software in the hardware switch's slow
management CPU. Consider, for example, a hardware switch that can
match on the IPv6 Ethernet type but not on fields in IPv6 headers. An
OpenFlow table that matched on the IPv6 Ethernet type would perform
well, but adding a rule that matched only UDPv6 would force every IPv6
packet to software, slowing down not just UDPv6 but all IPv6
processing.

[1] Aaron Rosen, "Modify packet fields extremely slow",
    openflow-discuss mailing list, June 26, 2011, archived at
    https://mailman.stanford.edu/pipermail/openflow-discuss/2011-June/002386.html.


bc34d060
BP
320Questions
321---------
322
323Please direct porting questions to dev@openvswitch.org. We will try
324to use questions to improve this porting guide.
9feb1017
TG
325
326[INSTALL.userspace.md]:INSTALL.userspace.md