* Flow match expression handling library.

ovn-controller is the primary user of flow match expressions, but
the same syntax, and likely the same code, should also be useful in
ovn-nbd for ACL match expressions.

** Definition of data structures to represent a match expression as a
   syntax tree.
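
One possible shape for such a syntax tree, sketched here in Python
rather than the C the library would actually use, is a small hierarchy
of Boolean nodes over leaf comparisons.  All names below are
illustrative assumptions, not an existing interface:

```python
from dataclasses import dataclass
from typing import Tuple

# Leaf node: a comparison of a field against a constant,
# e.g. ip.proto == 6.
@dataclass(frozen=True)
class Cmp:
    field: str   # field name, e.g. "ip.proto"
    relop: str   # one of "==", "!=", "<", "<=", ">", ">="
    value: int

# Interior nodes: Boolean combinations of subexpressions.
@dataclass(frozen=True)
class And:
    operands: Tuple  # subexpressions that must all match

@dataclass(frozen=True)
class Or:
    operands: Tuple  # subexpressions of which at least one must match

@dataclass(frozen=True)
class Not:
    operand: object

# Example: ip.proto == 6 && (tcp.dst == 80 || tcp.dst == 443).
expr = And((Cmp("ip.proto", "==", 6),
            Or((Cmp("tcp.dst", "==", 80), Cmp("tcp.dst", "==", 443)))))
```

Immutable ("frozen") nodes make the later rewriting passes easier to
reason about, since subtrees can be shared freely.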

** Definition of data structures to represent variables (fields).

Fields need names and prerequisites.  Most fields are numeric and
thus need widths.  We also need a way to represent nominal
fields (currently just logical port names).  It might be
appropriate to associate fields directly with OXM/NXM code points;
we have to decide whether we want OVN to use the OVS flow structure
or work with OXM more directly.

Probably should be defined so that the data structure is also
useful for references to fields in action parsing.
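
A minimal field descriptor along these lines might carry a name, a
width (zero for nominal fields), and a prerequisite given as a match
string.  This is only a sketch; the names and the representation are
assumptions, not an existing API:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Field:
    name: str                     # e.g. "tcp.dst"
    n_bits: int                   # width in bits; 0 for nominal fields
    prereq: Optional[str] = None  # match that must hold for this field
                                  # to make sense, e.g. "ip.proto == 6"

    @property
    def is_nominal(self):
        # Nominal fields (like logical port names) have no numeric width.
        return self.n_bits == 0

# A few illustrative definitions forming a symbol table.
SYMTAB = {
    "inport":   Field("inport", 0),                 # logical port name
    "ip.proto": Field("ip.proto", 8, "ip"),
    "tcp.dst":  Field("tcp.dst", 16, "ip.proto == 6"),
}
```

Keeping prerequisites as match strings lets the "applying
prerequisites" pass below reuse the same parser, which is one way to
make the structure useful for action parsing too.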

** Lexical analysis.

Probably should be defined so that the lexer can be reused for
parsing actions.
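
A lexer for this grammar mainly has to split dotted identifiers,
integer literals, relational and Boolean operators, and parentheses.
A regex-based sketch in Python; the token kinds are made up for
illustration:

```python
import re

# Longer operators must precede their prefixes in the alternation so
# that "==" is not lexed as two tokens.
TOKEN_RE = re.compile(
    r"\s*(?:(?P<ID>[A-Za-z_][A-Za-z0-9_.]*)"
    r"|(?P<INT>0x[0-9a-fA-F]+|[0-9]+)"
    r"|(?P<OP>==|!=|<=|>=|&&|\|\||[!()<>]))")

def lex(s):
    """Yield (kind, text) pairs for the match expression in 's'."""
    pos = 0
    while pos < len(s):
        m = TOKEN_RE.match(s, pos)
        if not m:
            raise ValueError("lexical error at %r" % s[pos:])
        pos = m.end()
        kind = m.lastgroup
        yield kind, m.group(kind)

tokens = list(lex("ip.proto == 6 && (tcp.dst == 80 || tcp.dst == 443)"))
```

The same token stream works for action parsing, since actions share
the identifier, literal, and punctuation vocabulary.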

** Parsing into syntax tree.

** Semantic checking against variable definitions.

** Applying prerequisites.

** Simplification into conjunction-of-disjunctions (CoD) form.
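
This simplification is ordinary Boolean rewriting: push negations to
the leaves (an earlier pass, assumed done here), then distribute OR
over AND until the result is a conjunction of clauses, each clause a
disjunction of leaf tests.  A sketch over nested Python tuples; the
('and', ...) encoding is invented for illustration:

```python
def to_cod(e):
    """Convert a Boolean expression to conjunction-of-disjunctions form.

    Expressions are nested tuples ('and', a, b) or ('or', a, b), with
    leaf tests represented as strings; negations are assumed to have
    been pushed to the leaves already.  Returns a list of clauses,
    each a frozenset of leaf tests, whose AND is equivalent to 'e'.
    """
    if isinstance(e, str):          # leaf test: one single-leaf clause
        return [frozenset([e])]
    op, a, b = e
    if op == 'and':                 # AND just concatenates clause lists
        return to_cod(a) + to_cod(b)
    assert op == 'or'
    # OR distributes over AND: (P && Q) || R == (P || R) && (Q || R),
    # so union every pair of clauses from the two sides.
    return [ca | cb for ca in to_cod(a) for cb in to_cod(b)]

# (a && b) || c  ->  (a || c) && (b || c)
cod = to_cod(('or', ('and', 'a', 'b'), 'c'))
```

The distribution step can blow up exponentially in the worst case,
which is one reason the later OpenFlow encoding leans on "conjunction"
actions rather than fully expanding every expression.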

** Transformation from CoD form into OXM matches.

* ovn-controller

** Flow table handling in ovn-controller.

ovn-controller has to transform logical datapath flows from the
database into OpenFlow flows.

*** Definition (or choice) of data structure for flows and flow table.

It would be natural enough to use "struct flow" and "struct
classifier" for this.  Maybe that is what we should do.  However,
"struct classifier" is optimized for searches based on packet
headers, whereas all we care about here can be implemented with a
hash table.  Also, we may want to make it easy to add and remove
support for fields without recompiling, which is not possible with
"struct flow" or "struct classifier".

On the other hand, we may find that it is difficult to determine
whether two OXM flow matches are identical (to normalize them)
without a lot of domain-specific knowledge that is already embedded
in struct flow.  It is also going to be a pain to make anything
other than "struct flow" work with the ofputil_*() functions for
encoding and decoding OpenFlow.

It's also possible we could use struct flow without struct
classifier.
*** Assembling conjunctive flows from flow match expressions.

This transformation explodes logical datapath flows into multiple
OpenFlow flow table entries, since a flow match expression in CoD
form requires several OpenFlow flow table entries.  It also
requires merging OpenFlow flow table entries that contain
"conjunction" actions (really just concatenating their actions).
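
As a rough illustration: each clause of a CoD expression becomes one
dimension of a conjunction, every leaf in the clause becomes a flow
carrying a conjunction(id, k/n)-style action, and flows from different
expressions with identical matches merge by concatenating actions.
A simplified sketch; the flow representation here is invented and real
OpenFlow encoding is far more involved:

```python
def explode(conj_id, clauses):
    """Turn a CoD expression (a list of clauses, each a list of leaf
    matches) into (match, action) pairs using conjunction-style
    actions.  'match' is modeled as just the leaf test string."""
    n = len(clauses)
    flows = []
    for k, clause in enumerate(clauses, 1):
        for match in clause:
            flows.append((match, "conjunction(%d, %d/%d)" % (conj_id, k, n)))
    return flows

def merge(flows):
    """Merge flows with identical matches by concatenating actions."""
    table = {}
    for match, action in flows:
        table.setdefault(match, []).append(action)
    return table

# Two clauses: (a || b) && c, as conjunction id 7; then merge in a
# flow on "a" that belongs to a different conjunction, id 8.
flows = explode(7, [["a", "b"], ["c"]])
table = merge(flows + [("a", "conjunction(8, 1/2)")])
```

The merge step is why the match-identity question in the previous
section matters: two flows only merge if we can tell their matches
are the same.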

*** Translating logical datapath port names into port numbers.

Logical ports are specified by name in logical datapath flows, but
OpenFlow only works in terms of numbers.
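
The translation itself is a lookup in a name-to-port-number map
maintained from the database; the interesting part is what to do with
names that have no binding on this chassis yet.  A trivial sketch in
which matches are plain dicts and all names are assumptions:

```python
def translate_port(match, port_map):
    """Replace a logical port name in 'match' with its OpenFlow port
    number.  Returns None when the port is not (yet) bound locally, so
    the caller can skip flows that cannot match on this chassis."""
    name = match.get("inport")
    if name is None:
        return dict(match)           # no logical port reference
    ofport = port_map.get(name)
    if ofport is None:
        return None                  # unbound port: flow is unusable here
    out = dict(match)
    del out["inport"]
    out["in_port"] = ofport          # numeric OpenFlow port
    return out

PORT_MAP = {"lport1": 5}
translated = translate_port({"inport": "lport1", "ip.proto": 6}, PORT_MAP)
```

Returning None rather than raising keeps the flow-table recomputation
loop simple when bindings churn.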

*** Translating logical datapath actions into OpenFlow actions.

Some of the logical datapath actions do not have natural
representations as OpenFlow actions: they require
packet-in/packet-out round trips through ovn-controller.  The
trickiest part of that is going to be making sure that the
packet-out resumes the control flow that was broken off by the
packet-in.  We'll probably have to restrict control flow or add OVS
features to make resuming possible in general.  Not sure which is
better at this point.

*** OpenFlow flow table synchronization.

The internal representation of the OpenFlow flow table has to be
synced across the controller connection to OVS.  This probably
boils down to the "flow monitoring" feature of OF1.4, which was
later made available as a "standard extension" to OF1.3.  (OVS
hasn't implemented this for OF1.4 yet, but the feature is based on
an OVS extension to OF1.0, so it should be straightforward to add.)

We probably need some way to catch cases where OVS and OVN don't
see eye to eye on what exactly constitutes a flow, so that OVN
doesn't waste a lot of CPU time hammering at OVS trying to install
something that OVS is never going to install as requested.

*** Logical/physical translation.

When a packet comes into the integration bridge, the first stage of
processing needs to translate it from a physical to a logical
context.  When a packet leaves the integration bridge, the final
stage of processing needs to translate it back into a physical
context.  ovn-controller needs to populate the OpenFlow flow
tables to do these translations.

*** Determine how to split logical pipeline across physical nodes.

From the original OVN architecture document:

    The pipeline processing is split between the ingress and egress
    transport nodes.  In particular, the logical egress processing may
    occur at either hypervisor.  Processing the logical egress on the
    ingress hypervisor requires more state about the egress vif's
    policies, but reduces traffic on the wire that would eventually be
    dropped.  Whereas, processing on the egress hypervisor can reduce
    broadcast traffic on the wire by doing local replication.  We
    initially plan to process logical egress on the egress hypervisor
    so that less state needs to be replicated.  However, we may change
    this behavior once we gain some experience writing the logical
    flows.

The pipeline processing split will influence how tunnel keys are
encoded.

** Interaction with Open_vSwitch and OVN databases:

*** Monitor VIFs attached to the integration bridge in Open_vSwitch.

In response to changes, add or remove corresponding rows in the
Bindings table in OVN.

*** Populate Chassis row in OVN at startup.  Maintain Chassis row over time.

(Warn if any other Chassis claims the same IP address.)

*** Remove Chassis and Bindings rows from OVN on exit.

*** Monitor Chassis table in OVN.

Populate Port records for tunnels to other chassis into the
Open_vSwitch database.  As a scale optimization later on, one could
populate only records for tunnels to other chassis that have
logical networks in common with this one.

*** Monitor Pipeline table in OVN, trigger flow table recomputation on change.

** ovn-controller parameters and configuration.

*** Tunnel encapsulation to publish.

Default: VXLAN?  Geneve?

*** Location of Open_vSwitch database.

We can probably use the same default as ovs-vsctl.

*** Location of OVN Southbound database.

Probably no useful default.

*** SSL configuration.

Can probably get this from the Open_vSwitch database.

* ovn-nbd

** Monitor OVN_Northbound database, trigger Pipeline recomputation on change.

** Translate each OVN_Northbound entity into Pipeline logical datapath flows.

We first have to figure out what the general translation of each
entity is.  The original OVN architecture description at
http://openvswitch.org/pipermail/dev/2015-January/050380.html had
some sketches of these, but they need to be completed and
elaborated.

Initially, the simplest way to do this is probably to write
straight C code to do a full translation of the entire
OVN_Northbound database into the format for the Pipeline table in
the OVN Southbound database.  As scale increases, this will
probably be too inefficient, since a small change in OVN_Northbound
requires a full recomputation.  At that point, we probably want to
adopt a more systematic approach, such as something akin to the
"nlog" system used in NVP (see Koponen et al., "Network
Virtualization in Multi-tenant Datacenters", NSDI 2014).

** Push logical datapath flows to Pipeline table.

** Monitor OVN Southbound database Bindings table.

Sync rows in the OVN Bindings table to the "up" column in the
OVN_Northbound database.

* ovsdb-server

ovsdb-server should have adequate features for OVN, but it probably
needs work for scale and possibly for availability as deployments
grow.  Here are some thoughts.

Andy Zhou is looking at these issues.

** Scaling number of connections.

In typical use today a given ovsdb-server has only a single-digit
number of simultaneous connections.  The OVN Southbound database
will have a connection from every hypervisor.  This use case needs
testing and probably coding work.  Here are some possible
improvements.

*** Reducing amount of data sent to clients.

Currently, whenever a row monitored by a client changes,
ovsdb-server sends the client every monitored column in the row,
even if only one column changes.  It might be valuable to reduce
this to only the columns that changed.

Also, whenever a column changes, ovsdb-server sends the entire
contents of the column.  It might be valuable, for columns that
are sets or maps, to send only added or removed values or
key-value pairs.
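
Both ideas amount to diffing the old and new versions of a row before
composing the update.  A sketch of such a diff over plain dicts, with
set-valued columns reduced to their added and removed elements; the
update format shown is invented, not ovsdb-server's actual wire
format, and column deletion is omitted for brevity:

```python
def diff_row(old, new):
    """Return only what changed between two versions of a row.

    A row is a dict of column name -> value.  For set-valued columns
    the diff records just the added and removed elements; any other
    changed column is reported with its full new value."""
    update = {}
    for col in new:
        if new[col] == old.get(col):
            continue                          # unchanged: omit entirely
        if isinstance(new[col], set) and isinstance(old.get(col), set):
            update[col] = {"add": new[col] - old[col],
                           "del": old[col] - new[col]}
        else:
            update[col] = new[col]
    return update

old = {"up": False, "addresses": {"aa:bb", "cc:dd"}}
new = {"up": False, "addresses": {"aa:bb", "ee:ff"}}
update = diff_row(old, new)
```

Here only "addresses" appears in the update, and only as a one-element
addition and a one-element removal, instead of resending the whole row.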

Currently, clients monitor the entire contents of a table.  It
might make sense to allow clients to monitor only rows that
satisfy specific criteria, e.g. to allow an ovn-controller to
receive only Pipeline rows for logical networks on its hypervisor.

*** Reducing redundant data and code within ovsdb-server.

Currently, ovsdb-server separately composes database update
information to send to each of its clients.  This is fine for a
small number of clients, but it wastes time and memory when
hundreds of clients all want the same updates (as will be the
case in OVN).

(This is somewhat opposed to the idea of letting a client monitor
only some rows in a table, since that would increase the diversity
among clients.)

*** Multithreading.

If it turns out that other changes don't let ovsdb-server scale
adequately, we can multithread ovsdb-server.  Initially one might
break only protocol handling into separate threads, leaving the
actual database work serialized through a lock.

** Increasing availability.

Database availability might become an issue.  The OVN system
shouldn't grind to a halt if the database becomes unavailable, but
it would become impossible to bring VIFs up or down, etc.

My current thought on how to increase availability is to add
clustering to ovsdb-server, probably via the Raft consensus
algorithm.  As an experiment, I wrote an implementation of Raft
for Open vSwitch that you can clone from:

    https://github.com/blp/ovs-reviews.git raft

** Reducing startup time.

As-is, if ovsdb-server restarts, every client will fetch a fresh
copy of the part of the database that it cares about.  With
hundreds of clients, this could cause heavy CPU load on
ovsdb-server and use excessive network bandwidth.  It would be
better to allow incremental updates even across connection loss.
One way might be to use "Difference Digests" as described in
Eppstein et al., "What's the Difference?  Efficient Set
Reconciliation Without Prior Context".  (I'm not yet aware of
previous non-academic use of this technique.)

* Miscellaneous:

** Write ovn-nbctl utility.

The idea here is that we need a utility to act on the OVN_Northbound
database in a way similar to a CMS, so that we can do some testing
without an actual CMS in the picture.

No details yet.

** Init scripts for ovn-controller (on HVs), ovn-nbd, OVN DB server.

** Distribution packaging.

* Not yet scoped:

** Neutron plugin.

This is being developed on OpenStack's development infrastructure,
alongside most of the other Neutron plugins.

http://git.openstack.org/cgit/stackforge/networking-ovn

http://git.openstack.org/cgit/stackforge/networking-ovn/tree/doc/source/todo.rst

** Gateways.