Commit | Line | Data |
---|---|---|
fe36184b BP |
1 | * Flow match expression handling library. |
2 | ||
3 | ovn-controller is the primary user of flow match expressions, but | |
4 | the same syntax and, I imagine, the same code ought to be useful in | |
91ae2065 | 5 | ovn-northd for ACL match expressions. |
fe36184b | 6 | |
fe36184b BP |
7 | * ovn-controller |
8 | ||
9 | ** Flow table handling in ovn-controller. | |
10 | ||
11 | ovn-controller has to transform logical datapath flows from the | |
12 | database into OpenFlow flows. | |
13 | ||
14 | *** Definition (or choice) of data structure for flows and flow table. | |
15 | ||
16 | It would be natural enough to use "struct flow" and "struct | |
17 | classifier" for this. Maybe that is what we should do. However, | |
18 | "struct classifier" is optimized for searches based on packet | |
19 | headers, whereas all we care about here can be implemented with a | |
20 | hash table. Also, we may want to make it easy to add and remove | |
21 | support for fields without recompiling, which is not possible with | |
22 | "struct flow" or "struct classifier". | |
23 | ||
24 | On the other hand, we may find that it is difficult to decide that | |
25 | two OXM flow matches are identical (to normalize them) without a | |
26 | lot of domain-specific knowledge that is already embedded in struct | |
27 | flow. It's also going to be a pain to come up with a way to make | |
28 | anything other than "struct flow" work with the ofputil_*() | |
29 | functions for encoding and decoding OpenFlow. | |
30 | ||
31 | It's also possible we could use struct flow without struct | |
32 | classifier. | |
33 | ||
34 | *** Assembling conjunctive flows from flow match expressions. | |
35 | ||
36 | This transformation explodes logical datapath flows into multiple | |
37 | OpenFlow flow table entries, since a flow match expression in CoD | |
38 | form requires several OpenFlow flow table entries. It also | |
39 | requires merging together OpenFlow flow table entries that contain | |
40 | "conjunction" actions (really just concatenating their actions). | |
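A minimal sketch of the explosion and merge steps described above, assuming a flow match expression has already been normalized into a list of clauses, each clause being a list of alternative match strings. The function names `explode` and `merge` and the string-based flow representation are hypothetical, chosen only for illustration. The `conjunction(id, k/n)` action and the `conj_id` match field follow the OVS conjunctive match extension.

```python
def explode(conj_id, clauses, actions):
    """Turn one conjunction-of-disjunctions expression into flow entries.

    clauses: list of dimensions; each dimension is a list of alternative
             match strings (hypothetical representation).
    Returns a list of (match, [action, ...]) pairs.
    """
    n = len(clauses)
    if n == 1:
        # A single clause needs no conjunction: one flow per alternative.
        return [(match, [actions]) for match in clauses[0]]
    # One flow fires the real actions once every dimension has matched.
    flows = [("conj_id=%d" % conj_id, [actions])]
    for k, alternatives in enumerate(clauses, start=1):
        for match in alternatives:
            flows.append((match, ["conjunction(%d, %d/%d)" % (conj_id, k, n)]))
    return flows


def merge(flows):
    """Merge entries with identical matches by concatenating their actions.

    This sketch assumes colliding entries carry only conjunction actions.
    """
    table = {}
    for match, acts in flows:
        table.setdefault(match, []).extend(acts)
    return table
```

For example, two logical flows that both constrain `ip,ip_src=10.0.0.1` end up sharing a single OpenFlow entry whose action list names both conjunctions.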
41 | ||
42 | *** Translating logical datapath port names into port numbers. | |
43 | ||
44 | Logical ports are specified by name in logical datapath flows, but | |
45 | OpenFlow only works in terms of numbers. | |
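As a rough illustration of the name-to-number step, the sketch below resolves logical port names against a mapping from names to OpenFlow port numbers. The mapping, the function name `resolve_ports`, and the decision to report unknown names separately instead of failing are all assumptions made for this example.

```python
def resolve_ports(port_names, name_to_ofport):
    """Split logical port names into resolved numbers and unknown names.

    name_to_ofport: hypothetical dict of logical port name -> OpenFlow
                    port number, e.g. built from local interface records.
    """
    resolved = {}
    unknown = []
    for name in port_names:
        if name in name_to_ofport:
            resolved[name] = name_to_ofport[name]
        else:
            # The port is not (yet) present on this chassis; the caller
            # decides whether to skip the flow or retry later.
            unknown.append(name)
    return resolved, unknown
```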
46 | ||
47 | *** Translating logical datapath actions into OpenFlow actions. | |
48 | ||
49 | Some of the logical datapath actions do not have natural | |
50 | representations as OpenFlow actions: they require | |
51 | packet-in/packet-out round trips through ovn-controller. The | |
52 | trickiest part of that is going to be making sure that the | |
53 | packet-out resumes the control flow that was broken off by the | |
54 | packet-in. That's tricky; we'll probably have to restrict control | |
55 | flow or add OVS features to make resuming in general possible. Not | |
56 | sure which is better at this point. | |
57 | ||
58 | *** OpenFlow flow table synchronization. | |
59 | ||
60 | The internal representation of the OpenFlow flow table has to be | |
61 | synced across the controller connection to OVS. This probably | |
62 | boils down to the "flow monitoring" feature of OF1.4 which was then | |
63 | made available as a "standard extension" to OF1.3. (OVS hasn't | |
64 | implemented this for OF1.4 yet, but the feature is based on an OVS | |
65 | extension to OF1.0, so it should be straightforward to add it.) | |
66 | ||
67 | We probably need some way to catch cases where OVS and OVN don't | |
68 | see eye-to-eye on what exactly constitutes a flow, so that OVN | |
69 | doesn't waste a lot of CPU time hammering at OVS trying to install | |
70 | something that it's not going to do. | |
71 | ||
72 | *** Logical/physical translation. | |
73 | ||
74 | When a packet comes into the integration bridge, the first stage of | |
75 | processing needs to translate it from a physical to a logical | |
76 | context. When a packet leaves the integration bridge, the final | |
77 | stage of processing needs to translate it back into a physical | |
78 | context. ovn-controller needs to populate the OpenFlow flow | |
79 | tables to do these translations. | |
80 | ||
81 | *** Determine how to split logical pipeline across physical nodes. | |
82 | ||
83 | From the original OVN architecture document: | |
84 | ||
85 | The pipeline processing is split between the ingress and egress | |
86 | transport nodes. In particular, the logical egress processing may | |
87 | occur at either hypervisor. Processing the logical egress on the | |
88 | ingress hypervisor requires more state about the egress vif's | |
89 | policies, but reduces traffic on the wire that would eventually be | |
90 | dropped. Whereas, processing on the egress hypervisor can reduce | |
91 | broadcast traffic on the wire by doing local replication. We | |
92 | initially plan to process logical egress on the egress hypervisor | |
93 | so that less state needs to be replicated. However, we may change | |
94 | this behavior once we gain some experience writing the logical | |
95 | flows. | |
96 | ||
97 | The pipeline processing split will influence how tunnel keys | |
98 | are encoded. | |
99 | ||
100 | ** Interaction with Open_vSwitch and OVN databases: | |
101 | ||
fe36184b BP |
102 | *** Monitor Chassis table in OVN. |
103 | ||
104 | Populate Port records for tunnels to other chassis into | |
105 | Open_vSwitch database. As a scale optimization later on, one can | |
106 | populate only records for tunnels to other chassis that have | |
107 | logical networks in common with this one. | |
108 | ||
109 | *** Monitor Pipeline table in OVN, trigger flow table recomputation on change. | |
110 | ||
111 | ** ovn-controller parameters and configuration. | |
112 | ||
113 | *** Tunnel encapsulation to publish. | |
114 | ||
115 | Default: VXLAN? Geneve? | |
116 | ||
fe36184b BP |
117 | *** SSL configuration. |
118 | ||
119 | Can probably get this from Open_vSwitch database. | |
120 | ||
91ae2065 | 121 | * ovn-northd |
fe36184b BP |
122 | |
123 | ** Monitor OVN_Northbound database, trigger Pipeline recomputation on change. | |
124 | ||
125 | ** Translate each OVN_Northbound entity into Pipeline logical datapath flows. | |
126 | ||
127 | We have to first sit down and figure out what the general | |
128 | translation of each entity is. The original OVN architecture | |
129 | description at | |
130 | http://openvswitch.org/pipermail/dev/2015-January/050380.html had | |
131 | some sketches of these, but they need to be completed and | |
132 | elaborated. | |
133 | ||
134 | Initially, the simplest way to do this is probably to write | |
135 | straight C code to do a full translation of the entire | |
136 | OVN_Northbound database into the format for the Pipeline table in | |
ec78987f JP |
137 | the OVN Southbound database. As scale increases, this will probably |
138 | be too inefficient since a small change in OVN_Northbound requires a | |
139 | full recomputation. At that point, we probably want to adopt a more | |
140 | systematic approach, such as something akin to the "nlog" system used | |
141 | in NVP (see Koponen et al. "Network Virtualization in Multi-tenant | |
142 | Datacenters", NSDI 2014). | |
fe36184b BP |
143 | |
144 | ** Push logical datapath flows to Pipeline table. | |
145 | ||
ec78987f | 146 | ** Monitor OVN Southbound database Bindings table. |
fe36184b BP |
147 | |
148 | Sync rows in the OVN Bindings table to the "up" column in the | |
149 | OVN_Northbound database. | |
150 | ||
151 | * ovsdb-server | |
152 | ||
153 | ovsdb-server should have adequate features for OVN but it probably | |
154 | needs work for scale and possibly for availability as deployments | |
155 | grow. Here are some thoughts. | |
156 | ||
157 | Andy Zhou is looking at these issues. | |
158 | ||
159 | ** Scaling number of connections. | |
160 | ||
161 | In typical use today a given ovsdb-server has only a single-digit | |
ec78987f JP |
162 | number of simultaneous connections. The OVN Southbound database will |
163 | have a connection from every hypervisor. This use case needs testing | |
164 | and probably coding work. Here are some possible improvements. | |
fe36184b BP |
165 | |
166 | *** Reducing amount of data sent to clients. | |
167 | ||
168 | Currently, whenever a row monitored by a client changes, | |
169 | ovsdb-server sends the client every monitored column in the row, | |
170 | even if only one column changes. It might be valuable to reduce | |
171 | this to only the columns that change. | |
172 | ||
173 | Also, whenever a column changes, ovsdb-server sends the entire | |
174 | contents of the column. It might be valuable, for columns that | |
175 | are sets or maps, to send only added or removed values or | |
176 | key-value pairs. | |
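The two reductions above can be sketched as a row delta: send only the columns that changed and, for set-valued columns, only the added and removed members. This is an illustration under assumed data shapes (rows as dicts, sets as Python sets), not the ovsdb-server wire format; the function name `row_delta` is hypothetical. Map-valued columns could be handled analogously.

```python
def row_delta(old, new):
    """Return only the columns that differ between two versions of a row.

    Rows are modeled as dicts of column name -> value (an assumption of
    this sketch). Set-valued columns are reduced to added/removed members.
    """
    delta = {}
    for column, value in new.items():
        before = old.get(column)
        if before == value:
            continue  # Unchanged column: nothing to send.
        if isinstance(value, (set, frozenset)) and isinstance(before, (set, frozenset)):
            delta[column] = {"added": value - before, "removed": before - value}
        else:
            delta[column] = value
    return delta
```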
177 | ||
178 | Currently, clients monitor the entire contents of a table. It | |
179 | might make sense to allow clients to monitor only rows that | |
180 | satisfy specific criteria, e.g. to allow an ovn-controller to | |
181 | receive only Pipeline rows for logical networks on its hypervisor. | |
182 | ||
183 | *** Reducing redundant data and code within ovsdb-server. | |
184 | ||
185 | Currently, ovsdb-server separately composes database update | |
186 | information to send to each of its clients. This is fine for a | |
187 | small number of clients, but it wastes time and memory when | |
188 | hundreds of clients all want the same updates (as will be the | |
189 | case in OVN). | |
190 | ||
191 | (This is somewhat opposed to the idea of letting a client monitor | |
192 | only some rows in a table, since that would increase the diversity | |
193 | among clients.) | |
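One way to picture the sharing idea is to compose each update once per distinct monitor specification and reuse the encoded result for every client with that specification. The sketch below assumes a monitor is just a frozenset of column names and that updates are JSON-encoded dicts; both are simplifications for illustration, and `compose_updates` is a hypothetical name.

```python
import json


def compose_updates(changes, clients):
    """Encode one update per distinct monitor and share it among clients.

    changes: dict of column -> new value for a changed row (assumed shape).
    clients: dict of client id -> frozenset of monitored columns
             (hypothetical stand-in for a real monitor specification).
    Returns (messages, encodes): per-client message and encode count.
    """
    cache = {}
    messages = {}
    for client, columns in clients.items():
        if columns not in cache:
            visible = {c: v for c, v in changes.items() if c in columns}
            cache[columns] = json.dumps(visible, sort_keys=True)
        messages[client] = cache[columns]
    return messages, len(cache)
```

The more the clients' monitors differ, for instance through per-client row filters, the fewer cache hits this approach gets, which is the tension noted above.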
194 | ||
195 | *** Multithreading. | |
196 | ||
197 | If it turns out that other changes don't let ovsdb-server scale | |
198 | adequately, we can multithread ovsdb-server. Initially one might | |
199 | only break protocol handling into separate threads, leaving the | |
200 | actual database work serialized through a lock. | |
201 | ||
202 | ** Increasing availability. | |
203 | ||
204 | Database availability might become an issue. The OVN system | |
205 | shouldn't grind to a halt if the database becomes unavailable, but | |
206 | it would become impossible to bring VIFs up or down, etc. | |
207 | ||
208 | My current thought on how to increase availability is to add | |
209 | clustering to ovsdb-server, probably via the Raft consensus | |
210 | algorithm. As an experiment, I wrote an implementation of Raft | |
211 | for Open vSwitch that you can clone from: | |
212 | ||
213 | https://github.com/blp/ovs-reviews.git raft | |
214 | ||
215 | ** Reducing startup time. | |
216 | ||
217 | As-is, if ovsdb-server restarts, every client will fetch a fresh | |
218 | copy of the part of the database that it cares about. With | |
219 | hundreds of clients, this could cause heavy CPU load on | |
220 | ovsdb-server and use excessive network bandwidth. It would be | |
221 | better to allow incremental updates even across connection loss. | |
222 | One way might be to use "Difference Digests" as described in | |
223 | Epstein et al., "What's the Difference? Efficient Set | |
224 | Reconciliation Without Prior Context". (I'm not yet aware of | |
225 | previous non-academic use of this technique.) | |
226 | ||
227 | * Miscellaneous: | |
228 | ||
229 | ** Write ovn-nbctl utility. | |
230 | ||
231 | The idea here is that we need a utility to act on the OVN_Northbound | |
232 | database in a way similar to a CMS, so that we can do some testing | |
233 | without an actual CMS in the picture. | |
234 | ||
235 | No details yet. | |
236 | ||
91ae2065 | 237 | ** Init scripts for ovn-controller (on HVs), ovn-northd, OVN DB server. |
fe36184b BP |
238 | |
239 | ** Distribution packaging. | |
240 | ||
241 | * Not yet scoped: | |
242 | ||
243 | ** Neutron plugin. | |
244 | ||
2e03fc77 RB |
245 | This is being developed on OpenStack's development infrastructure |
246 | to be alongside most of the other Neutron plugins. | |
fe36184b | 247 | |
2e03fc77 | 248 | http://git.openstack.org/cgit/stackforge/networking-ovn |
fe36184b | 249 | |
2e03fc77 | 250 | http://git.openstack.org/cgit/stackforge/networking-ovn/tree/doc/source/todo.rst |
fe36184b BP |
251 | |
252 | ** Gateways. |