..
      Licensed under the Apache License, Version 2.0 (the "License"); you may
      not use this file except in compliance with the License. You may obtain
      a copy of the License at

          http://www.apache.org/licenses/LICENSE-2.0

      Unless required by applicable law or agreed to in writing, software
      distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
      WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
      License for the specific language governing permissions and limitations
      under the License.

      Convention for heading levels in Open vSwitch documentation:

      =======  Heading 0 (reserved for the title in a document)
      -------  Heading 1
      ~~~~~~~  Heading 2
      +++++++  Heading 3
      '''''''  Heading 4

      Avoid deeper levels because they do not render well.

==============
OVN To-do List
==============

* Work out database for clustering or HA properly.

* Get incremental updates in ovn-controller and ovn-northd in some
  sensible way.

* Self-managing HA for ovn-northd (avoiding the need to set up
  independent tooling for fail-over).

  Russell Bryant: "For bonus points, increasing N would scale out ovn-northd if
  it was under too much load, but that's a secondary concern."

* Live migration.

  Russell Bryant: "When you're ready to have the destination take over, you
  have to remove the iface-id from the source and add it at the destination and
  I think it'd typically be configured on both ends, since it's a clone of the
  source VM (and it's config)."

* VLAN trunk ports.

  Russell Bryant: "Today that would require creating 4096 ports for the VM and
  attach to 4096 OVN networks, so doable, but not quite ideal."

* Service function chaining.

* MAC learning.

  Han Zhou: "To support VMs that hosts workloads with their own macs, e.g.
  containers, if not using OVN native container support."

* Finish up ARP/ND support: re-checking bindings, expiring bindings.

* Hitless upgrade, especially for data plane.

* Use OpenFlow "bundles" for transactional data plane updates.

* L3 support

  * Logical routers should send RST replies to TCP packets.

  * IPv6 router ports should periodically send ND Router Advertisements.

* Dynamic IP to MAC binding enhancements.

  OVN has basic support for establishing IP to MAC bindings dynamically, using
  ARP.

  * Ratelimiting.

    From casual observation, Linux appears to generate at most one ARP per
    second per destination.

    This might be supported by adding a new OVN logical action for
    rate-limiting.

  * Tracking queries

    It's probably best to only record in the database responses to queries
    actually issued by an L3 logical router, so somehow they have to be
    tracked, probably by putting a tentative binding without a MAC address
    into the database.

  * Renewal and expiration.

    Something needs to make sure that bindings remain valid and expire those
    that become stale.

    One way to do this might be to add some support for time to the database
    server itself.

  * Table size limiting.

    The table of MAC bindings must not be allowed to grow unreasonably large.

  * MTU handling (fragmentation on output)

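The rate limiting, query tracking, expiration, and table size items above can
be sketched together as a small in-memory table. This is only an illustration
of how the ideas might fit, not OVN code: the class ``MacBindingTable``, its
methods, and the default values (other than the one query per second per
destination observed for Linux) are hypothetical, and a real implementation
would live in the southbound database and in logical flows.

```python
import time
from collections import OrderedDict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Binding:
    mac: Optional[str]   # None marks a tentative binding (query sent, no reply yet)
    queried_at: float    # time of the most recent query for this IP
    expires_at: float    # time after which the entry is considered stale


class MacBindingTable:
    """Hypothetical IP-to-MAC table: rate limited, bounded, with expiry."""

    def __init__(self, max_entries=1024, ttl=60.0, min_query_interval=1.0,
                 clock=time.monotonic):
        self.max_entries = max_entries
        self.ttl = ttl
        self.min_query_interval = min_query_interval
        self._clock = clock
        self._entries = OrderedDict()   # ip -> Binding, oldest insertion first

    def __len__(self):
        return len(self._entries)

    def request(self, ip):
        """Return True if a query for `ip` may be sent now."""
        now = self._clock()
        entry = self._entries.get(ip)
        if entry is not None:
            # Rate limiting: at most one query per interval per destination.
            if now - entry.queried_at < self.min_query_interval:
                return False
            entry.queried_at = now
            return True
        # Table size limiting: evict the oldest entries to make room.
        while len(self._entries) >= self.max_entries:
            self._entries.popitem(last=False)
        # Tracking queries: record a tentative binding without a MAC address.
        self._entries[ip] = Binding(None, now, now + self.ttl)
        return True

    def learn(self, ip, mac):
        """Accept a reply only if a query for `ip` was actually issued."""
        entry = self._entries.get(ip)
        if entry is None:
            return False
        entry.mac = mac
        entry.expires_at = self._clock() + self.ttl
        return True

    def lookup(self, ip):
        """Return the MAC for `ip`, or None if unknown, tentative, or stale."""
        entry = self._entries.get(ip)
        if entry is None or entry.expires_at <= self._clock():
            return None
        return entry.mac

    def expire(self):
        """Drop stale entries and return how many were removed."""
        now = self._clock()
        stale = [ip for ip, e in self._entries.items() if e.expires_at <= now]
        for ip in stale:
            del self._entries[ip]
        return len(stale)
```

Passing the clock in as a parameter keeps expiry testable. It also reflects
the open question in the list: something, possibly the database server
itself, has to supply a notion of time.
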
* ovsdb-server

  ovsdb-server should have adequate features for OVN but it probably needs work
  for scale and possibly for availability as deployments grow. Here are some
  thoughts.

  * Multithreading.

    If it turns out that other changes don't let ovsdb-server scale
    adequately, we can multithread ovsdb-server. Initially one might
    only break protocol handling into separate threads, leaving the
    actual database work serialized through a lock.

  * Increasing availability.

    Database availability might become an issue. The OVN system shouldn't
    grind to a halt if the database becomes unavailable, but it would become
    impossible to bring VIFs up or down, etc.

    My current thought on how to increase availability is to add clustering to
    ovsdb-server, probably via the Raft consensus algorithm. As an experiment,
    I wrote an implementation of Raft for Open vSwitch that you can clone from:

      https://github.com/blp/ovs-reviews.git raft

  * Reducing startup time.

    As-is, if ovsdb-server restarts, every client will fetch a fresh copy of
    the part of the database that it cares about. With hundreds of clients,
    this could cause heavy CPU load on ovsdb-server and use excessive network
    bandwidth. It would be better to allow incremental updates even across
    connection loss. One way might be to use "Difference Digests" as described
    in Epstein et al., "What's the Difference? Efficient Set Reconciliation
    Without Prior Context". (I'm not yet aware of previous non-academic use of
    this technique.)

  * Support multiple tunnel encapsulations in Chassis.

    So far, both ovn-controller and ovn-controller-vtep only allow chassis to
    have one tunnel encapsulation entry. We should extend the implementation
    to support multiple tunnel encapsulations.

  * Update learned MAC addresses from VTEP to OVN

    The VTEP gateway stores all MAC addresses learned from its physical
    interfaces in the 'Ucast_Macs_Local' and the 'Mcast_Macs_Local' tables.
    ovn-controller-vtep should be able to update that information back to
    ovn-sb database, so that other chassis know where to send packets destined
    to the extended external network instead of broadcasting.

  * Translate ovn-sb Multicast_Group table into VTEP config

    The ovn-controller-vtep daemon should be able to translate the
    Multicast_Group table entry in ovn-sb database into Mcast_Macs_Remote table
    configuration in VTEP database.

  * The OVN OCF Pacemaker script that supports active/passive HA for the OVN
    databases provides an option to configure the inactivity_probe value. The
    default inactivity_probe of 5 seconds is not sufficient: in OpenStack
    deployments, ovsdb-server drops client IDL connections when the Neutron
    server is heavily loaded.

    We need to find a proper solution to this issue instead of increasing
    the inactivity_probe value.

* ACL

  * Support FTP ALGs.

  * Support reject action.