Commit | Line | Data |
---|---|---|
12b84d50 BP |
1 | .. |
2 | Copyright (c) 2017 Nicira, Inc. | |
3 | ||
4 | Licensed under the Apache License, Version 2.0 (the "License"); you may | |
5 | not use this file except in compliance with the License. You may obtain | |
6 | a copy of the License at | |
7 | ||
8 | http://www.apache.org/licenses/LICENSE-2.0 | |
9 | ||
10 | Unless required by applicable law or agreed to in writing, software | |
11 | distributed under the License is distributed on an "AS IS" BASIS, WITHOUT | |
12 | WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the | |
13 | License for the specific language governing permissions and limitations | |
14 | under the License. | |
15 | ||
16 | Convention for heading levels in Open vSwitch documentation: | |
17 | ||
18 | ======= Heading 0 (reserved for the title in a document) | |
19 | ------- Heading 1 | |
20 | ~~~~~~~ Heading 2 | |
21 | +++++++ Heading 3 | |
22 | ''''''' Heading 4 | |
23 | ||
24 | Avoid deeper levels because they do not render well. | |
25 | ||
26 | ===== | |
27 | ovsdb | |
28 | ===== | |
29 | ||
30 | Description | |
31 | =========== | |
32 | ||
fad59491 BP |
33 | OVSDB, the Open vSwitch Database, is a network-accessible database system. |
34 | Schemas in OVSDB specify the tables in a database and their columns' types and | |
35 | can include data, uniqueness, and referential integrity constraints. OVSDB | |
36 | offers atomic, consistent, isolated, durable transactions. RFC 7047 specifies | |
37 | the JSON-RPC based protocol that OVSDB clients and servers use to communicate. | |
12b84d50 BP |
38 | |
39 | The OVSDB protocol is well suited for state synchronization because it | |
40 | allows each client to monitor the contents of a whole database or a subset | |
41 | of it. Whenever a monitored portion of the database changes, the server | |
42 | tells the client what rows were added or modified (including the new | |
43 | contents) or deleted. Thus, OVSDB clients can easily keep track of the | |
44 | newest contents of any part of the database. | |
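
As a rough illustration of how a client might keep such a copy current, the sketch below applies one batch of monitor notifications to an in-memory cache. It assumes the ``table-updates`` shape of the RFC 7047 ``update`` notification, in which each row update carries ``new`` (for inserted or modified rows) and/or ``old`` (for modified or deleted rows). The function name and the cache layout are hypothetical and are not part of any Open vSwitch library.

```python
def apply_table_updates(cache: dict, table_updates: dict) -> None:
    """Apply one batch of monitor updates to a local cache.

    `cache` maps table name -> {row UUID -> row contents}.
    `table_updates` is assumed to follow RFC 7047: table name ->
    {row UUID -> {"old": ..., "new": ...}}, where either key may be absent.
    """
    for table, rows in table_updates.items():
        table_cache = cache.setdefault(table, {})
        for uuid, row_update in rows.items():
            if "new" in row_update:
                # Inserted or modified row: "new" holds the current contents.
                table_cache[uuid] = row_update["new"]
            else:
                # Only "old" present: the row was deleted.
                table_cache.pop(uuid, None)
```

A client would call this once for the initial reply to its monitor request and again for every later notification, so the cache converges on the server's contents.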
45 | ||
46 | While OVSDB is general-purpose and not particularly specialized for use with | |
47 | Open vSwitch, Open vSwitch does use it for multiple purposes. The leading use | |
48 | of OVSDB is for configuring and monitoring ``ovs-vswitchd(8)``, the Open | |
49 | vSwitch switch daemon, using the schema documented in | |
05bf1dbb BP |
50 | ``ovs-vswitchd.conf.db(5)``. The Open Virtual Network (OVN) project uses two |
51 | OVSDB schemas, documented as part of that project. Finally, Open vSwitch | |
52 | includes the "VTEP" schema, documented in ``vtep(5)``, that many third-party | |
53 | hardware switches support for configuring VXLAN, although OVS itself does not | |
54 | directly use this schema. | |
12b84d50 BP |
55 | |
56 | The OVSDB protocol specification allows independent, interoperable | |
57 | implementations of OVSDB to be developed. Open vSwitch includes an OVSDB | |
58 | server implementation named ``ovsdb-server(1)``, which supports several | |
59 | protocol extensions documented in its manpage, and a basic command-line OVSDB | |
60 | client named ``ovsdb-client(1)``, as well as OVSDB client libraries for C and | |
61 | for Python. Open vSwitch documentation often speaks of these OVSDB | |
62 | implementations in Open vSwitch as simply "OVSDB," even though that is distinct | |
63 | from the OVSDB protocol; we make the distinction explicit only when it might | |
64 | otherwise be unclear from the context. | |
65 | ||
66 | In addition to these generic OVSDB server and client tools, Open vSwitch | |
67 | includes tools for working with databases that have specific schemas: | |
05bf1dbb BP |
68 | ``ovs-vsctl`` works with the ``ovs-vswitchd`` configuration database and |
69 | ``vtep-ctl`` works with the VTEP database. | |
12b84d50 BP |
70 | |
71 | RFC 7047 specifies the OVSDB protocol but it does not specify an on-disk | |
72 | storage format. Open vSwitch includes ``ovsdb-tool(1)`` for working with its | |
73 | own on-disk database formats. The most notable feature of these formats is that | |
74 | ``ovsdb-tool(1)`` makes it easy for users to print the transactions that have | |
75 | changed a database since the last time it was compacted. This feature is often | |
76 | useful for troubleshooting. | |
77 | ||
78 | Schemas | |
79 | ======= | |
80 | ||
81 | Schemas in OVSDB have a JSON format that is specified in RFC 7047. They | |
82 | are often stored in files with an extension ``.ovsschema``. An | |
83 | on-disk database in OVSDB includes a schema and data, embedding both into a | |
84 | single file. The Open vSwitch utility ``ovsdb-tool`` has commands | |
85 | that work with schema files and with the schemas embedded in database | |
86 | files. | |
87 | ||
88 | An OVSDB schema has three important identifiers. The first is its | |
89 | name, which is also the name used in JSON-RPC calls to identify a database | |
90 | based on that schema. For example, the schema used to configure Open | |
91 | vSwitch has the name ``Open_vSwitch``. Schema names begin with a | |
92 | letter or an underscore, followed by any number of letters, underscores, or | |
93 | digits. The ``ovsdb-tool`` commands ``schema-name`` and | |
94 | ``db-name`` extract the schema name from a schema or database | |
95 | file, respectively. | |
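
The naming rule above can be written as a regular expression. This is a minimal sketch that assumes "letter" means an ASCII letter. The helper name is hypothetical.

```python
import re

# A letter or underscore, followed by any number of letters, underscores,
# or digits. ASCII letters are an assumption of this sketch.
_SCHEMA_NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_valid_schema_name(name: str) -> bool:
    """Return True if `name` follows the schema naming rule."""
    return _SCHEMA_NAME_RE.fullmatch(name) is not None
```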
96 | ||
97 | An OVSDB schema also has a version of the form ``<x>.<y>.<z>``, e.g. ``1.2.3``. | |
98 | Schemas maintained within the Open vSwitch project number their versions in the | |
99 | following way (but OVSDB does not mandate this approach). Whenever we change | |
100 | the database schema in a non-backward compatible way (e.g. when we delete a | |
101 | column or a table), we increment <x> and set <y> and <z> to 0. When we change | |
102 | the database schema in a backward compatible way (e.g. when we add a new | |
103 | column), we increment <y> and set <z> to 0. When we change the database schema | |
104 | cosmetically (e.g. we reindent its syntax), we increment <z>. The | |
105 | ``ovsdb-tool`` commands ``schema-version`` and ``db-version`` extract the | |
106 | schema version from a schema or database file, respectively. | |
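
The numbering policy above can be summarized in a few lines of code. This is only an illustrative sketch of the convention. The function name and the three change labels are made up for this example and do not correspond to any Open vSwitch tool.

```python
def bump_version(version: str, change: str) -> str:
    """Return the next <x>.<y>.<z> version for a given kind of schema change.

    `change` is one of the hypothetical labels "incompatible",
    "compatible", or "cosmetic".
    """
    x, y, z = (int(part) for part in version.split("."))
    if change == "incompatible":   # e.g. a column or table was deleted
        return f"{x + 1}.0.0"
    if change == "compatible":     # e.g. a new column was added
        return f"{x}.{y + 1}.0"
    if change == "cosmetic":       # e.g. the schema was reindented
        return f"{x}.{y}.{z + 1}"
    raise ValueError(f"unknown change kind: {change!r}")
```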
107 | ||
108 | Very old OVSDB schemas do not have a version, but RFC 7047 mandates it. | |
109 | ||
110 | An OVSDB schema optionally has a "checksum." RFC 7047 does not specify the use | |
111 | of the checksum and recommends that clients ignore it. Open vSwitch uses the | |
112 | checksum to remind developers to update the version: at build time, if the | |
113 | schema's embedded checksum, ignoring the checksum field itself, does not match | |
114 | the schema's content, then the build fails with a recommendation to update | |
115 | the version and the checksum. Thus, a developer who changes the schema, but | |
116 | does not update the version, receives an automatic reminder. In practice this | |
117 | has been an effective way to ensure compliance with the version number policy. | |
118 | The ``ovsdb-tool`` commands ``schema-cksum`` and ``db-cksum`` extract the | |
119 | schema checksum from a schema or database file, respectively. | |
120 | ||
121 | Service Models | |
122 | ============== | |
123 | ||
1b1d2e6d BP |
124 | OVSDB supports three service models for databases: **standalone**, |
125 | **active-backup**, and **clustered**. The service models provide different | |
126 | compromises among consistency, availability, and partition tolerance. They | |
127 | also differ in the number of servers required and in terms of performance. The | |
128 | standalone and active-backup database service models share one on-disk format, | |
129 | and clustered databases use a different format, but the OVSDB programs work | |
130 | with both formats. ``ovsdb(5)`` documents these file formats. | |
12b84d50 BP |
131 | |
132 | RFC 7047, which specifies the OVSDB protocol, does not mandate or specify | |
133 | any particular service model. | |
134 | ||
135 | The following sections describe the individual service models. | |
136 | ||
137 | Standalone Database Service Model | |
138 | --------------------------------- | |
139 | ||
140 | A **standalone** database runs a single server. If the server stops running, | |
141 | the database becomes inaccessible, and if the server's storage is lost or | |
142 | corrupted, the database's content is lost. This service model is appropriate | |
143 | when the database controls a process or activity to which it is linked via | |
144 | "fate-sharing." For example, an OVSDB instance that controls an Open vSwitch | |
145 | virtual switch daemon, ``ovs-vswitchd``, is a standalone database because a | |
146 | server failure would take out both the database and the virtual switch. | |
147 | ||
148 | To set up a standalone database, use ``ovsdb-tool create`` to | |
149 | create a database file, then run ``ovsdb-server`` to start the | |
150 | database service. | |
151 | ||
1b1d2e6d BP |
152 | To configure a client, such as ``ovs-vswitchd`` or ``ovs-vsctl``, to use a |
153 | standalone database, configure the server to listen on a "connection method" | |
154 | that the client can reach, then point the client to that connection method. | |
155 | See `Connection Methods`_ below for information about connection methods. | |
156 | ||
12b84d50 BP |
157 | Active-Backup Database Service Model |
158 | ------------------------------------ | |
159 | ||
160 | An **active-backup** database runs two servers (on different hosts). At any | |
161 | given time, one of the servers is designated with the **active** role and the | |
162 | other the **backup** role. An active server behaves just like a standalone | |
163 | server. A backup server makes an OVSDB connection to the active server and | |
164 | uses it to continuously replicate its content as it changes in real time. | |
165 | OVSDB clients can connect to either server but only the active server allows | |
166 | data modification or lock transactions. | |
167 | ||
168 | Setup for an active-backup database starts from a working standalone database | |
169 | service, which is initially the active server. On another node, to set up a | |
170 | backup server, create a database file with the same schema as the active | |
171 | server. The initial contents of the database file do not matter, as long as | |
172 | the schema is correct, so ``ovsdb-tool create`` will work, as will copying the | |
173 | database file from the active server. Then use | |
174 | ``ovsdb-server --sync-from=<active>`` to start the backup server, where | |
175 | <active> is an OVSDB connection method (see `Connection Methods`_ below) that | |
176 | connects to the active server. At that point, the backup server will fetch a | |
177 | copy of the active database and keep it up-to-date until it is killed. | |
178 | ||
179 | When the active server in an active-backup server pair fails, an administrator | |
180 | can switch the backup server to an active role with the ``ovs-appctl`` command | |
181 | ``ovsdb-server/disconnect-active-ovsdb-server``. Clients then have read/write | |
182 | access to the now-active server. Of course, administrators are slow to respond | |
183 | compared to software, so in practice external management software detects the | |
184 | active server's failure and changes the backup server's role. For example, the | |
05bf1dbb BP |
185 | "Integration Guide for Centralized Control" in the OVN documentation describes |
186 | how to use Pacemaker for this purpose in OVN. | |
12b84d50 BP |
187 | |
188 | Suppose an active server fails and its backup is promoted to active. If the | |
189 | failed server is revived, it must be started as a backup server. Otherwise, if | |
190 | both servers are active, then they may start out of sync, if the database | |
191 | changed while the server was down, and they will continue to diverge over time. | |
192 | This also happens if the software managing the database servers cannot reach | |
193 | the active server and therefore switches the backup to active, but other hosts | |
194 | can reach both servers. These "split-brain" problems are unsolvable in general | |
195 | for server pairs. | |
196 | ||
197 | Compared to a standalone server, the active-backup service model | |
198 | somewhat increases availability, at a risk of split-brain. It adds | |
1b1d2e6d BP |
199 | generally insignificant performance overhead. On the other hand, the |
200 | clustered service model, discussed below, requires at least 3 servers | |
201 | and has greater performance overhead, but it avoids the need for | |
202 | external management software and eliminates the possibility of | |
203 | split-brain. | |
12b84d50 BP |
204 | |
205 | Open vSwitch 2.6 introduced support for the active-backup service model. | |
206 | ||
2ccd66f5 IM |
207 | .. important:: |
208 | ||
209 | The database file format changed in version 2.15. | |
210 | To upgrade/downgrade the ``ovsdb-server`` processes across this version, | |
211 | follow the instructions described under | |
212 | `Upgrading from version 2.14 and earlier to 2.15 and later`_ and | |
213 | `Downgrading from version 2.15 and later to 2.14 and earlier`_. | |
214 | ||
1b1d2e6d BP |
215 | Clustered Database Service Model |
216 | -------------------------------- | |
217 | ||
218 | A **clustered** database runs across 3 or 5 or more database servers (the | |
219 | **cluster**) on different hosts. Servers in a cluster automatically | |
220 | synchronize writes within the cluster. A 3-server cluster can remain available | |
221 | in the face of at most 1 server failure; a 5-server cluster tolerates up to 2 | |
222 | failures. Clusters larger than 5 servers will also work, with every 2 added | |
223 | servers allowing the cluster to tolerate 1 more failure, but write performance | |
224 | decreases. The number of servers should be odd: a 4- or 6-server cluster | |
225 | cannot tolerate more failures than a 3- or 5-server cluster, respectively. | |
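
The sizing rule follows from requiring a majority of the servers to be available. The sketch below reproduces the numbers in the paragraph above under that assumption. The function names are hypothetical.

```python
def quorum(servers: int) -> int:
    """Smallest number of servers that forms a majority."""
    if servers < 1:
        raise ValueError("a cluster needs at least one server")
    return servers // 2 + 1


def tolerated_failures(servers: int) -> int:
    """Number of servers that can fail while a majority remains."""
    return servers - quorum(servers)
```

Adding one server to an odd-sized cluster raises the quorum without raising the number of tolerated failures, which is why even sizes bring no benefit.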
226 | ||
227 | To set up a clustered database, first initialize it on a single node by running | |
228 | ``ovsdb-tool create-cluster``, then start ``ovsdb-server``. Depending on its | |
229 | arguments, the ``create-cluster`` command can create an empty database or copy | |
230 | a standalone database's contents into the new database. | |
231 | ||
05bf1dbb BP |
232 | To configure a client to use a clustered database, first configure all of the |
233 | servers to listen on a connection method that the client can reach, then point | |
234 | the client to all of the servers' connection methods, comma-separated. See | |
235 | `Connection Methods`_, below, for more detail. | |
1b1d2e6d BP |
236 | |
237 | Open vSwitch 2.9 introduced support for the clustered service model. | |
238 | ||
239 | How to Maintain a Clustered Database | |
240 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | |
241 | ||
242 | To add a server to a cluster, run ``ovsdb-tool join-cluster`` on the new server | |
243 | and start ``ovsdb-server``. To remove a running server from a cluster, use | |
244 | ``ovs-appctl`` to invoke the ``cluster/leave`` command. When a server fails | |
245 | and cannot be recovered, e.g. because its hard disk crashed, or to otherwise | |
246 | remove a server that is down from a cluster, use ``ovs-appctl`` to invoke | |
247 | ``cluster/kick`` to make the remaining servers kick it out of the cluster. | |
248 | ||
249 | The above methods for adding and removing servers only work for healthy | |
250 | clusters, that is, for clusters with no more failures than their maximum | |
251 | tolerance. For example, in a 3-server cluster, the failure of 2 servers | |
252 | prevents servers joining or leaving the cluster (as well as database access). | |
253 | To prevent data loss or inconsistency, the preferred solution to this problem | |
254 | is to bring up enough of the failed servers to make the cluster healthy again, | |
255 | then if necessary remove any remaining failed servers and add new ones. If | |
256 | this cannot be done, though, use ``ovs-appctl`` to invoke ``cluster/leave | |
257 | --force`` on a running server. This command forces the server to which it is | |
258 | directed to leave its cluster and form a new single-node cluster that contains | |
259 | only itself. The data in the new cluster may be inconsistent with the former | |
260 | cluster: transactions not yet replicated to the server will be lost, and | |
261 | transactions not yet applied to the cluster may be committed. Afterward, any | |
262 | servers in its former cluster will regard the server as having failed. | |
263 | ||
80c42f7f BP |
264 | Once a server leaves a cluster, it may never rejoin it. Instead, create a new |
265 | server and join it to the cluster. | |
266 | ||
1b1d2e6d BP |
267 | The servers in a cluster synchronize data over a cluster management protocol |
268 | that is specific to Open vSwitch; it is not the same as the OVSDB protocol | |
269 | specified in RFC 7047. For this purpose, a server in a cluster is tied to a | |
270 | particular IP address and TCP port, which is specified in the ``ovsdb-tool`` | |
271 | command that creates or joins the cluster. The TCP port used for clustering | |
272 | must be different from that used for OVSDB clients. To change the port or | |
273 | address of a server in a cluster, first remove it from the cluster, then add it | |
274 | back with the new address. | |
275 | ||
276 | To upgrade the ``ovsdb-server`` processes in a cluster from one version of Open | |
277 | vSwitch to another, upgrading them one at a time will keep the cluster healthy | |
278 | during the upgrade process. (This is different from upgrading a database | |
279 | schema, which is covered later under `Upgrading or Downgrading a Database`_.) | |
280 | ||
2ccd66f5 IM |
281 | .. important:: |
282 | ||
283 | The database file format changed in version 2.15. | |
284 | To upgrade/downgrade the ``ovsdb-server`` processes across this version, | |
285 | follow the instructions described under | |
286 | `Upgrading from version 2.14 and earlier to 2.15 and later`_ and | |
287 | `Downgrading from version 2.15 and later to 2.14 and earlier`_. | |
288 | ||
1b1d2e6d BP |
289 | Clustered OVSDB does not support the OVSDB "ephemeral columns" feature. |
290 | ``ovsdb-tool`` and ``ovsdb-client`` change ephemeral columns into persistent | |
291 | ones when they work with schemas for clustered databases. Future versions of | |
292 | OVSDB might add support for this feature. | |
293 | ||
2ccd66f5 IM |
294 | Upgrading from version 2.14 and earlier to 2.15 and later |
295 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | |
296 | ||
297 | The database file format changed in version 2.15, so older versions of | |
298 | ``ovsdb-server`` cannot read a database file modified by | |
299 | ``ovsdb-server`` version 2.15 or later. This also affects runtime | |
300 | communications between servers in the **active-backup** and **clustered** service | |
301 | models. To upgrade the ``ovsdb-server`` processes from one version of Open | |
302 | vSwitch (2.14 or earlier) to another (2.15 or later), follow the instructions | |
303 | below. (This is different from upgrading a database schema, which is | |
304 | covered later under `Upgrading or Downgrading a Database`_.) | |
305 | ||
306 | For the **standalone** service model, no special handling is required during | |
307 | an upgrade. | |
308 | ||
309 | For the **active-backup** service model, an administrator needs to upgrade the | |
310 | backup ``ovsdb-server`` first and the active one after that, or shut down both | |
311 | servers and upgrade them at the same time. | |
312 | ||
313 | For the **clustered** service model, the recommended upgrade strategy is as follows: | |
314 | ||
315 | 1. Upgrade processes one at a time. After its upgrade, start each | |
316 | ``ovsdb-server`` process with the ``--disable-file-column-diff`` command-line | |
317 | option. | |
318 | ||
319 | 2. When all ``ovsdb-server`` processes are upgraded, use ``ovs-appctl`` to invoke | |
320 | the ``ovsdb/file/column-diff-enable`` command on each of them, or restart all | |
321 | ``ovsdb-server`` processes one at a time without the | |
322 | ``--disable-file-column-diff`` command-line option. | |
323 | ||
324 | Downgrading from version 2.15 and later to 2.14 and earlier | |
325 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | |
326 | ||
327 | Like upgrading, covered under `Upgrading from version 2.14 and earlier to | |
328 | 2.15 and later`_, downgrading ``ovsdb-server`` from version 2.15 or later | |
329 | to 2.14 or earlier requires additional steps. (This is different from | |
330 | upgrading a database schema, which is covered later under | |
331 | `Upgrading or Downgrading a Database`_.) | |
332 | ||
333 | For all service models, the required steps are: | |
334 | ||
335 | 1. Stop all ``ovsdb-server`` processes (the single process for the **standalone** | |
336 | service model, all involved processes for the **active-backup** and **clustered** | |
337 | service models). | |
338 | ||
339 | 2. Compact all database files with the ``ovsdb-tool compact`` command. | |
340 | ||
341 | 3. Downgrade and restart the ``ovsdb-server`` processes. | |
342 | ||
1b1d2e6d BP |
343 | Understanding Cluster Consistency |
344 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | |
345 | ||
346 | To ensure consistency, clustered OVSDB uses the Raft algorithm described in | |
347 | Diego Ongaro's Ph.D. thesis, "Consensus: Bridging Theory and Practice". In an | |
348 | operational Raft cluster, at any given time a single server is the "leader" and | |
349 | the other nodes are "followers". Only the leader processes transactions, but a | |
350 | transaction is only committed when a majority of the servers confirm to the | |
351 | leader that they have written it to persistent storage. | |
352 | ||
353 | In most database systems, read and write access to the database happens through | |
354 | transactions. In such a system, Raft allows a cluster to present a strongly | |
355 | consistent transactional interface. OVSDB uses conventional transactions for | |
356 | writes, but clients often effectively do reads a different way, by asking the | |
357 | server to "monitor" a database or a subset of one on the client's behalf. | |
358 | Whenever monitored data changes, the server automatically tells the client what | |
359 | changed, which allows the client to maintain an accurate snapshot of the | |
360 | database in its memory. Of course, at any given time, the snapshot may be | |
361 | somewhat dated since some of it could have changed without the change | |
362 | notification yet being received and processed by the client. | |
363 | ||
364 | Given this unconventional usage model, OVSDB also adopts an unconventional | |
365 | clustering model. Each server in a cluster acts independently for the purpose | |
366 | of monitors and read-only transactions, without verifying that data is | |
367 | up-to-date with the leader. Servers forward transactions that write to the | |
368 | database to the leader for execution, ensuring consistency. This has the | |
369 | following consequences: | |
370 | ||
371 | * Transactions that involve writes, against any server in the cluster, are | |
372 | linearizable if clients take care to use correct prerequisites, which is the | |
373 | same condition required for linearizability in a standalone OVSDB. | |
374 | (Actually, "at-least-once" consistency, because OVSDB does not have a session | |
375 | mechanism to drop duplicate transactions if a connection drops after the | |
376 | server commits it but before the client receives the result.) | |
377 | ||
378 | * Read-only transactions can yield results based on a stale version of the | |
379 | database, if they are executed against a follower. Transactions on the | |
380 | leader always yield fresh results. (With monitors, as explained above, a | |
381 | client can always see stale data even without clustering, so clustering does | |
382 | not change the consistency model for monitors.) | |
383 | ||
384 | * Monitor-based (or read-heavy) workloads scale well across a cluster, because | |
385 | clustering OVSDB adds no additional work or communication for reads and | |
386 | monitors. | |
387 | ||
388 | * A write-heavy client should connect to the leader, to avoid the overhead of | |
389 | followers forwarding transactions to the leader. | |
390 | ||
391 | * When a client conducts a mix of read and write transactions across more than | |
392 | one server in a cluster, it can see inconsistent results because a read | |
393 | transaction might read stale data whose updates have not yet propagated from | |
05bf1dbb BP |
394 | the leader. By default, utilities such as ``ovn-sbctl`` (in OVN) connect to |
395 | the cluster leader to avoid this issue. | |
1b1d2e6d BP |
396 | |
397 | The same might occur for transactions against a single follower except that | |
398 | the OVSDB server ensures that the results of a write forwarded to the leader | |
399 | by a given server are visible at that server before it replies to the | |
400 | requesting client. | |
401 | ||
402 | * If a client uses a database on one server in a cluster, then another server | |
403 | in the cluster (perhaps because the first server failed), the client could | |
404 | observe stale data. Clustered OVSDB clients, however, can use a column in | |
405 | the ``_Server`` database to detect that data on a server is older than data | |
406 | that the client previously read. The OVSDB client library in Open vSwitch | |
407 | uses this feature to avoid servers with stale data. | |
408 | ||
12b84d50 BP |
409 | Database Replication |
410 | ==================== | |
411 | ||
412 | OVSDB can layer **replication** on top of any of its service models. | |
413 | Replication, in this context, means to make, and keep up-to-date, a read-only | |
414 | copy of the contents of a database (the ``replica``). One use of replication | |
415 | is to keep an up-to-date backup of a database. A replica used solely for | |
416 | backup would not need to support clients of its own. A set of replicas that do | |
417 | serve clients could be used to scale out read access to the primary database. | |
418 | ||
419 | A database replica is set up in the same way as a backup server in an | |
420 | active-backup pair, with the difference that the replica is never promoted to | |
421 | an active role. | |
422 | ||
423 | A database can have multiple replicas. | |
424 | ||
425 | Open vSwitch 2.6 introduced support for database replication. | |
426 | ||
427 | Connection Methods | |
428 | ================== | |
429 | ||
430 | An OVSDB **connection method** is a string that specifies how to make a | |
431 | JSON-RPC connection between an OVSDB client and server. Connection methods are | |
432 | part of the Open vSwitch implementation of OVSDB and not specified by RFC 7047. | |
433 | ``ovsdb-server`` uses connection methods to specify how it should listen for | |
434 | connections from clients and ``ovsdb-client`` uses them to specify how it | |
435 | should connect to a server. Connections in the opposite direction, where | |
436 | ``ovsdb-server`` connects to a client that is configured to listen for an | |
437 | incoming connection, are also possible. | |
438 | ||
439 | Connection methods are classified as **active** or **passive**. An active | |
440 | connection method makes an outgoing connection to a remote host; a passive | |
441 | connection method listens for connections from remote hosts. The most common | |
442 | arrangement is to configure an OVSDB server with passive connection methods and | |
443 | clients with active ones, but the OVSDB implementation in Open vSwitch supports | |
444 | the opposite arrangement as well. | |
445 | ||
446 | OVSDB supports the following active connection methods: | |
447 | ||
771680d9 YS |
448 | ssl:<host>:<port> |
449 | The specified SSL or TLS <port> on the given <host>. | |
12b84d50 | 450 | |
771680d9 YS |
451 | tcp:<host>:<port> |
452 | The specified TCP <port> on the given <host>. | |
12b84d50 BP |
453 | |
454 | unix:<file> | |
455 | On Unix-like systems, connect to the Unix domain server socket named | |
456 | <file>. | |
457 | ||
458 | On Windows, connect to a local named pipe that is represented by a file | |
459 | created in the path <file> to mimic the behavior of a Unix domain socket. | |
460 | ||
1b1d2e6d BP |
461 | <method1>,<method2>,...,<methodN> |
462 | For a clustered database service to be highly available, a client must be | |
463 | able to connect to any of the servers in the cluster. To do so, specify | |
464 | connection methods for each of the servers separated by commas (and | |
465 | optional spaces). | |
466 | ||
467 | In theory, if machines go up and down and IP addresses change in the right | |
468 | way, a client could talk to the wrong instance of a database. To avoid | |
469 | this possibility, add ``cid:<uuid>`` to the list of methods, where <uuid> | |
470 | is the cluster ID of the desired database cluster, as printed by | |
35551b56 | 471 | ``ovsdb-tool db-cid``. This feature is optional. |
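
To make the list syntax concrete, the following sketch splits such a string into its individual connection methods and the optional cluster ID. It illustrates the format described above and is not the parser that Open vSwitch itself uses. The function name is hypothetical.

```python
def split_remotes(remote: str):
    """Split '<method1>,<method2>,...' into (methods, cluster_id).

    Spaces around commas are ignored. `cluster_id` is None unless a
    'cid:<uuid>' entry is present.
    """
    methods = []
    cluster_id = None
    for item in remote.split(","):
        item = item.strip()
        if not item:
            continue
        if item.startswith("cid:"):
            cluster_id = item[len("cid:"):]
        else:
            methods.append(item)
    return methods, cluster_id
```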
1b1d2e6d | 472 | |
12b84d50 BP |
473 | OVSDB supports the following passive connection methods: |
474 | ||
475 | pssl:<port>[:<ip>] | |
476 | Listen on the given TCP <port> for SSL or TLS connections. By default, | |
477 | connections are not bound to a particular local IP address. Specifying | |
478 | <ip> limits the listener to connections made to the given local IP address. | |
479 | ||
480 | ptcp:<port>[:<ip>] | |
481 | Listen on the given TCP <port>. By default, connections are not bound to a | |
482 | particular local IP address. Specifying <ip> limits the listener to connections | |
483 | made to the given local IP address. | |
484 | ||
485 | punix:<file> | |
486 | On Unix-like systems, listens for connections on the Unix domain socket | |
487 | named <file>. | |
488 | ||
489 | On Windows, listens on a local named pipe, creating a named pipe | |
929dc96d NW |
490 | <file> to mimic the behavior of a Unix domain socket. The ACLs of the named |
491 | pipe include LocalSystem, Administrators, and Creator Owner. | |
12b84d50 BP |
492 | |
493 | All IP-based connection methods accept IPv4 and IPv6 addresses. To specify an | |
494 | IPv6 address, wrap it in square brackets, e.g. ``ssl:[::1]:6640``. Passive | |
495 | IP-based connection methods by default listen for IPv4 connections only; use | |
496 | ``[::]`` as the address to accept both IPv4 and IPv6 connections, | |
771680d9 YS |
497 | e.g. ``pssl:6640:[::]``. DNS names are also accepted if Open vSwitch is built |
498 | with the unbound library. On Linux, use ``%<device>`` to designate a scope for | |
499 | IPv6 link-local addresses, e.g. ``ssl:[fe80::1234%eth0]:6653``. | |
12b84d50 BP |
500 | |
501 | The <port> may be omitted from connection methods that use a port number. The | |
502 | default <port> for TCP-based connection methods is 6640, e.g. ``pssl:`` is | |
503 | equivalent to ``pssl:6640``. In Open vSwitch prior to version 2.4.0, the | |
504 | default port was 6632. To avoid incompatibility between older and newer | |
505 | versions, we encourage users to specify a port number. | |
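
The syntax rules above (active versus passive prefixes, bracketed IPv6 addresses, and the default port) can be illustrated with a small parser. This is a simplified sketch written for this document, not the code Open vSwitch uses. The function names and the returned dictionary layout are hypothetical, and DNS names and comma-separated lists are not handled specially.

```python
DEFAULT_PORT = 6640  # default for TCP-based methods, as stated above


def _split_host_port(rest: str):
    """Split '<host>[:<port>]', honoring square brackets around IPv6."""
    if rest.startswith("["):
        end = rest.index("]")          # raises ValueError if unbalanced
        host, tail = rest[1:end], rest[end + 1:]
        port = tail[1:] if tail.startswith(":") else ""
    else:
        host, _, port = rest.partition(":")
    return host, int(port) if port else DEFAULT_PORT


def parse_connection_method(method: str) -> dict:
    """Parse a single connection method string into a dictionary."""
    kind, _, rest = method.partition(":")
    if kind in ("ssl", "tcp"):         # active: <host>[:<port>]
        host, port = _split_host_port(rest)
        return {"kind": kind, "passive": False, "host": host, "port": port}
    if kind in ("pssl", "ptcp"):       # passive: [<port>][:<ip>]
        port, _, ip = rest.partition(":")
        if ip.startswith("[") and ip.endswith("]"):
            ip = ip[1:-1]
        return {"kind": kind, "passive": True, "host": ip or None,
                "port": int(port) if port else DEFAULT_PORT}
    if kind in ("unix", "punix"):      # <file>
        return {"kind": kind, "passive": kind == "punix", "path": rest}
    raise ValueError(f"unsupported connection method: {method!r}")
```

For instance, ``parse_connection_method("pssl:")`` reports port 6640 with no bound address, matching the equivalence between ``pssl:`` and ``pssl:6640`` noted above.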
506 | ||
507 | The ``ssl`` and ``pssl`` connection methods require additional configuration | |
508 | through ``--private-key``, ``--certificate``, and ``--ca-cert`` command line | |
509 | options. Open vSwitch can be built without SSL support, in which case these | |
510 | connection methods are not supported. | |
511 | ||
512 | Database Life Cycle | |
513 | =================== | |
514 | ||
515 | This section describes how to handle various events in the life cycle of | |
516 | a database using the Open vSwitch implementation of OVSDB. | |
517 | ||
518 | Creating a Database | |
519 | ------------------- | |
520 | ||
521 | Creating and starting up the service for a new database was covered | |
522 | separately for each database service model in the `Service | |
523 | Models`_ section, above. | |
524 | ||
525 | Backing Up and Restoring a Database | |
526 | ----------------------------------- | |
527 | ||
528 | OVSDB is often used in contexts where the database contents are not | |
529 | particularly valuable. For example, in many systems, the database for | |
530 | configuring ``ovs-vswitchd`` is essentially rebuilt from scratch | |
531 | at boot time. It is not worthwhile to back up these databases. | |
532 | ||
533 | When OVSDB is used for valuable data, a backup strategy is worth | |
534 | considering. One way is to use database replication, discussed above in | |
535 | `Database Replication`_, which keeps an online, up-to-date | |
536 | copy of a database, possibly on a remote system. This works with all OVSDB | |
537 | service models. | |
538 | ||
539 | A more common backup strategy is to periodically take and store a snapshot. | |
540 | For the standalone and active-backup service models, making a copy of the | |
541 | database file, e.g. using ``cp``, effectively makes a snapshot, and because | |
542 | OVSDB database files are append-only, it works even if the database is being | |
1b1d2e6d BP |
543 | modified when the snapshot takes place. This approach does not work for |
544 | clustered databases. | |
12b84d50 | 545 | |
1b1d2e6d BP |
546 | Another way to make a backup, which works with all OVSDB service models, is to |
547 | use ``ovsdb-client backup``, which connects to a running database server and | |
548 | outputs an atomic snapshot of its schema and content, in the same format used | |
549 | for standalone and active-backup databases. | |
4d0a31b6 | 550 | |
fe0fb885 | 551 | Multiple options are also available when the time comes to restore a database |
1b1d2e6d BP |
552 | from a backup. For the standalone and active-backup service models, one option |
553 | is to stop the database server or servers, overwrite the database file with the | |
554 | backup (e.g. with ``cp``), and then restart the servers. Another way, which | |
555 | works with any service model, is to use ``ovsdb-client restore``, which | |
556 | connects to a running database server and replaces the data in one of its | |
557 | databases with a provided snapshot. The advantage of ``ovsdb-client restore`` is | |
558 | that it causes zero downtime for the database and its server. It has the | |
559 | downside that UUIDs of rows in the restored database will differ from those in | |
560 | the snapshot, because the OVSDB protocol does not allow clients to specify row | |
561 | UUIDs. | |
12b84d50 BP |
562 | |
563 | None of these approaches saves and restores data in columns that the schema | |
564 | designates as ephemeral. This is by design: the designer of a schema only | |
565 | marks a column as ephemeral if it is acceptable for its data to be lost | |
566 | when a database server restarts. | |
567 | ||
1b1d2e6d BP |
568 | Clustering and backup serve different purposes. Clustering increases |
569 | availability, but it does not protect against data loss if, for example, a | |
570 | malicious or malfunctioning OVSDB client deletes or tampers with data. | |
571 | ||
572 | Changing Database Service Model | |
573 | ------------------------------- | |
574 | ||
575 | Use ``ovsdb-tool create-cluster`` to create a clustered database from the | |
c2bb883c AG |
576 | contents of a standalone database. Use ``ovsdb-client backup`` to create a |
577 | standalone database from the contents of a running clustered database. | |
578 | When the cluster is down and cannot be revived, ``ovsdb-client backup`` will | |
579 | not work. | |
1b1d2e6d | 580 | |
00de46f9 AG |
581 | Use ``ovsdb-tool cluster-to-standalone`` to convert a clustered database to a |
582 | standalone database when the cluster is down and cannot be revived. | |
583 | ||
12b84d50 BP |
584 | Upgrading or Downgrading a Database |
585 | ----------------------------------- | |
586 | ||
587 | The evolution of a piece of software can require changes to the schemas of the | |
588 | databases that it uses. For example, new features might require new tables or | |
589 | new columns in existing tables, or conceptual changes might require a database | |
590 | to be reorganized in other ways. In some cases, the easiest way to deal with a | |
591 | change in a database schema is to delete the existing database and start fresh | |
592 | with the new schema, especially if the data in the database is easy to | |
593 | reconstruct. But in many other cases, it is better to convert the database | |
594 | from one schema to another. | |
595 | ||
596 | The OVSDB implementation in Open vSwitch has built-in support for some simple | |
597 | cases of converting a database from one schema to another. This support can | |
598 | handle changes that add or remove database columns or tables or that eliminate | |
599 | constraints (for example, changing a column that must have exactly one value | |
600 | into one that has one or more values). It can also handle changes that add | |
601 | constraints or make them stricter, but only if the existing data in the | |
602 | database satisfies the new constraints (for example, changing a column that has | |
603 | one or more values into a column with exactly one value, if every row in the | |
604 | column has exactly one value). The built-in conversion can cause data loss in | |
605 | obvious ways, for example if the new schema removes tables or columns, or | |
606 | indirectly, for example by deleting unreferenced rows in tables that the new | |
607 | schema marks for garbage collection. | |
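
The rule for stricter constraints can be pictured with a toy check. This is a simplified model, not the conversion code in Open vSwitch: the functions ``violates`` and ``can_convert`` are hypothetical, rows are modeled as dictionaries that map a column name to a set of values, and a constraint is just a minimum and maximum number of values.

```python
def violates(values, min_count, max_count):
    """Return True if the number of values falls outside [min_count, max_count]."""
    return not (min_count <= len(values) <= max_count)


def can_convert(rows, column, new_min, new_max):
    """A stricter constraint is acceptable only if every existing row satisfies it."""
    return not any(violates(row[column], new_min, new_max) for row in rows)
```

In this model, tightening a column from "one or more values" to "exactly one value" corresponds to ``can_convert(rows, column, 1, 1)``, which fails as soon as any row holds two values.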
608 | ||
609 | Converting a database can lose data, so it is wise to make a backup beforehand. | |
610 | ||
611 | To use OVSDB's built-in support for schema conversion with a standalone or | |
612 | active-backup database, first stop the database server or servers, then use | |
613 | ``ovsdb-tool convert`` to convert it to the new schema, and then restart the | |
614 | database server. | |
615 | ||
1b1d2e6d BP |
616 | OVSDB also supports online database schema conversion for any of its database |
617 | service models. To convert a database online, use ``ovsdb-client convert``. | |
53178986 BP |
618 | The conversion is atomic, consistent, isolated, and durable. ``ovsdb-server`` |
619 | disconnects any clients connected when the conversion takes place (except | |
620 | clients that use the ``set_db_change_aware`` Open vSwitch extension RPC). Upon | |
621 | reconnection, clients will discover that the schema has changed. | |
622 | ||
12b84d50 BP |
623 | Schema versions and checksums (see Schemas_ above) can give hints about whether |
624 | a database needs to be converted to a new schema. If there is any question, | |
53178986 BP |
625 | though, the ``needs-conversion`` command on ``ovsdb-tool`` and ``ovsdb-client`` |
626 | can provide a definitive answer. | |
12b84d50 BP |
627 | |
628 | Working with Database History | |
629 | ----------------------------- | |
630 | ||
631 | Both on-disk database formats that OVSDB supports are organized as a stream of | |
632 | transaction records. Each record describes a change to the database as a list | |
633 | of rows that were inserted or deleted or modified, along with the details. | |
634 | Therefore, in normal operation, a database file only grows, as each change | |
635 | causes another record to be appended at the end. Usually, a user has no need | |
636 | to understand this file structure. This section covers some exceptions. | |
637 | ||
638 | Compacting Databases | |
639 | -------------------- | |
640 | ||
641 | If OVSDB database files were truly append-only, then over time they would grow | |
642 | without bound. To avoid this problem, OVSDB can **compact** a database file, | |
643 | that is, replace it by a new version that contains only the current database | |
644 | contents, as if it had been inserted by a single transaction. From time to | |
645 | time, ``ovsdb-server`` automatically compacts a database that grows much larger | |
646 | than its minimum size. | |
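
The idea behind compaction can be sketched with a toy log. This is a conceptual model only, not the on-disk format or the algorithm in ``ovsdb-server``: the functions ``replay`` and ``compact`` are hypothetical, and each transaction record is modeled as a dictionary that maps a row identifier to its new contents, or to ``None`` for a deletion.

```python
def replay(records):
    """Apply transaction records in order and return the resulting contents."""
    db = {}
    for record in records:
        for row_id, row in record.items():
            if row is None:
                db.pop(row_id, None)   # row deleted by this transaction
            else:
                db[row_id] = row       # row inserted or replaced
    return db


def compact(records):
    """Replace the whole log with one record that inserts the current contents."""
    return [replay(records)]
```

Replaying the compacted log gives the same contents as replaying the original one, but the compacted log holds a single record regardless of how many transactions preceded it.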
647 | ||
648 | Because ``ovsdb-server`` automatically compacts databases, it is usually not | |
649 | necessary to compact them manually, but OVSDB still offers a few ways to do it. | |
650 | First, ``ovsdb-tool compact`` can compact a standalone or active-backup | |
651 | database that is not currently being served by ``ovsdb-server`` (or otherwise | |
652 | locked for writing by another process). To compact any database that is | |
653 | currently being served by ``ovsdb-server``, use ``ovs-appctl`` to send the | |
1b1d2e6d BP |
654 | ``ovsdb-server/compact`` command. Each server in an active-backup or clustered |
655 | database maintains its database file independently, so to compact all of them, | |
656 | issue this command separately on each server. | |
12b84d50 BP |
657 | |
658 | Viewing History | |
659 | --------------- | |
660 | ||
661 | The ``ovsdb-tool`` utility's ``show-log`` command displays the transaction | |
662 | records in an OVSDB database file in a human-readable format. By default, it | |
663 | shows minimal detail, but adding the option ``-m`` once or twice increases the | |
664 | level of detail. In addition to the transaction data, it shows the time and | |
665 | date of each transaction and any "comment" added to the transaction by the | |
666 | client. The comments can be helpful for quickly understanding a transaction; | |
667 | for example, ``ovs-vsctl`` adds its command line to the transactions that it | |
668 | makes. | |
669 | ||
1b1d2e6d BP |
670 | The ``show-log`` command works with both OVSDB file formats, but the details of |
671 | the output format differ. For active-backup and clustered databases, the | |
672 | sequence of transactions in each server's log will differ, even at points when | |
673 | they reflect the same data. | |
12b84d50 BP |
674 | |
675 | Truncating History | |
676 | ------------------ | |
677 | ||
678 | It may occasionally be useful to "roll back" a database file to an earlier | |
679 | point. Because of the organization of OVSDB records, this is easy to do. | |
680 | Start by noting the record number <i> of the first record to delete in | |
681 | ``ovsdb-tool show-log`` output. Each record is two lines of plain text, so | |
682 | trimming the log is as simple as running ``head -n <j>``, where <j> = 2 * <i>. | |
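
The same arithmetic can be written as a short Python sketch. The function ``truncate_log`` is hypothetical; it assumes, as the formula above implies, that records are numbered from 0 and that each record occupies exactly two lines. It writes to a separate output file so that the original database file is left untouched.

```python
def truncate_log(src_path, dst_path, first_record_to_delete):
    """Copy records 0..<i>-1, that is, the first 2 * <i> lines, to dst_path."""
    keep_lines = 2 * first_record_to_delete   # <j> = 2 * <i>
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        for _ in range(keep_lines):
            line = src.readline()
            if not line:
                break                         # file is shorter than expected
            dst.write(line)
```

Working on a copy makes it possible to inspect the result with ``ovsdb-tool show-log`` before replacing the original file.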
683 | ||
684 | Corruption | |
685 | ---------- | |
686 | ||
687 | When ``ovsdb-server`` opens an OVSDB database file, of any kind, it reads as | |
688 | many transaction records as it can from the file until it reaches the end of | |
689 | the file or it encounters a corrupted record. At that point it stops reading | |
690 | and regards the data that it has read to this point as the full contents of the | |
691 | database file, effectively rolling the database back to an earlier point. | |
692 | ||
693 | Each transaction record contains an embedded SHA-1 checksum, which the server | |
694 | verifies as it reads a database file. It detects corruption when a checksum | |
695 | fails to verify. Even though SHA-1 is no longer considered secure for use in | |
696 | cryptography, it is acceptable for this purpose because it is not used to | |
697 | defend against malicious attackers. | |
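
The read-until-corruption behavior can be sketched as follows. This is an illustration under stated assumptions, not the reader in ``ovsdb-server``: the function ``read_good_records`` is hypothetical, and the sketch assumes that each record consists of a header line of the form ``OVSDB <kind> <length> <sha1-hex>`` followed by ``<length>`` bytes of data (including the trailing newline) whose SHA-1 digest must match the header.

```python
import hashlib


def read_good_records(stream):
    """Return record payloads up to, but not including, the first bad record."""
    good = []
    while True:
        header = stream.readline()
        if not header:
            break                                   # clean end of file
        parts = header.split()
        # Assumed header layout: b"OVSDB <kind> <length> <sha1-hex>\n".
        if len(parts) != 4 or parts[0] != b"OVSDB" or not parts[2].isdigit():
            break                                   # malformed header
        length = int(parts[2])
        payload = stream.read(length)
        if len(payload) != length:
            break                                   # truncated record
        if hashlib.sha1(payload).hexdigest().encode() != parts[3].lower():
            break                                   # checksum mismatch
        good.append(payload)
    return good
```

Everything after the first record that fails a check is ignored, even if later records are intact, which is what rolls the database back to an earlier point.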
698 | ||
699 | The first record in a standalone or active-backup database file specifies the | |
1b1d2e6d BP |
700 | schema. ``ovsdb-server`` will refuse to work with a database where this record |
701 | is corrupted, or with a clustered database file with corruption in the first | |
702 | few records. Delete and recreate such a database, or restore it from a backup. | |
12b84d50 BP |
703 | |
704 | When ``ovsdb-server`` adds records to a database file in which it detected | |
705 | corruption, it first truncates the file just after the last good record. | |
706 | ||
707 | See Also | |
708 | ======== | |
709 | ||
710 | RFC 7047, "The Open vSwitch Database Management Protocol." | |
711 | ||
712 | Open vSwitch implementations of generic OVSDB functionality: | |
713 | ``ovsdb-server(1)``, ``ovsdb-client(1)``, ``ovsdb-tool(1)``. | |
714 | ||
715 | Tools for working with databases that have specific OVSDB schemas: | |
05bf1dbb BP |
716 | ``ovs-vsctl(8)``, ``vtep-ctl(8)``, and (in OVN) ``ovn-nbctl(8)``, |
717 | ``ovn-sbctl(8)``. | |
12b84d50 BP |
718 | |
719 | OVSDB schemas for Open vSwitch and related functionality: | |
05bf1dbb BP |
720 | ``ovs-vswitchd.conf.db(5)``, ``vtep(5)``, and (in OVN) ``ovn-nb(5)``, |
721 | ``ovn-sb(5)``. |