================
CephFS Mirroring
================

CephFS supports asynchronous replication of snapshots to a remote CephFS file
system via the `cephfs-mirror` tool. Snapshots are synchronized by mirroring
snapshot data followed by creating a snapshot with the same name (for a given
directory on the remote file system) as the snapshot being synchronized.

Requirements
------------

The primary (local) and secondary (remote) Ceph clusters must be running
version Pacific or later.

Key Idea
--------

For a given snapshot pair in a directory, the `cephfs-mirror` daemon relies on
readdir diff to identify changes in a directory tree. The diffs are applied to
the directory in the remote file system, thereby synchronizing only the files
that have changed between the two snapshots.

This feature is tracked here: https://tracker.ceph.com/issues/47034.

Currently, snapshot data is synchronized by bulk copying to the remote
file system.

.. note:: Synchronizing hardlinks is not supported -- hardlinked files get
          synchronized as separate files.
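
The readdir-diff idea above can be sketched with an ordinary tree comparison.
This is only an illustration of the concept -- the actual daemon consumes
CephFS readdir diffs rather than walking both snapshot trees:

```python
import filecmp
import os

def changed_files(snap_a, snap_b):
    """Return paths (relative to the snapshot root) that were added,
    modified, or removed between snapshot trees snap_a and snap_b."""
    changed = set()
    # Files added or modified in snap_b relative to snap_a.
    for root, _dirs, files in os.walk(snap_b):
        for name in files:
            rel = os.path.relpath(os.path.join(root, name), snap_b)
            old = os.path.join(snap_a, rel)
            if not os.path.exists(old) or not filecmp.cmp(
                    old, os.path.join(snap_b, rel), shallow=False):
                changed.add(rel)
    # Files removed since snap_a.
    for root, _dirs, files in os.walk(snap_a):
        for name in files:
            rel = os.path.relpath(os.path.join(root, name), snap_a)
            if not os.path.exists(os.path.join(snap_b, rel)):
                changed.add(rel)
    return changed
```

Only the returned set of entries would need to be transferred, which is what
makes incremental snapshot synchronization cheaper than a full copy.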

Creating Users
--------------

Start by creating a user (on the primary/local cluster) for the mirror daemon.
This user requires write capability on the metadata pool to create RADOS
objects (index objects) for watch/notify operation and read capability on the
data pool(s).

.. prompt:: bash $

   ceph auth get-or-create client.mirror mon 'profile cephfs-mirror' mds 'allow r' osd 'allow rw tag cephfs metadata=*, allow r tag cephfs data=*' mgr 'allow r'

Create a user for each file system peer (on the secondary/remote cluster). This user
needs to have full capabilities on the MDS (to take snapshots) and the OSDs::

    $ ceph fs authorize <fs_name> client.mirror_remote / rwps

This user is used (as part of the peer specification) when adding a peer.

Starting Mirror Daemon
----------------------

The mirror daemon should be spawned using `systemctl(1)` unit files::

    $ systemctl enable cephfs-mirror@mirror
    $ systemctl start cephfs-mirror@mirror

The `cephfs-mirror` daemon can be run in the foreground using::

    $ cephfs-mirror --id mirror --cluster site-a -f

.. note:: The user used here is `mirror`, as created in the `Creating Users` section.

Mirroring Design
----------------

CephFS supports asynchronous replication of snapshots to a remote CephFS file system
via the `cephfs-mirror` tool. For a given directory, snapshots are synchronized by
transferring snapshot data to the remote file system and creating a snapshot with the
same name as the snapshot being synchronized.

Snapshot Synchronization Order
------------------------------

Although the order in which snapshots get chosen for synchronization does not matter,
snapshots are picked based on creation order (using snap-id).

Snapshot Incarnation
--------------------

A snapshot may be deleted and recreated (with the same name) with different contents.
An "old" snapshot could have been synchronized (earlier) and the recreation of the
snapshot could have been done when mirroring was disabled. Using snapshot names to
infer the point-of-continuation would result in the "new" snapshot (incarnation)
never getting picked up for synchronization.

Snapshots on the secondary file system store the snap-id of the snapshot they were
synchronized from. This metadata is stored in the `SnapInfo` structure on the MDS.

Interfaces
----------

The `Mirroring` module (manager plugin) provides interfaces for managing directory
snapshot mirroring. The manager interfaces are (mostly) wrappers around monitor
commands for managing file system mirroring and are the recommended control interface.

Mirroring Module and Interface
------------------------------

The mirroring module provides interfaces for managing directory snapshot mirroring.
The module is implemented as a Ceph Manager plugin. The mirroring module does not
manage spawning (and terminating) the mirror daemons. Right now the preferred way
is to start/stop mirror daemons via `systemctl(1)`. Going forward, deploying mirror
daemons will be managed by `cephadm` (Tracker: http://tracker.ceph.com/issues/47261).

The manager module is responsible for assigning directories to mirror daemons for
synchronization. Multiple mirror daemons can be spawned to achieve concurrency in
directory snapshot synchronization. When mirror daemons are spawned (or terminated),
the mirroring module discovers the modified set of mirror daemons and rebalances
the directory assignment amongst the new set, thus providing high-availability.

.. note:: Running multiple mirror daemons is currently untested. Only a single
          mirror daemon is recommended.

The mirroring module is disabled by default. To enable mirroring, use::

    $ ceph mgr module enable mirroring

The mirroring module provides a family of commands to control mirroring of directory
snapshots. To add or remove directories, mirroring needs to be enabled for a given
file system. To enable mirroring, use::

    $ ceph fs snapshot mirror enable <fs_name>

.. note:: Mirroring module commands use the `fs snapshot mirror` prefix, as compared
          to the monitor commands, which use the `fs mirror` prefix. Make sure to use
          the module commands.

To disable mirroring, use::

    $ ceph fs snapshot mirror disable <fs_name>

Once mirroring is enabled, add a peer to which directory snapshots are to be mirrored.
Peers follow the `<client>@<cluster>` specification and get assigned a unique-id (UUID)
when added. See the `Creating Users` section on how to create Ceph users for mirroring.
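
A peer specification is just `<client>@<cluster>`. The sketch below
(`parse_peer_spec` is a made-up helper, not part of any Ceph API) shows the
shape these commands expect:

```python
def parse_peer_spec(spec):
    """Split a '<client>@<cluster>' peer specification.
    Illustrative helper only -- not a Ceph API."""
    client, sep, cluster = spec.partition("@")
    if not sep or not client.startswith("client.") or not cluster:
        raise ValueError(f"invalid peer spec: {spec!r}")
    return client, cluster

print(parse_peer_spec("client.mirror_remote@site-remote"))
# ('client.mirror_remote', 'site-remote')
```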

To add a peer, use::

    $ ceph fs snapshot mirror peer_add <fs_name> <remote_cluster_spec> [<remote_fs_name>] [<remote_mon_host>] [<cephx_key>]

`<remote_fs_name>` is optional, and defaults to `<fs_name>` (on the remote cluster).

This requires the remote cluster Ceph configuration and user keyring to be available in
the primary cluster. See the `Bootstrap Peers` section to avoid this. `peer_add` additionally
supports passing the remote cluster monitor address and the user key. However, bootstrapping
a peer is the recommended way to add a peer.

.. note:: Only a single peer is supported right now.

To remove a peer, use::

    $ ceph fs snapshot mirror peer_remove <fs_name> <peer_uuid>

.. note:: See the `Mirror Daemon Status` section on how to figure out the peer UUID.

To list file system mirror peers, use::

    $ ceph fs snapshot mirror peer_list <fs_name>

To configure a directory for mirroring, use::

    $ ceph fs snapshot mirror add <fs_name> <path>

To stop mirroring directory snapshots, use::

    $ ceph fs snapshot mirror remove <fs_name> <path>

Only absolute directory paths are allowed. Also, paths are normalized by the mirroring
module, therefore, `/a/b/../b` is equivalent to `/a/b`::

    $ mkdir -p /d0/d1/d2
    $ ceph fs snapshot mirror add cephfs /d0/d1/d2
    {}
    $ ceph fs snapshot mirror add cephfs /d0/d1/../d1/d2
    Error EEXIST: directory /d0/d1/d2 is already tracked
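
The normalization shown above matches POSIX path normalization, so the result
can be previewed locally. A minimal sketch (the module's internal
implementation may differ in details):

```python
import posixpath

# Both forms normalize to the same tracked path, hence the EEXIST above.
print(posixpath.normpath("/d0/d1/../d1/d2"))  # /d0/d1/d2
print(posixpath.normpath("/a/b/../b"))        # /a/b
```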

Once a directory is added for mirroring, its subdirectories and ancestor directories
are disallowed from being added for mirroring::

    $ ceph fs snapshot mirror add cephfs /d0/d1
    Error EINVAL: /d0/d1 is a ancestor of tracked path /d0/d1/d2
    $ ceph fs snapshot mirror add cephfs /d0/d1/d2/d3
    Error EINVAL: /d0/d1/d2/d3 is a subtree of tracked path /d0/d1/d2
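
This restriction boils down to a path-prefix test: a candidate conflicts with
a tracked path when either one contains the other. A minimal sketch
(`conflicts` is a made-up helper, not a Ceph API):

```python
import posixpath

def conflicts(candidate, tracked):
    """Classify how a candidate path relates to an already-tracked path.
    Illustrative only -- the real module's checks may differ in details."""
    c = posixpath.normpath(candidate)
    t = posixpath.normpath(tracked)
    if c == t:
        return "already tracked"
    if t.startswith(c + "/"):
        return "ancestor"   # candidate is an ancestor of the tracked path
    if c.startswith(t + "/"):
        return "subtree"    # candidate lies inside the tracked path
    return None

print(conflicts("/d0/d1", "/d0/d1/d2"))        # ancestor
print(conflicts("/d0/d1/d2/d3", "/d0/d1/d2"))  # subtree
print(conflicts("/d0/x", "/d0/d1/d2"))         # None
```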

Commands to check directory mapping (to mirror daemons) and directory distribution are
detailed in the `Mirror Daemon Status` section.

Bootstrap Peers
---------------

Adding a peer (via `peer_add`) requires the peer cluster configuration and user keyring
to be available in the primary cluster (on the manager host and the hosts running the
mirror daemon). This can be avoided by bootstrapping and importing a peer token. Peer
bootstrap involves creating a bootstrap token on the peer cluster via::

    $ ceph fs snapshot mirror peer_bootstrap create <fs_name> <client_entity> <site-name>

For example::

    $ ceph fs snapshot mirror peer_bootstrap create backup_fs client.mirror_remote site-remote
    {"token": "eyJmc2lkIjogIjBkZjE3MjE3LWRmY2QtNDAzMC05MDc5LTM2Nzk4NTVkNDJlZiIsICJmaWxlc3lzdGVtIjogImJhY2t1cF9mcyIsICJ1c2VyIjogImNsaWVudC5taXJyb3JfcGVlcl9ib290c3RyYXAiLCAic2l0ZV9uYW1lIjogInNpdGUtcmVtb3RlIiwgImtleSI6ICJBUUFhcDBCZ0xtRmpOeEFBVnNyZXozai9YYUV0T2UrbUJEZlJDZz09IiwgIm1vbl9ob3N0IjogIlt2MjoxOTIuMTY4LjAuNTo0MDkxOCx2MToxOTIuMTY4LjAuNTo0MDkxOV0ifQ=="}

`site-name` refers to a user-defined string that identifies the remote file system. In
the context of the `peer_add` interface, `site-name` is the passed-in `cluster` name
from `remote_cluster_spec`.

Import the bootstrap token in the primary cluster via::

    $ ceph fs snapshot mirror peer_bootstrap import <fs_name> <token>

For example::

    $ ceph fs snapshot mirror peer_bootstrap import cephfs eyJmc2lkIjogIjBkZjE3MjE3LWRmY2QtNDAzMC05MDc5LTM2Nzk4NTVkNDJlZiIsICJmaWxlc3lzdGVtIjogImJhY2t1cF9mcyIsICJ1c2VyIjogImNsaWVudC5taXJyb3JfcGVlcl9ib290c3RyYXAiLCAic2l0ZV9uYW1lIjogInNpdGUtcmVtb3RlIiwgImtleSI6ICJBUUFhcDBCZ0xtRmpOeEFBVnNyZXozai9YYUV0T2UrbUJEZlJDZz09IiwgIm1vbl9ob3N0IjogIlt2MjoxOTIuMTY4LjAuNTo0MDkxOCx2MToxOTIuMTY4LjAuNTo0MDkxOV0ifQ==
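
The token itself is base64-encoded JSON. The snippet below builds and decodes
a token with placeholder values (the field names mirror the example output
above; the fsid and key here are fake):

```python
import base64
import json

# Placeholder payload shaped like the bootstrap token in the example above.
payload = {
    "fsid": "00000000-0000-0000-0000-000000000000",
    "filesystem": "backup_fs",
    "user": "client.mirror_peer_bootstrap",
    "site_name": "site-remote",
    "key": "<cephx-key>",
    "mon_host": "[v2:192.168.0.5:40918,v1:192.168.0.5:40919]",
}
token = base64.b64encode(json.dumps(payload).encode()).decode()

# Decoding a token recovers the JSON fields.
decoded = json.loads(base64.b64decode(token))
print(decoded["site_name"])  # site-remote
```

Decoding a real token the same way is a quick sanity check before importing it.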

Mirror Daemon Status
--------------------

Mirror daemons get asynchronously notified about changes in file system mirroring status
and/or peer updates.

The CephFS mirroring module provides the `mirror daemon status` interface to check the
mirror daemon status::

    $ ceph fs snapshot mirror daemon status

For example::

    $ ceph fs snapshot mirror daemon status | jq
    [
      {
        "daemon_id": 284167,
        "filesystems": [
          {
            "filesystem_id": 1,
            "name": "a",
            "directory_count": 1,
            "peers": [
              {
                "uuid": "02117353-8cd1-44db-976b-eb20609aa160",
                "remote": {
                    "client_name": "client.mirror_remote",
                    "cluster_name": "ceph",
                    "fs_name": "backup_fs"
                },
                "stats": {
                    "failure_count": 1,
                    "recovery_count": 0
                }
              }
            ]
          }
        ]
      }
    ]

An entry per mirror daemon instance is displayed along with information such as configured
peers and basic stats. For more detailed stats, use the admin socket interface as detailed
below.
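
For scripting, the `daemon status` JSON is easy to walk. A sketch that flattens
it into one line per peer (the JSON literal is abridged from the example output
above):

```python
import json

# Abridged `ceph fs snapshot mirror daemon status` output.
status_json = '''
[{"daemon_id": 284167,
  "filesystems": [{"filesystem_id": 1, "name": "a", "directory_count": 1,
    "peers": [{"uuid": "02117353-8cd1-44db-976b-eb20609aa160",
      "remote": {"client_name": "client.mirror_remote",
                 "cluster_name": "ceph", "fs_name": "backup_fs"},
      "stats": {"failure_count": 1, "recovery_count": 0}}]}]}]
'''

# One (daemon_id, fs_name, peer_uuid, failure_count) row per configured peer.
rows = [
    (daemon["daemon_id"], fs["name"], peer["uuid"], peer["stats"]["failure_count"])
    for daemon in json.loads(status_json)
    for fs in daemon["filesystems"]
    for peer in fs["peers"]
]
for row in rows:
    print(*row)
# 284167 a 02117353-8cd1-44db-976b-eb20609aa160 1
```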

CephFS mirror daemons provide admin socket commands for querying mirror status. To check
the available commands for mirror status, use::

    $ ceph --admin-daemon /path/to/mirror/daemon/admin/socket help
    {
        ....
        ....
        "fs mirror status cephfs@360": "get filesystem mirror status",
        ....
        ....
    }

Commands with the `fs mirror status` prefix provide mirror status for mirror-enabled
file systems. Note that `cephfs@360` is of the format `filesystem-name@filesystem-id`.
This format is required since mirror daemons get asynchronously notified regarding
file system mirror status (a file system can be deleted and recreated with the same
name).

Right now, the command provides minimal information regarding mirror status::

    $ ceph --admin-daemon /var/run/ceph/cephfs-mirror.asok fs mirror status cephfs@360
    {
      "rados_inst": "192.168.0.5:0/1476644347",
      "peers": {
          "a2dc7784-e7a1-4723-b103-03ee8d8768f8": {
              "remote": {
                  "client_name": "client.mirror_remote",
                  "cluster_name": "site-a",
                  "fs_name": "backup_fs"
              }
          }
      },
      "snap_dirs": {
          "dir_count": 1
      }
    }

The `peers` section in the command output above shows the peer information, such as the
unique peer-id (UUID) and specification. The peer-id is required to remove an existing
peer, as mentioned in the `Mirroring Module and Interface` section.

Commands with the `fs mirror peer status` prefix provide peer synchronization status. This
command is of the format `filesystem-name@filesystem-id peer-uuid`::

    $ ceph --admin-daemon /var/run/ceph/cephfs-mirror.asok fs mirror peer status cephfs@360 a2dc7784-e7a1-4723-b103-03ee8d8768f8
    {
      "/d0": {
          "state": "idle",
          "last_synced_snap": {
              "id": 120,
              "name": "snap1",
              "sync_duration": 0.079997898999999997,
              "sync_time_stamp": "274900.558797s"
          },
          "snaps_synced": 2,
          "snaps_deleted": 0,
          "snaps_renamed": 0
      }
    }

Synchronization stats such as `snaps_synced`, `snaps_deleted` and `snaps_renamed` are reset
on daemon restart and/or when a directory is reassigned to another mirror daemon (when
multiple mirror daemons are deployed).

A directory can be in one of the following states:

- `idle`: The directory is currently not being synchronized
- `syncing`: The directory is currently being synchronized
- `failed`: The directory has hit the upper limit of consecutive failures

When a directory hits a configured number of consecutive synchronization failures, the
mirror daemon marks it as `failed`. Synchronization for these directories is retried.
By default, the number of consecutive failures before a directory is marked as failed
is controlled by the `cephfs_mirror_max_consecutive_failures_per_directory` configuration
option (default: 10) and the retry interval for failed directories is controlled via the
`cephfs_mirror_retry_failed_directories_interval` configuration option (default: 60s).

For example, adding a regular file for synchronization would result in a failed status::

    $ ceph fs snapshot mirror add cephfs /f0
    $ ceph --admin-daemon /var/run/ceph/cephfs-mirror.asok fs mirror peer status cephfs@360 a2dc7784-e7a1-4723-b103-03ee8d8768f8
    {
      "/d0": {
          "state": "idle",
          "last_synced_snap": {
              "id": 120,
              "name": "snap1",
              "sync_duration": 0.079997898999999997,
              "sync_time_stamp": "274900.558797s"
          },
          "snaps_synced": 2,
          "snaps_deleted": 0,
          "snaps_renamed": 0
      },
      "/f0": {
          "state": "failed",
          "snaps_synced": 0,
          "snaps_deleted": 0,
          "snaps_renamed": 0
      }
    }
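
A monitoring script can pick the `failed` entries straight out of the peer
status JSON. A sketch (the JSON literal is abridged from the example output
above):

```python
import json

# Abridged `fs mirror peer status` output.
peer_status = json.loads('''
{"/d0": {"state": "idle", "snaps_synced": 2,
         "snaps_deleted": 0, "snaps_renamed": 0},
 "/f0": {"state": "failed", "snaps_synced": 0,
         "snaps_deleted": 0, "snaps_renamed": 0}}
''')

# Directories stuck in the failed state.
failed = sorted(path for path, info in peer_status.items()
                if info["state"] == "failed")
print(failed)  # ['/f0']
```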

This allows a user to add a non-existent directory for synchronization. The mirror daemon
will mark such a directory as failed and retry (less frequently). When the directory comes
into existence, the mirror daemon will clear the failed state upon a successful snapshot
synchronization.

When mirroring is disabled, the respective `fs mirror status` command for the file system
will not show up in the command help.

The mirroring module provides a couple of commands to display directory mapping and
distribution information. To check which mirror daemon a directory has been mapped to, use::

    $ ceph fs snapshot mirror dirmap cephfs /d0/d1/d2
    {
      "instance_id": "404148",
      "last_shuffled": 1601284516.10986,
      "state": "mapped"
    }

.. note:: `instance_id` is the RADOS instance-id associated with a mirror daemon.

Other information, such as `state` and `last_shuffled`, is interesting when running
multiple mirror daemons.

When no mirror daemons are running, the above command shows::

    $ ceph fs snapshot mirror dirmap cephfs /d0/d1/d2
    {
      "reason": "no mirror daemons running",
      "state": "stalled"
    }

signifying that no mirror daemons are running and mirroring is stalled.

Re-adding Peers
---------------

When re-adding (reassigning) a peer to a file system in another cluster, ensure that
all mirror daemons have stopped synchronization to the peer. This can be checked
via the `fs mirror status` admin socket command (the peer UUID should not show up
in the command output). Also, it is recommended to purge synchronized directories
from the peer before re-adding it to another file system (especially those directories
which might exist in the new primary file system). This is not required if re-adding
a peer to the same primary file system it was earlier synchronized from.

Feature Status
--------------

The `cephfs-mirror` daemon is built by default (it follows the `WITH_CEPHFS` CMake rule).