================
CephFS Mirroring
================

CephFS supports asynchronous replication of snapshots to a remote CephFS file system via
the `cephfs-mirror` tool. Snapshots are synchronized by mirroring snapshot data followed by
creating a snapshot with the same name (for a given directory on the remote file system) as
the snapshot being synchronized.

Requirements
------------

The primary (local) and secondary (remote) Ceph clusters should be version Pacific or later.

Key Idea
--------

For a given snapshot pair in a directory, the `cephfs-mirror` daemon relies on a readdir
diff to identify changes in a directory tree. The diffs are applied to the corresponding
directory in the remote file system, thereby synchronizing only the files that have changed
between the two snapshots.
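
The readdir-diff idea can be pictured with a small sketch. This is an illustrative model,
not `cephfs-mirror` internals: two snapshots of one directory are modeled as mappings from
file name to a content version, and the diff yields only the entries that need copying or
deleting.

```python
# Illustrative sketch (not cephfs-mirror internals): compute the minimal set of
# changes between two snapshots of a directory, as a readdir diff would, so
# only files that changed between the two snapshots are synchronized.

def readdir_diff(old_snap, new_snap):
    """Return (to_copy, to_delete) between two snapshots of one directory."""
    to_copy = [name for name, version in new_snap.items()
               if old_snap.get(name) != version]
    to_delete = [name for name in old_snap if name not in new_snap]
    return sorted(to_copy), sorted(to_delete)

old = {"a.txt": 1, "b.txt": 1, "c.txt": 1}
new = {"a.txt": 1, "b.txt": 2, "d.txt": 1}  # b.txt changed, c.txt removed, d.txt added

to_copy, to_delete = readdir_diff(old, new)
print(to_copy)    # ['b.txt', 'd.txt']
print(to_delete)  # ['c.txt']
```

Only `b.txt` and `d.txt` are transferred; the unchanged `a.txt` is skipped entirely.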

This feature is tracked here: https://tracker.ceph.com/issues/47034.

Currently, snapshot data is synchronized by bulk copying to the remote file system.

.. note:: Synchronizing hardlinks is not supported -- hardlinked files get synchronized
   as separate files.

Creating Users
--------------

Start by creating a user (on the primary/local cluster) for the mirror daemon. This user
requires write capability on the metadata pool to create RADOS objects (index objects)
used for watch/notify operations, and read capability on the data pool(s)::

  $ ceph auth get-or-create client.mirror mon 'profile cephfs-mirror' mds 'allow r' osd 'allow rw tag cephfs metadata=*, allow r tag cephfs data=*' mgr 'allow r'

Create a user for each file system peer (on the secondary/remote cluster). This user needs
to have full capabilities on the MDS (to take snapshots) and the OSDs::

  $ ceph fs authorize <fs_name> client.mirror_remote / rwps

This user should be used (as part of the peer specification) when adding a peer.

Starting Mirror Daemon
----------------------

The mirror daemon should be spawned using `systemctl(1)` unit files::

  $ systemctl enable cephfs-mirror@mirror
  $ systemctl start cephfs-mirror@mirror

The `cephfs-mirror` daemon can also be run in the foreground::

  $ cephfs-mirror --id mirror --cluster site-a -f

.. note:: The user used here is `mirror`, as created in the `Creating Users` section.
58 | ||
Mirroring Design
----------------

CephFS supports asynchronous replication of snapshots to a remote CephFS file system
via the `cephfs-mirror` tool. For a given directory, snapshots are synchronized by
transferring snapshot data to the remote file system and creating a snapshot with the
same name as the snapshot being synchronized.

Snapshot Synchronization Order
------------------------------

Although the order in which snapshots get chosen for synchronization does not matter,
snapshots are picked based on creation order (using snap-id).

Snapshot Incarnation
--------------------

A snapshot may be deleted and recreated (with the same name) with different contents.
An "old" snapshot could have been synchronized (earlier) and the recreation of the
snapshot could have been done when mirroring was disabled. Using snapshot names to
infer the point-of-continuation would result in the "new" snapshot (incarnation)
never getting picked up for synchronization.

Snapshots on the secondary file system store the snap-id of the snapshot they were
synchronized from. This metadata is stored in the `SnapInfo` structure on the MDS.
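
The two points above can be sketched together: the next snapshot to synchronize is chosen
by snap-id rather than by name, so a recreated snapshot with an old name (but a new
snap-id) is still noticed. The snapshot names and ids below are hypothetical; this models
the idea, not the daemon's actual code.

```python
# Hedged sketch: choose the next snapshot to synchronize by snap-id, not name.
# last_synced_id models the snap-id recorded on the secondary (in SnapInfo).

def next_snapshot(primary_snaps, last_synced_id):
    """primary_snaps: dict of snapshot name -> snap-id on the primary file system."""
    pending = [(sid, name) for name, sid in primary_snaps.items()
               if sid > last_synced_id]
    return min(pending) if pending else None  # oldest pending, i.e. creation order

# "snap1" was synchronized as snap-id 120, then deleted and recreated as snap-id 130.
primary = {"snap1": 130, "snap2": 125}
assert next_snapshot(primary, last_synced_id=120) == (125, "snap2")
# A name-based comparison would skip the recreated "snap1"; the snap-id does not:
assert next_snapshot(primary, last_synced_id=125) == (130, "snap1")
```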

Interfaces
----------

The `Mirroring` module (manager plugin) provides interfaces for managing directory snapshot
mirroring. The manager interfaces are (mostly) wrappers around monitor commands for managing
file system mirroring and are the recommended control interface.

Mirroring Module and Interface
------------------------------

The mirroring module provides an interface for managing directory snapshot mirroring. The
module is implemented as a Ceph Manager plugin. The mirroring module does not manage the
spawning (and terminating) of the mirror daemons. Right now, the preferred way is to
start/stop mirror daemons via `systemctl(1)`. Going forward, deploying mirror daemons will
be managed by `cephadm` (Tracker: http://tracker.ceph.com/issues/47261).

The manager module is responsible for assigning directories to mirror daemons for
synchronization. Multiple mirror daemons can be spawned to achieve concurrency in
directory snapshot synchronization. When mirror daemons are spawned (or terminated),
the mirroring module discovers the modified set of mirror daemons and rebalances
the directory assignment amongst the new set, thus providing high-availability.

.. note:: Running multiple mirror daemons is currently untested. Only a single mirror
   daemon is recommended.
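
The assignment and rebalancing described above can be pictured with a small sketch. The
round-robin policy here is purely illustrative (the module's real assignment policy may
differ), and the daemon names are made up.

```python
# Hedged sketch: (re)assign directories to the current set of mirror daemons.
# Round-robin is illustrative only, not the mirroring module's actual policy.

def assign(directories, daemons):
    if not daemons:
        return {}  # no daemons running: mirroring is stalled
    return {d: daemons[i % len(daemons)]
            for i, d in enumerate(sorted(directories))}

dirs = ["/d0/d1/d2", "/d3", "/d4"]
print(assign(dirs, ["daemon-a", "daemon-b"]))
# After "daemon-b" terminates, directories are rebalanced onto the remaining daemon:
print(assign(dirs, ["daemon-a"]))
```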

The mirroring module is disabled by default. To enable mirroring, use::

  $ ceph mgr module enable mirroring

The mirroring module provides a family of commands to control the mirroring of directory
snapshots. To add or remove directories, mirroring needs to be enabled for a given
file system. To enable mirroring, use::

  $ ceph fs snapshot mirror enable <fs>

.. note:: Mirroring module commands use the `fs snapshot mirror` prefix as compared to
   the monitor commands, which use the `fs mirror` prefix. Make sure to use module
   commands.

To disable mirroring, use::

  $ ceph fs snapshot mirror disable <fs>

Once mirroring is enabled, add a peer to which directory snapshots are to be mirrored.
Peers follow the `<client>@<cluster>` specification and get assigned a unique-id (UUID)
when added. See the `Creating Users` section on how to create Ceph users for mirroring.
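
A peer specification is simply `<client>@<cluster>`; splitting it into its parts is
straightforward. This helper is a hedged illustration, not part of any Ceph API.

```python
# Hedged sketch: split a peer specification of the form <client>@<cluster>.

def parse_peer_spec(spec):
    client, sep, cluster = spec.partition("@")
    if not sep or not client or not cluster:
        raise ValueError(f"invalid peer spec: {spec!r}")
    return client, cluster

assert parse_peer_spec("client.mirror_remote@site-remote") == \
    ("client.mirror_remote", "site-remote")
```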

To add a peer, use::

  $ ceph fs snapshot mirror peer_add <fs> <remote_cluster_spec> [<remote_fs_name>] [<remote_mon_host>] [<cephx_key>]

`<remote_fs_name>` is optional, and defaults to `<fs>` (on the remote cluster).

This requires the remote cluster ceph configuration and user keyring to be available in
the primary cluster. See the `Bootstrap Peers` section on how to avoid this. `peer_add`
additionally supports passing the remote cluster monitor address and the user key.
However, bootstrapping a peer is the recommended way to add a peer.

.. note:: Only a single peer is supported right now.

To remove a peer, use::

  $ ceph fs snapshot mirror peer_remove <fs> <peer_uuid>

.. note:: See the `Mirror Daemon Status` section on how to figure out the peer UUID.

To list file system mirror peers, use::

  $ ceph fs snapshot mirror peer_list <fs>

To configure a directory for mirroring, use::

  $ ceph fs snapshot mirror add <fs> <path>

To stop mirroring directory snapshots, use::

  $ ceph fs snapshot mirror remove <fs> <path>

Only absolute directory paths are allowed. Also, paths are normalized by the mirroring
module, therefore, `/a/b/../b` is equivalent to `/a/b`::

  $ mkdir -p /d0/d1/d2
  $ ceph fs snapshot mirror add cephfs /d0/d1/d2
  {}
  $ ceph fs snapshot mirror add cephfs /d0/d1/../d1/d2
  Error EEXIST: directory /d0/d1/d2 is already tracked

Once a directory is added for mirroring, its subdirectories and ancestor directories are
disallowed from being added for mirroring::

  $ ceph fs snapshot mirror add cephfs /d0/d1
  Error EINVAL: /d0/d1 is a ancestor of tracked path /d0/d1/d2
  $ ceph fs snapshot mirror add cephfs /d0/d1/d2/d3
  Error EINVAL: /d0/d1/d2/d3 is a subtree of tracked path /d0/d1/d2
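
The path rules above can be modeled with `posixpath`. This is a hedged sketch of the
checks, not the mirroring module's implementation; the error strings are simplified.

```python
import posixpath

# Hedged sketch: normalize a path and reject duplicates, ancestors, and
# subtrees of already-tracked paths, mirroring the CLI behavior shown above.
tracked = {"/d0/d1/d2"}

def check_add(path):
    path = posixpath.normpath(path)          # "/d0/d1/../d1/d2" -> "/d0/d1/d2"
    if not path.startswith("/"):
        return "EINVAL: only absolute paths are allowed"
    if path in tracked:
        return "EEXIST: already tracked"
    for t in tracked:
        if t.startswith(path + "/"):
            return "EINVAL: ancestor of a tracked path"
        if path.startswith(t + "/"):
            return "EINVAL: subtree of a tracked path"
    return "OK"

assert check_add("/d0/d1/../d1/d2") == "EEXIST: already tracked"
assert check_add("/d0/d1") == "EINVAL: ancestor of a tracked path"
assert check_add("/d0/d1/d2/d3") == "EINVAL: subtree of a tracked path"
assert check_add("/d5") == "OK"
```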

Commands to check directory mapping (to mirror daemons) and directory distribution are
detailed in the `Mirror Daemon Status` section.

Bootstrap Peers
---------------

Adding a peer (via `peer_add`) requires the peer cluster configuration and user keyring
to be available in the primary cluster (on the manager host and the hosts running the
mirror daemon). This can be avoided by bootstrapping and importing a peer token. Peer
bootstrap involves creating a bootstrap token on the peer cluster via::

  $ ceph fs snapshot mirror peer_bootstrap create <fs_name> <client_entity> <site-name>

e.g.::

  $ ceph fs snapshot mirror peer_bootstrap create backup_fs client.mirror_remote site-remote
  {"token": "eyJmc2lkIjogIjBkZjE3MjE3LWRmY2QtNDAzMC05MDc5LTM2Nzk4NTVkNDJlZiIsICJmaWxlc3lzdGVtIjogImJhY2t1cF9mcyIsICJ1c2VyIjogImNsaWVudC5taXJyb3JfcGVlcl9ib290c3RyYXAiLCAic2l0ZV9uYW1lIjogInNpdGUtcmVtb3RlIiwgImtleSI6ICJBUUFhcDBCZ0xtRmpOeEFBVnNyZXozai9YYUV0T2UrbUJEZlJDZz09IiwgIm1vbl9ob3N0IjogIlt2MjoxOTIuMTY4LjAuNTo0MDkxOCx2MToxOTIuMTY4LjAuNTo0MDkxOV0ifQ=="}

`site-name` refers to a user-defined string to identify the remote file system. In the
context of the `peer_add` interface, `site-name` is the passed-in `cluster` name from
`remote_cluster_spec`.
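
The bootstrap token is base64-encoded JSON describing the peer. As an illustration, the
sample token shown above can be decoded to inspect what it carries (fsid, filesystem,
user, site_name, key, mon_host); treat this as a way to understand the token format, not
as something the workflow requires.

```python
import base64
import json

# Decode the sample bootstrap token from the example above: it is just
# base64-encoded JSON with the peer's connection details.
token = "eyJmc2lkIjogIjBkZjE3MjE3LWRmY2QtNDAzMC05MDc5LTM2Nzk4NTVkNDJlZiIsICJmaWxlc3lzdGVtIjogImJhY2t1cF9mcyIsICJ1c2VyIjogImNsaWVudC5taXJyb3JfcGVlcl9ib290c3RyYXAiLCAic2l0ZV9uYW1lIjogInNpdGUtcmVtb3RlIiwgImtleSI6ICJBUUFhcDBCZ0xtRmpOeEFBVnNyZXozai9YYUV0T2UrbUJEZlJDZz09IiwgIm1vbl9ob3N0IjogIlt2MjoxOTIuMTY4LjAuNTo0MDkxOCx2MToxOTIuMTY4LjAuNTo0MDkxOV0ifQ=="

peer = json.loads(base64.b64decode(token))
print(peer["filesystem"])  # backup_fs
print(peer["site_name"])   # site-remote
```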

Import the bootstrap token in the primary cluster via::

  $ ceph fs snapshot mirror peer_bootstrap import <fs_name> <token>

e.g.::

  $ ceph fs snapshot mirror peer_bootstrap import cephfs eyJmc2lkIjogIjBkZjE3MjE3LWRmY2QtNDAzMC05MDc5LTM2Nzk4NTVkNDJlZiIsICJmaWxlc3lzdGVtIjogImJhY2t1cF9mcyIsICJ1c2VyIjogImNsaWVudC5taXJyb3JfcGVlcl9ib290c3RyYXAiLCAic2l0ZV9uYW1lIjogInNpdGUtcmVtb3RlIiwgImtleSI6ICJBUUFhcDBCZ0xtRmpOeEFBVnNyZXozai9YYUV0T2UrbUJEZlJDZz09IiwgIm1vbl9ob3N0IjogIlt2MjoxOTIuMTY4LjAuNTo0MDkxOCx2MToxOTIuMTY4LjAuNTo0MDkxOV0ifQ==

Mirror Daemon Status
--------------------

Mirror daemons get asynchronously notified about changes in file system mirroring status
and/or peer updates. CephFS mirror daemons provide admin socket commands for querying
mirror status. To check the available commands for mirror status, use::

  $ ceph --admin-daemon /path/to/mirror/daemon/admin/socket help
  {
      ....
      ....
      "fs mirror status cephfs@360": "get filesystem mirror status",
      ....
      ....
  }

Commands with the `fs mirror status` prefix provide the mirror status for mirror-enabled
file systems. Note that `cephfs@360` is of the format `filesystem-name@filesystem-id`.
This format is required since mirror daemons get asynchronously notified regarding
file system mirror status (a file system can be deleted and recreated with the same
name).
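
Splitting such an identifier back into its parts is simple. This helper is illustrative
only, not part of any Ceph API.

```python
# Hedged sketch: split a "filesystem-name@filesystem-id" identifier, such as
# the "cephfs@360" used by the admin socket commands above.

def parse_fs_spec(spec):
    name, sep, fs_id = spec.rpartition("@")
    if not sep:
        raise ValueError(f"invalid file system spec: {spec!r}")
    return name, int(fs_id)

assert parse_fs_spec("cephfs@360") == ("cephfs", 360)
```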

Right now, the command provides minimal information regarding mirror status::

  $ ceph --admin-daemon /var/run/ceph/cephfs-mirror.asok fs mirror status cephfs@360
  {
    "rados_inst": "192.168.0.5:0/1476644347",
    "peers": {
        "a2dc7784-e7a1-4723-b103-03ee8d8768f8": {
            "remote": {
                "client_name": "client.mirror_remote",
                "cluster_name": "site-a",
                "fs_name": "backup_fs"
            }
        }
    },
    "snap_dirs": {
        "dir_count": 1
    }
  }

The `peers` section in the command output above shows the peer information, such as the
unique peer-id (UUID) and specification. The peer-id is required to remove an existing
peer, as mentioned in the `Mirroring Module and Interface` section.

Commands with the `fs mirror peer status` prefix provide the peer synchronization status.
This command is of the format `filesystem-name@filesystem-id peer-uuid`::

  $ ceph --admin-daemon /var/run/ceph/cephfs-mirror.asok fs mirror peer status cephfs@360 a2dc7784-e7a1-4723-b103-03ee8d8768f8
  {
    "/d0": {
        "state": "idle",
        "last_synced_snap": {
            "id": 120,
            "name": "snap1",
            "sync_duration": 0.079997898999999997,
            "sync_time_stamp": "274900.558797s"
        },
        "snaps_synced": 2,
        "snaps_deleted": 0,
        "snaps_renamed": 0
    }
  }

Synchronization stats such as `snaps_synced`, `snaps_deleted` and `snaps_renamed` are reset
on daemon restart and/or when a directory is reassigned to another mirror daemon (when
multiple mirror daemons are deployed).

A directory can be in one of the following states:

- `idle`: The directory is currently not being synchronized
- `syncing`: The directory is currently being synchronized
- `failed`: The directory has hit the upper limit of consecutive failures

When a directory hits a configured number of consecutive synchronization failures, the
mirror daemon marks it as `failed`. Synchronization for these directories is retried.
The number of consecutive failures before a directory is marked as failed is controlled
by the `cephfs_mirror_max_consecutive_failures_per_directory` configuration option
(default: 10), and the retry interval for failed directories is controlled via the
`cephfs_mirror_retry_failed_directories_interval` configuration option (default: 60s).
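
The failure accounting above amounts to a per-directory counter. The configuration option
names below are the real ones; the class itself is a hypothetical model, not the daemon's
code.

```python
# Hedged sketch of the consecutive-failure accounting described above.
MAX_CONSECUTIVE_FAILURES = 10  # cephfs_mirror_max_consecutive_failures_per_directory

class DirState:
    def __init__(self):
        self.failures = 0
        self.state = "idle"

    def record_sync(self, ok):
        if ok:
            self.failures = 0
            self.state = "idle"        # success clears the failed state
        else:
            self.failures += 1
            if self.failures >= MAX_CONSECUTIVE_FAILURES:
                # retried at cephfs_mirror_retry_failed_directories_interval
                self.state = "failed"

d = DirState()
for _ in range(10):
    d.record_sync(ok=False)
assert d.state == "failed"
d.record_sync(ok=True)  # a successful synchronization unmarks the failed state
assert d.state == "idle" and d.failures == 0
```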

E.g., adding a regular file for synchronization would result in a failed status::

  $ ceph fs snapshot mirror add cephfs /f0
  $ ceph --admin-daemon /var/run/ceph/cephfs-mirror.asok fs mirror peer status cephfs@360 a2dc7784-e7a1-4723-b103-03ee8d8768f8
  {
    "/d0": {
        "state": "idle",
        "last_synced_snap": {
            "id": 120,
            "name": "snap1",
            "sync_duration": 0.079997898999999997,
            "sync_time_stamp": "274900.558797s"
        },
        "snaps_synced": 2,
        "snaps_deleted": 0,
        "snaps_renamed": 0
    },
    "/f0": {
        "state": "failed",
        "snaps_synced": 0,
        "snaps_deleted": 0,
        "snaps_renamed": 0
    }
  }

This allows a user to add a non-existent directory for synchronization. The mirror daemon
will mark such a directory as failed and retry (less frequently). When the directory comes
into existence, the mirror daemon will unmark the failed state upon a successful snapshot
synchronization.

When mirroring is disabled, the respective `fs mirror status` command for the file system
will not show up in the command help.

The mirroring module provides a couple of commands to display directory mapping and
distribution information. To check which mirror daemon a directory has been mapped to,
use::

  $ ceph fs snapshot mirror dirmap cephfs /d0/d1/d2
  {
    "instance_id": "404148",
    "last_shuffled": 1601284516.10986,
    "state": "mapped"
  }

.. note:: `instance_id` is the RADOS instance-id associated with a mirror daemon.

Other information, such as `state` and `last_shuffled`, is interesting when running
multiple mirror daemons.

When no mirror daemons are running, the above command shows::

  $ ceph fs snapshot mirror dirmap cephfs /d0/d1/d2
  {
    "reason": "no mirror daemons running",
    "state": "stalled"
  }

This signifies that no mirror daemons are running and that mirroring is stalled.

Re-adding Peers
---------------

When re-adding (reassigning) a peer to a file system in another cluster, ensure that
all mirror daemons have stopped synchronizing to the peer. This can be checked
via the `fs mirror status` admin socket command (the peer UUID should not show up
in the command output). Also, it is recommended to purge synchronized directories
from the peer before re-adding it to another file system (especially those directories
which might exist in the new primary file system). This is not required if re-adding
a peer to the same primary file system it was earlier synchronized from.

Feature Status
--------------

The `cephfs-mirror` daemon is built by default (it follows the `WITH_CEPHFS` CMake rule).