.. _fs-volumes-and-subvolumes:

FS volumes and subvolumes
=========================

A single source of truth for CephFS exports is implemented in the volumes
module of the :term:`Ceph Manager` daemon (ceph-mgr). The OpenStack shared
file system service (manila_), the Ceph Container Storage Interface (CSI_),
storage administrators, and others can use the common CLI provided by the
ceph-mgr volumes module to manage CephFS exports.

The ceph-mgr volumes module implements the following file system export
abstractions:

* FS volumes, an abstraction for CephFS file systems

* FS subvolumes, an abstraction for independent CephFS directory trees

* FS subvolume groups, an abstraction for a directory level higher than FS
  subvolumes, used to effect policies (e.g., :doc:`/cephfs/file-layouts`)
  across a set of subvolumes

Some possible use-cases for the export abstractions:

* FS subvolumes used as manila shares or CSI volumes

* FS subvolume groups used as manila share groups

Requirements
------------

* Nautilus (14.2.x) or a later version of Ceph

* A cephx client user (see :doc:`/rados/operations/user-management`) with
  the following minimum capabilities::

    mon 'allow r'
    mgr 'allow rw'
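
For example, such a user could be created with ``ceph auth get-or-create``;
the client name ``client.fsadmin`` below is only illustrative::

    $ ceph auth get-or-create client.fsadmin mon 'allow r' mgr 'allow rw'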

FS Volumes
----------

Create a volume using::

    $ ceph fs volume create <vol_name> [<placement>]

This creates a CephFS file system and its data and metadata pools. It can
also try to create MDSes for the file system using the enabled ceph-mgr
orchestrator module (see :doc:`/mgr/orchestrator`), e.g. rook.

<vol_name> is the volume name (an arbitrary string). <placement> is an
optional string that designates the hosts that should have an NFS Ganesha
daemon container running on them and, optionally, the total number of NFS
Ganesha daemons in the cluster (should you want to have more than one NFS
Ganesha daemon running per node). For example, the following placement
string means "deploy NFS Ganesha daemons on nodes host1 and host2 (one
daemon per host)"::

    "host1,host2"

and this placement specification says to deploy two NFS Ganesha daemons on
each of nodes host1 and host2 (for a total of four NFS Ganesha daemons in
the cluster)::

    "4 host1,host2"

For more details on placement specifications refer to
:ref:`orchestrator-cli-service-spec`, but keep in mind that specifying
placement via a YAML file is not supported.
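
Putting these together, a hypothetical invocation (the volume name ``cephfs``
and the host names are illustrative) might be::

    $ ceph fs volume create cephfs "4 host1,host2"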

Remove a volume using::

    $ ceph fs volume rm <vol_name> [--yes-i-really-mean-it]

This removes a file system and its data and metadata pools. It also tries to
remove MDSes using the enabled ceph-mgr orchestrator module.

List volumes using::

    $ ceph fs volume ls

Rename a volume using::

    $ ceph fs volume rename <vol_name> <new_vol_name> [--yes-i-really-mean-it]

Renaming a volume can be an expensive operation. It does the following:

- renames the orchestrator-managed MDS service to match the <new_vol_name>.
  This involves launching an MDS service with <new_vol_name> and bringing
  down the MDS service with <vol_name>.
- renames the file system matching <vol_name> to <new_vol_name>
- changes the application tags on the data and metadata pools of the file
  system to <new_vol_name>
- renames the metadata and data pools of the file system.

The CephX IDs authorized for <vol_name> need to be reauthorized for
<new_vol_name>. Any ongoing client operations using these IDs may be
disrupted. Mirroring is expected to be disabled on the volume.

FS Subvolume groups
-------------------

Create a subvolume group using::

    $ ceph fs subvolumegroup create <vol_name> <group_name> [--pool_layout <data_pool_name>] [--uid <uid>] [--gid <gid>] [--mode <octal_mode>]

The command succeeds even if the subvolume group already exists.

When creating a subvolume group you can specify its data pool layout (see
:doc:`/cephfs/file-layouts`), uid, gid, and file mode in octal numerals. By
default, the subvolume group is created with octal file mode '755', uid '0',
gid '0', and the data pool layout of its parent directory.
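
For example, to create a group with a non-default mode (the volume and group
names here are illustrative)::

    $ ceph fs subvolumegroup create cephfs csi --mode 775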

Remove a subvolume group using::

    $ ceph fs subvolumegroup rm <vol_name> <group_name> [--force]

The removal of a subvolume group fails if the group is not empty or does not
exist. The '--force' flag allows the command to succeed when the subvolume
group does not exist.

Fetch the absolute path of a subvolume group using::

    $ ceph fs subvolumegroup getpath <vol_name> <group_name>

List subvolume groups using::

    $ ceph fs subvolumegroup ls <vol_name>

.. note:: The subvolume group snapshot feature is no longer supported in
   mainline CephFS (existing group snapshots can still be listed and deleted)

Remove a snapshot of a subvolume group using::

    $ ceph fs subvolumegroup snapshot rm <vol_name> <group_name> <snap_name> [--force]

Supplying the '--force' flag allows the command to succeed when it would
otherwise fail because the snapshot does not exist.

List snapshots of a subvolume group using::

    $ ceph fs subvolumegroup snapshot ls <vol_name> <group_name>

FS Subvolumes
-------------

Create a subvolume using::

    $ ceph fs subvolume create <vol_name> <subvol_name> [--size <size_in_bytes>] [--group_name <subvol_group_name>] [--pool_layout <data_pool_name>] [--uid <uid>] [--gid <gid>] [--mode <octal_mode>] [--namespace-isolated]

The command succeeds even if the subvolume already exists.

When creating a subvolume you can specify its subvolume group, data pool
layout, uid, gid, file mode in octal numerals, and size in bytes. The size
of the subvolume is specified by setting a quota on it (see
:doc:`/cephfs/quota`). The subvolume can be created in a separate RADOS
namespace by specifying the --namespace-isolated option. By default a
subvolume is created within the default subvolume group, with octal file
mode '755', the uid and gid of its subvolume group, the data pool layout of
its parent directory, and no size limit.
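
For example, to create a 10 GiB subvolume in a hypothetical group named
``csi`` (all names here are illustrative)::

    $ ceph fs subvolume create cephfs subvol1 --size 10737418240 --group_name csi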

Remove a subvolume using::

    $ ceph fs subvolume rm <vol_name> <subvol_name> [--group_name <subvol_group_name>] [--force] [--retain-snapshots]

The command removes the subvolume and its contents. It does this in two
steps. First, it moves the subvolume to a trash folder, and then it
asynchronously purges the folder's contents.

The removal of a subvolume fails if the subvolume has snapshots or does not
exist. The '--force' flag allows the command to succeed when the subvolume
does not exist.

A subvolume can be removed while retaining its existing snapshots by using
the '--retain-snapshots' option. If snapshots are retained, the subvolume is
considered empty for all operations not involving the retained snapshots.

.. note:: Snapshot-retained subvolumes can be recreated using 'ceph fs subvolume create'

.. note:: Retained snapshots can be used as a clone source to recreate the subvolume, or to clone to a new subvolume.

Resize a subvolume using::

    $ ceph fs subvolume resize <vol_name> <subvol_name> <new_size> [--group_name <subvol_group_name>] [--no_shrink]

The command resizes the subvolume quota using the size specified by
'new_size'. The '--no_shrink' flag prevents the subvolume from shrinking
below the currently used size of the subvolume.

The subvolume can be resized to an infinite size by passing 'inf' or
'infinite' as the new_size.
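
For example, to grow a subvolume to 20 GiB while disallowing shrinking (the
names are illustrative)::

    $ ceph fs subvolume resize cephfs subvol1 21474836480 --no_shrink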

Authorize cephx auth IDs for read or read-write access to fs subvolumes::

    $ ceph fs subvolume authorize <vol_name> <sub_name> <auth_id> [--group_name=<group_name>] [--access_level=<access_level>]

The 'access_level' takes 'r' or 'rw' as a value.
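
For example, to grant a hypothetical auth ID ``guest`` read-only access (the
names here are illustrative)::

    $ ceph fs subvolume authorize cephfs subvol1 guest --access_level=r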

Deauthorize cephx auth IDs, removing their read/read-write access to fs
subvolumes::

    $ ceph fs subvolume deauthorize <vol_name> <sub_name> <auth_id> [--group_name=<group_name>]

List the cephx auth IDs authorized to access an fs subvolume::

    $ ceph fs subvolume authorized_list <vol_name> <sub_name> [--group_name=<group_name>]

Evict fs clients based on the auth ID and the subvolume mounted::

    $ ceph fs subvolume evict <vol_name> <sub_name> <auth_id> [--group_name=<group_name>]

Fetch the absolute path of a subvolume using::

    $ ceph fs subvolume getpath <vol_name> <subvol_name> [--group_name <subvol_group_name>]

Fetch the metadata of a subvolume using::

    $ ceph fs subvolume info <vol_name> <subvol_name> [--group_name <subvol_group_name>]

The output format is JSON and contains the following fields.

* atime: access time of the subvolume path in the format "YYYY-MM-DD HH:MM:SS"
* mtime: modification time of the subvolume path in the format "YYYY-MM-DD HH:MM:SS"
* ctime: change time of the subvolume path in the format "YYYY-MM-DD HH:MM:SS"
* uid: uid of the subvolume path
* gid: gid of the subvolume path
* mode: mode of the subvolume path
* mon_addrs: list of monitor addresses
* bytes_pcent: quota used, as a percentage, if a quota is set; else displays "undefined"
* bytes_quota: quota size in bytes if a quota is set; else displays "infinite"
* bytes_used: current used size of the subvolume in bytes
* created_at: creation time of the subvolume in the format "YYYY-MM-DD HH:MM:SS"
* data_pool: data pool to which the subvolume belongs
* path: absolute path of the subvolume
* type: subvolume type, indicating whether it is a clone or a subvolume
* pool_namespace: RADOS namespace of the subvolume
* features: features supported by the subvolume
* state: current state of the subvolume

If a subvolume has been removed with its snapshots retained, the output
contains only the following fields.

* type: subvolume type, indicating whether it is a clone or a subvolume
* features: features supported by the subvolume
* state: current state of the subvolume

The subvolume "features" are based on the internal version of the subvolume
and comprise a subset of the following features:

* "snapshot-clone": supports cloning using a subvolume's snapshot as the source
* "snapshot-autoprotect": supports automatically protecting snapshots that are active clone sources from deletion
* "snapshot-retention": supports removing subvolume contents while retaining any existing snapshots

The subvolume "state" is based on the current state of the subvolume and
contains one of the following values:

* "complete": subvolume is ready for all operations
* "snapshot-retained": subvolume is removed but its snapshots are retained
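
An abridged, illustrative example of the output (the values shown here are
hypothetical)::

    $ ceph fs subvolume info cephfs subvol1
    {
        "bytes_pcent": "undefined",
        "bytes_quota": "infinite",
        "bytes_used": 0,
        "features": [
            "snapshot-clone",
            "snapshot-autoprotect",
            "snapshot-retention"
        ],
        "state": "complete",
        "type": "subvolume"
    }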

List subvolumes using::

    $ ceph fs subvolume ls <vol_name> [--group_name <subvol_group_name>]

.. note:: Subvolumes that have been removed but have retained snapshots are also listed.

Create a snapshot of a subvolume using::

    $ ceph fs subvolume snapshot create <vol_name> <subvol_name> <snap_name> [--group_name <subvol_group_name>]

Remove a snapshot of a subvolume using::

    $ ceph fs subvolume snapshot rm <vol_name> <subvol_name> <snap_name> [--group_name <subvol_group_name>] [--force]

Supplying the '--force' flag allows the command to succeed when it would
otherwise fail because the snapshot does not exist.

.. note:: if the last snapshot within a snapshot-retained subvolume is removed, the subvolume is also removed

List snapshots of a subvolume using::

    $ ceph fs subvolume snapshot ls <vol_name> <subvol_name> [--group_name <subvol_group_name>]

Fetch the metadata of a snapshot using::

    $ ceph fs subvolume snapshot info <vol_name> <subvol_name> <snap_name> [--group_name <subvol_group_name>]

The output format is JSON and contains the following fields.

* created_at: creation time of the snapshot in the format "YYYY-MM-DD HH:MM:SS:ffffff"
* data_pool: data pool to which the snapshot belongs
* has_pending_clones: "yes" if a snapshot clone is in progress, otherwise "no"
* size: snapshot size in bytes

Cloning Snapshots
-----------------

Subvolumes can be created by cloning subvolume snapshots. Cloning is an
asynchronous operation that copies data from a snapshot to a subvolume.
Because of this bulk copying, cloning is currently inefficient for very
large data sets.

.. note:: Removing a snapshot (source subvolume) fails if there are pending or in-progress clone operations.

Protecting snapshots prior to cloning was a prerequisite in the Nautilus
release, and the commands to protect/unprotect snapshots were introduced for
this purpose. This prerequisite, and hence the commands to
protect/unprotect, is being deprecated in mainline CephFS and may be removed
from a future release.

The commands being deprecated are::

    $ ceph fs subvolume snapshot protect <vol_name> <subvol_name> <snap_name> [--group_name <subvol_group_name>]
    $ ceph fs subvolume snapshot unprotect <vol_name> <subvol_name> <snap_name> [--group_name <subvol_group_name>]

.. note:: Using the above commands does not result in an error, but they serve no useful purpose.

.. note:: Use the 'subvolume info' command to fetch subvolume metadata regarding supported "features", to help decide whether protecting/unprotecting snapshots is required, based on the availability of the "snapshot-autoprotect" feature.

To initiate a clone operation use::

    $ ceph fs subvolume snapshot clone <vol_name> <subvol_name> <snap_name> <target_subvol_name>

If a snapshot (source subvolume) is a part of a non-default group, the group
name needs to be specified::

    $ ceph fs subvolume snapshot clone <vol_name> <subvol_name> <snap_name> <target_subvol_name> --group_name <subvol_group_name>

Cloned subvolumes can be a part of a different group than the source
snapshot (by default, cloned subvolumes are created in the default group).
To clone to a particular group use::

    $ ceph fs subvolume snapshot clone <vol_name> <subvol_name> <snap_name> <target_subvol_name> --target_group_name <subvol_group_name>

Similar to specifying a pool layout when creating a subvolume, a pool layout
can be specified when creating a cloned subvolume. To create a cloned
subvolume with a specific pool layout use::

    $ ceph fs subvolume snapshot clone <vol_name> <subvol_name> <snap_name> <target_subvol_name> --pool_layout <pool_layout>

Configure the maximum number of concurrent clones. The default is 4::

    $ ceph config set mgr mgr/volumes/max_concurrent_clones <value>
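
For example, to allow up to eight clone operations to run at a time::

    $ ceph config set mgr mgr/volumes/max_concurrent_clones 8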

To check the status of a clone operation use::

    $ ceph fs clone status <vol_name> <clone_name> [--group_name <group_name>]

A clone can be in one of the following states:

#. `pending`     : Clone operation has not started
#. `in-progress` : Clone operation is in progress
#. `complete`    : Clone operation has successfully finished
#. `failed`      : Clone operation has failed

Sample output from an `in-progress` clone operation::

    $ ceph fs subvolume snapshot clone cephfs subvol1 snap1 clone1
    $ ceph fs clone status cephfs clone1
    {
      "status": {
        "state": "in-progress",
        "source": {
          "volume": "cephfs",
          "subvolume": "subvol1",
          "snapshot": "snap1"
        }
      }
    }

.. note:: Because `subvol1` is in the default group, the `source` section of the `clone status` output does not include the group name.

.. note:: Cloned subvolumes are accessible only after the clone operation has completed successfully.

For a successful clone operation, `clone status` looks like this::

    $ ceph fs clone status cephfs clone1
    {
      "status": {
        "state": "complete"
      }
    }

If the clone is unsuccessful, the `state` is `failed` instead.

When a clone operation fails, the partial clone must be deleted and the
clone operation must be retriggered. To delete a partial clone use::

    $ ceph fs subvolume rm <vol_name> <clone_name> [--group_name <group_name>] --force

.. note:: Cloning synchronizes only directories, regular files and symbolic links. Inode timestamps (access and
   modification times) are synchronized up to seconds granularity.

An `in-progress` or `pending` clone operation can be canceled. To cancel a clone operation use the `clone cancel` command::

    $ ceph fs clone cancel <vol_name> <clone_name> [--group_name <group_name>]

On successful cancellation, the cloned subvolume is moved to the `canceled` state::

    $ ceph fs subvolume snapshot clone cephfs subvol1 snap1 clone1
    $ ceph fs clone cancel cephfs clone1
    $ ceph fs clone status cephfs clone1
    {
      "status": {
        "state": "canceled",
        "source": {
          "volume": "cephfs",
          "subvolume": "subvol1",
          "snapshot": "snap1"
        }
      }
    }

.. note:: A canceled clone can be deleted by supplying the --force option to the `fs subvolume rm` command.

.. _subvol-pinning:

Pinning Subvolumes and Subvolume Groups
---------------------------------------

Subvolumes and subvolume groups can be automatically pinned to ranks
according to policies. This can help distribute load across MDS ranks in
predictable and stable ways. Review :ref:`cephfs-pinning` and
:ref:`cephfs-ephemeral-pinning` for details on how pinning works.

Pinning is configured by::

    $ ceph fs subvolumegroup pin <vol_name> <group_name> <pin_type> <pin_setting>

or for subvolumes::

    $ ceph fs subvolume pin <vol_name> <subvol_name> <pin_type> <pin_setting>

Typically you will want to set subvolume group pins. The ``pin_type`` may be
one of ``export``, ``distributed``, or ``random``. The ``pin_setting``
corresponds to the extended attribute "value" described in the pinning
documentation referenced above.

So, for example, setting a distributed pinning strategy on a subvolume
group::

    $ ceph fs subvolumegroup pin cephfilesystem-a csi distributed 1

will enable the distributed subtree partitioning policy for the "csi"
subvolume group. This will cause every subvolume within the group to be
automatically pinned to one of the available ranks on the file system.
437 | ||
eafe8130 TL |
438 | .. _manila: https://github.com/openstack/manila |
439 | .. _CSI: https://github.com/ceph/ceph-csi |