=========================
Create a Ceph file system
=========================

Creating pools
==============

A Ceph file system requires at least two RADOS pools, one for data and one
for metadata. There are important considerations when planning these pools:

- We recommend configuring *at least* 3 replicas for the metadata pool, as
  data loss in this pool can render the entire file system inaccessible.
  Configuring 4 would not be extreme, especially since the metadata pool's
  capacity requirements are quite modest.
- We recommend the fastest feasible low-latency storage devices (NVMe, Optane,
  or at the very least SAS/SATA SSD) for the metadata pool, as this will
  directly affect the latency of client file system operations.
- We strongly suggest that the CephFS metadata pool be provisioned on
  dedicated SSD / NVMe OSDs. This ensures that high client workload does not
  adversely impact metadata operations. See :ref:`device_classes` to configure
  pools this way; a brief sketch follows this list.
- The data pool used to create the file system is the "default" data pool and
  the location for storing all inode backtrace information, which is used for
  hard link management and disaster recovery. For this reason, all CephFS
  inodes have at least one object in the default data pool. If erasure-coded
  pools are planned for file system data, it is best to configure the default
  as a replicated pool to improve small-object write and read performance when
  updating backtraces. Separately, another erasure-coded data pool can be
  added (see also :ref:`ecpool`) that can be used on an entire hierarchy of
  directories and files (see also :ref:`file-layouts`).
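
The following is a minimal sketch of how the metadata pool might be pinned to
SSDs with a CRUSH rule that targets the ``ssd`` device class. The rule name
``ssd-rule`` is illustrative, and the sketch assumes that your SSD OSDs report
the ``ssd`` device class and that the pool names match the example below:

.. code:: bash

    $ ceph osd crush rule create-replicated ssd-rule default host ssd
    $ ceph osd pool set cephfs_metadata crush_rule ssd-rule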

Refer to :doc:`/rados/operations/pools` to learn more about managing pools.
For example, to create two pools with default settings for use with a file
system, you might run the following commands:

.. code:: bash

    $ ceph osd pool create cephfs_data
    $ ceph osd pool create cephfs_metadata

The metadata pool will typically hold at most a few gigabytes of data. For
this reason, a smaller PG count is usually recommended. 64 or 128 is commonly
used in practice for large clusters.
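
As a sketch, the PG count of an existing metadata pool can be set explicitly
(the value ``64`` is illustrative, and this assumes the PG autoscaler is not
already managing the pool):

.. code:: bash

    $ ceph osd pool set cephfs_metadata pg_num 64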

.. note:: The names of the file systems, metadata pools, and data pools can
          only have characters in the set [a-zA-Z0-9\_-.].

Creating a file system
======================

Once the pools are created, you may enable the file system using the
``fs new`` command:

.. code:: bash

    $ ceph fs new <fs_name> <metadata> <data> [--force] [--allow-dangerous-metadata-overlay] [<fscid:int>] [--recover]

This command creates a new file system with the specified metadata and data
pools. The specified data pool is the default data pool and cannot be changed
once set. Each file system has its own set of MDS daemons assigned to ranks,
so ensure that you have sufficient standby daemons available to accommodate
the new file system.

The ``--force`` option is used to achieve any of the following:

- To set an erasure-coded pool for the default data pool. Use of an EC pool
  for the default data pool is discouraged. Refer to `Creating pools`_ for
  details.
- To use a non-empty pool (a pool that already contains objects) as the
  metadata pool.
- To create a file system with a specific file system ID (fscid). The
  ``--force`` option is required with the ``--fscid`` option.

The ``--allow-dangerous-metadata-overlay`` option permits reuse of metadata
and data pools that are already in use. This should only be done in
emergencies and after careful reading of the documentation.

If the ``--fscid`` option is provided, the file system is created with that
specific fscid. This can be used when an application expects the file
system's ID to be stable after it has been recovered, e.g., after monitor
databases are lost and rebuilt. Consequently, file system IDs do not always
keep increasing with newer file systems.
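
For instance, to recreate a file system whose original ID is known (the value
``27`` here is purely illustrative):

.. code:: bash

    $ ceph fs new cephfs cephfs_metadata cephfs_data --fscid 27 --force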

The ``--recover`` option sets the state of the file system's rank 0 to
existing but failed, so that when an MDS daemon eventually picks up rank 0,
it reads the existing in-RADOS metadata and does not overwrite it. The flag
also prevents standby MDS daemons from joining the file system.
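
A recovery invocation might look like the following sketch (pool names match
the example below; depending on the state of the pools, ``--force`` may also
be required):

.. code:: bash

    $ ceph fs new cephfs cephfs_metadata cephfs_data --recover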

For example:

.. code:: bash

    $ ceph fs new cephfs cephfs_metadata cephfs_data
    $ ceph fs ls
    name: cephfs, metadata pool: cephfs_metadata, data pools: [cephfs_data ]

Once a file system has been created, your MDS(s) will be able to enter an
*active* state. For example, in a single MDS system:

.. code:: bash

    $ ceph mds stat
    cephfs-1/1/1 up {0=a=up:active}

Once the file system is created and the MDS is active, you are ready to mount
the file system. If you have created more than one file system, you will
choose which to use when mounting.

- `Mount CephFS`_
- `Mount CephFS as FUSE`_
- `Mount CephFS on Windows`_

.. _Mount CephFS: ../../cephfs/mount-using-kernel-driver
.. _Mount CephFS as FUSE: ../../cephfs/mount-using-fuse
.. _Mount CephFS on Windows: ../../cephfs/ceph-dokan

If you have created more than one file system, and a client does not specify
a file system when mounting, you can control which file system they will see
by using the ``ceph fs set-default`` command.
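
For example, to make ``cephfs`` the file system that clients mount when none
is specified:

.. code:: bash

    $ ceph fs set-default cephfs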

Adding a Data Pool to the File System
-------------------------------------

See :ref:`adding-data-pool-to-file-system`.

Using Erasure Coded pools with CephFS
=====================================

You may use Erasure Coded pools as CephFS data pools as long as they have
overwrites enabled, which is done as follows:

.. code:: bash

    $ ceph osd pool set my_ec_pool allow_ec_overwrites true

Note that EC overwrites are only supported when using OSDs with the BlueStore
backend.

You may not use Erasure Coded pools as CephFS metadata pools, because CephFS
metadata is stored using RADOS *OMAP* data structures, which EC pools cannot
store.
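
Putting this together, a typical sequence for using an EC pool as an
additional CephFS data pool might look like the following sketch (the pool
name ``my_ec_pool`` and the default erasure-code profile are illustrative;
see :ref:`adding-data-pool-to-file-system` for details on ``add_data_pool``):

.. code:: bash

    $ ceph osd pool create my_ec_pool erasure
    $ ceph osd pool set my_ec_pool allow_ec_overwrites true
    $ ceph fs add_data_pool cephfs my_ec_pool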
135 |