+[[pveceph_fs]]
+CephFS
+------
+
+Ceph also provides a filesystem, which runs on top of the same object storage as
+RADOS block devices do. A **M**eta**d**ata **S**erver (`MDS`) is used to map the
+RADOS backed objects to files and directories, allowing Ceph to provide a
+POSIX-compliant, replicated filesystem. This allows you to easily configure a
+clustered, highly available, shared filesystem. Ceph's Metadata Servers
+guarantee that files are evenly distributed over the entire Ceph cluster. As a
+result, even cases of high load will not overwhelm a single host, which can be
+an issue with traditional shared filesystem approaches, for example `NFS`.
+
+[thumbnail="screenshot/gui-node-ceph-cephfs-panel.png"]
+
+{pve} supports both creating a hyper-converged CephFS and using an existing
+xref:storage_cephfs[CephFS as storage] to save backups, ISO files, and container
+templates.
+
+
+[[pveceph_fs_mds]]
+Metadata Server (MDS)
+~~~~~~~~~~~~~~~~~~~~~
+
+CephFS needs at least one Metadata Server to be configured and running, in order
+to function. You can create an MDS through the {pve} web GUI's `Node
+-> CephFS` panel or from the command line with:
+
+----
+pveceph mds create
+----
+
+Multiple metadata servers can be created in a cluster, but with the default
+settings, only one can be active at a time. If an MDS or its node becomes
+unresponsive (or crashes), another `standby` MDS will get promoted to `active`.
+You can speed up the handover between the active and the standby MDS by using
+the 'hotstandby' parameter on creation, or, if you have already created the
+MDS, by setting/adding:
+
+----
+mds standby replay = true
+----
+
+in the respective MDS section of `/etc/pve/ceph.conf`. With this enabled, the
+specified MDS will remain in a `warm` state, polling the active one, so that it
+can take over faster in case of any issues.
+
+NOTE: This active polling will have an additional performance impact on your
+system and the active `MDS`.
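+
+As an alternative to editing `/etc/pve/ceph.conf`, the hot-standby behavior can
+also be requested directly on creation. This is only a sketch, assuming the
+`--hotstandby` flag of `pveceph mds create` is available in your {pve} version:
+
+----
+pveceph mds create --hotstandby
+----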
+
+.Multiple Active MDS
+
+Since Luminous (12.2.x) you can have multiple active metadata servers
+running at once, but this is normally only useful if you have a large number of
+clients running in parallel, as otherwise the `MDS` is rarely the bottleneck in
+a system. If you want to set this up, please refer to the Ceph documentation.
+footnote:[Configuring multiple active MDS daemons
+{cephdocs-url}/cephfs/multimds/]
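+
+As a rough sketch, the number of active MDS daemons is controlled per
+filesystem through the `max_mds` setting (here assuming a filesystem named
+'cephfs'):
+
+----
+ceph fs set cephfs max_mds 2
+----
+
+Existing standby daemons are then promoted to fill the additional active ranks.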
+
+[[pveceph_fs_create]]
+Create CephFS
+~~~~~~~~~~~~~
+
+With {pve}'s integration of CephFS, you can easily create a CephFS using the
+web interface, CLI or an external API interface. Some prerequisites are required
+for this to work:
+
+.Prerequisites for a successful CephFS setup:
+- xref:pve_ceph_install[Install Ceph packages] - if this was already done some
+time ago, you may want to rerun it on an up-to-date system to
+ensure that all CephFS related packages get installed.
+- xref:pve_ceph_monitors[Setup Monitors]
+- xref:pve_ceph_osds[Setup your OSDs]
+- xref:pveceph_fs_mds[Setup at least one MDS]
+
+After this is complete, you can simply create a CephFS through
+either the Web GUI's `Node -> CephFS` panel or the command line tool `pveceph`,
+for example:
+
+----
+pveceph fs create --pg_num 128 --add-storage
+----
+
+This creates a CephFS named 'cephfs', using a pool for its data named
+'cephfs_data' with '128' placement groups and a pool for its metadata named
+'cephfs_metadata' with one quarter of the data pool's placement groups (`32`).
+Check the xref:pve_ceph_pools[{pve} managed Ceph pool chapter] or visit the
+Ceph documentation for more information regarding an appropriate placement group
+number (`pg_num`) for your setup footnoteref:[placement_groups].
+Additionally, the '--add-storage' parameter will add the CephFS to the {pve}
+storage configuration after it has been created successfully.
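+
+With '--add-storage', the resulting entry in `/etc/pve/storage.cfg` will look
+roughly like the following (the exact content types and options may differ on
+your setup):
+
+----
+cephfs: cephfs
+        path /mnt/pve/cephfs
+        content backup,iso,vztmpl
+----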
+
+Destroy CephFS
+~~~~~~~~~~~~~~
+
+WARNING: Destroying a CephFS will render all of its data unusable. This cannot be
+undone!
+
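+To get an overview of the existing CephFS instances and their MDS daemons
+first, you can, for example, run:
+
+----
+ceph fs status
+----
+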
+If you really want to destroy an existing CephFS, you first need to stop or
+destroy all metadata servers (`MDS`). You can destroy them either via the web
+interface or via the command line interface, by issuing
+
+----
+pveceph mds destroy NAME
+----
+on each {pve} node hosting an MDS daemon.
+
+Then, you can remove (destroy) the CephFS by issuing
+
+----
+ceph fs rm NAME --yes-i-really-mean-it
+----
+on a single node hosting Ceph. After this, you may want to remove the data and
+metadata pools that were created. This can be done either via the web GUI or
+the CLI with:
+
+----
+pveceph pool destroy NAME
+----
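+
+For the default pool names from the creation example above, this would be:
+
+----
+pveceph pool destroy cephfs_data
+pveceph pool destroy cephfs_metadata
+----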
+
+
+Ceph maintenance
+----------------
+
+Replace OSDs
+~~~~~~~~~~~~
+
+One of the most common maintenance tasks in Ceph is to replace the disk of an
+OSD. If a disk is already in a failed state, then you can go ahead and run
+through the steps in xref:pve_ceph_osd_destroy[Destroy OSDs]. Ceph will recreate
+the copies that were stored on the failed OSD on the remaining OSDs if possible.
+This rebalancing will start as soon as an OSD failure is detected or an OSD was
+actively stopped.
+
+NOTE: With the default size/min_size (3/2) of a pool, recovery only starts when
+`size + 1` nodes are available. The reason for this is that the Ceph object
+balancer xref:pve_ceph_device_classes[CRUSH] defaults to a full node as
+`failure domain'.
+
+To replace a functioning disk from the GUI, go through the steps in
+xref:pve_ceph_osd_destroy[Destroy OSDs]. The only addition is to wait until
+the cluster shows 'HEALTH_OK' before stopping the OSD to destroy it.
+
+On the command line, use the following commands:
+
+----
+ceph osd out osd.<id>
+----
+
+You can check with the command below if the OSD can be safely removed.
+
+----
+ceph osd safe-to-destroy osd.<id>
+----
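+
+If it is not yet safe, the command exits with an error. A small shell loop like
+the following (only a sketch) can be used to wait until the OSD can be removed:
+
+----
+while ! ceph osd safe-to-destroy osd.<id>; do sleep 60; done
+----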
+
+Once the above check tells you that it is safe to remove the OSD, you can
+continue with the following commands:
+
+----
+systemctl stop ceph-osd@<id>.service
+pveceph osd destroy <id>
+----
+
+Replace the old disk with the new one and use the same procedure as described
+in xref:pve_ceph_osd_create[Create OSDs].
+
+Trim/Discard
+~~~~~~~~~~~~
+
+It is good practice to run 'fstrim' (discard) regularly on VMs and containers.
+This releases data blocks that the filesystem isn’t using anymore. It reduces
+data usage and resource load. Most modern operating systems issue such discard
+commands to their disks regularly. You only need to ensure that the Virtual
+Machines enable the xref:qm_hard_disk_discard[disk discard option].
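+
+A trim can also be triggered manually, either inside a VM guest or, for
+containers, from the host (assuming a container with ID `100`):
+
+----
+# inside a VM guest
+fstrim -av
+# for a container, on the host
+pct fstrim 100
+----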
+
+[[pveceph_scrub]]
+Scrub & Deep Scrub
+~~~~~~~~~~~~~~~~~~
+
+Ceph ensures data integrity by 'scrubbing' placement groups. Ceph checks every
+object in a PG for its health. There are two forms of scrubbing: cheap daily
+metadata checks and deep weekly data checks. The weekly deep scrub reads
+the objects and uses checksums to ensure data integrity. If a running scrub
+interferes with business (performance) needs, you can adjust the time when
+scrubs footnote:[Ceph scrubbing {cephdocs-url}/rados/configuration/osd-config-ref/#scrubbing]
+are executed.
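+
+For example, the time window in which scrubs are allowed to run can be
+restricted (a sketch, assuming the `osd_scrub_begin_hour` and
+`osd_scrub_end_hour` options of your Ceph release):
+
+----
+ceph config set osd osd_scrub_begin_hour 23
+ceph config set osd osd_scrub_end_hour 6
+----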
+
+
+Ceph Monitoring and Troubleshooting
+-----------------------------------
+
+It is important to continuously monitor the health of a Ceph deployment from the
+beginning, either by using the Ceph tools or by accessing
+the status through the {pve} link:api-viewer/index.html[API].
+
+The following Ceph commands can be used to see if the cluster is healthy
+('HEALTH_OK'), if there are warnings ('HEALTH_WARN'), or even errors
+('HEALTH_ERR'). If the cluster is in an unhealthy state, the status commands
+below will also give you an overview of the current events and actions to take.
+
+----
+# single time output
+pve# ceph -s
+# continuously output status changes (press CTRL+C to stop)
+pve# ceph -w
+----
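+
+The same information is also available through the {pve} API, for example via
+`pvesh` (the exact path may differ between {pve} versions):
+
+----
+pve# pvesh get /nodes/<node>/ceph/status
+----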
+
+To get a more detailed view, every Ceph service has a log file under
+`/var/log/ceph/`. If more detail is required, the log level can be
+adjusted footnote:[Ceph log and debugging {cephdocs-url}/rados/troubleshooting/log-and-debug/].
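+
+For example, to temporarily raise the log level of a single OSD daemon (a
+sketch; the available subsystems and sensible levels are described in the
+linked documentation):
+
+----
+ceph tell osd.0 config set debug_osd 5/5
+----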
+
+You can find more information about troubleshooting
+footnote:[Ceph troubleshooting {cephdocs-url}/rados/troubleshooting/]
+a Ceph cluster on the official website.
+