Update the default PG number to 128

diff --git a/pveceph.adoc b/pveceph.adoc

index 21a496560932e030fc31f84ce50d0f8427f6ba76..a888b4aa2f1e4faf3c8b3f0669556e298d9d505a 100644
--- a/pveceph.adoc
+++ b/pveceph.adoc
@@ -23,7 +23,7 @@ Manage Ceph Services on Proxmox VE Nodes
  :pve-toplevel:
  endif::manvolnum[]
  
-[thumbnail="gui-ceph-status.png"]
+[thumbnail="screenshot/gui-ceph-status.png"]
  
  {pve} unifies your compute and storage systems, i.e. you can use the same
  physical nodes within a cluster for both computing (processing VMs and
@@ -37,18 +37,16 @@ on the hypervisor nodes.
  Ceph is a distributed object store and file system designed to provide
  excellent performance, reliability and scalability.
  
-.Some of the advantages of Ceph are:
-- Easy setup and management with CLI and GUI support on Proxmox VE
+.Some advantages of Ceph on {pve} are:
+- Easy setup and management with CLI and GUI support
  - Thin provisioning
  - Snapshots support
  - Self healing
-- No single point of failure
  - Scalable to the exabyte level
  - Setup pools with different performance and redundancy characteristics
  - Data is replicated, making it fault tolerant
  - Runs on economical commodity hardware
  - No need for hardware RAID controllers
-- Easy management
  - Open source
  
  For small to mid sized deployments, it is possible to install a Ceph server for
@@ -83,14 +81,18 @@ Check also the recommendations from
  http://docs.ceph.com/docs/luminous/start/hardware-recommendations/[Ceph's website].
  
  .Avoid RAID
-While RAID controller are build for storage virtualisation, to combine
-independent disks to form one or more logical units. Their caching methods,
-algorithms (RAID modes; incl. JBOD), disk or write/read optimisations are
-targeted towards aforementioned logical units and not to Ceph.
+As Ceph handles data object redundancy and multiple parallel writes to disks
+(OSDs) on its own, using a RAID controller normally doesn't improve
+performance or availability. On the contrary, Ceph is designed to handle whole
+disks on its own, without any abstraction in between. RAID controllers are not
+designed for the Ceph use case and may complicate things and sometimes even
+reduce performance, as their write and caching algorithms may interfere with
+the ones from Ceph.
  
  WARNING: Avoid RAID controller, use host bus adapter (HBA) instead.
  
  
+[[pve_ceph_install]]
  Installation of Ceph Packages
  -----------------------------
  
@@ -108,7 +110,7 @@ This sets up an `apt` package repository in
  Creating initial Ceph configuration
  -----------------------------------
  
-[thumbnail="gui-ceph-config.png"]
+[thumbnail="screenshot/gui-ceph-config.png"]
  
  After installation of packages, you need to create an initial Ceph
  configuration on just one node, based on your network (`10.10.10.0/24`
@@ -130,7 +132,7 @@ Ceph commands without the need to specify a configuration file.
  Creating Ceph Monitors
  ----------------------
  
-[thumbnail="gui-ceph-monitor.png"]
+[thumbnail="screenshot/gui-ceph-monitor.png"]
  
  The Ceph Monitor (MON)
  footnote:[Ceph Monitor http://docs.ceph.com/docs/luminous/start/intro/]
@@ -173,7 +175,7 @@ pveceph createmgr
  Creating Ceph OSDs
  ------------------
  
-[thumbnail="gui-ceph-osd-status.png"]
+[thumbnail="screenshot/gui-ceph-osd-status.png"]
  
  via GUI or via CLI as follows:
  
@@ -277,16 +279,16 @@ highly recommended to achieve good performance.
  Creating Ceph Pools
  -------------------
  
-[thumbnail="gui-ceph-pools.png"]
+[thumbnail="screenshot/gui-ceph-pools.png"]
  
  A pool is a logical group for storing objects. It holds **P**lacement
  **G**roups (PG), a collection of objects.
  
  When no options are given, we set a
-default of **64 PGs**, a **size of 3 replicas** and a **min_size of 2 replicas**
+default of **128 PGs**, a **size of 3 replicas** and a **min_size of 2 replicas**
  for serving objects in a degraded state.
  
-NOTE: The default number of PGs works for 2-6 disks. Ceph throws a
+NOTE: The default number of PGs works for 2-5 disks. Ceph throws a
  "HEALTH_WARNING" if you have too few or too many PGs in your cluster.
  
  It is advised to calculate the PG number depending on your setup, you can find
@@ -394,7 +396,7 @@ separately.
  Ceph Client
  -----------
  
-[thumbnail="gui-ceph-log.png"]
+[thumbnail="screenshot/gui-ceph-log.png"]
  
  You can then configure {pve} to use such pools to store VM or
  Container images. Simply use the GUI too add a new `RBD` storage (see
@@ -414,6 +416,123 @@ mkdir /etc/pve/priv/ceph
  cp /etc/ceph/ceph.client.admin.keyring /etc/pve/priv/ceph/my-ceph-storage.keyring
  ----
  
+[[pveceph_fs]]
+CephFS
+------
+
+Ceph also provides a filesystem running on top of the same object storage as
+RADOS block devices do. A **M**eta**d**ata **S**erver (`MDS`) is used to map
+the RADOS backed objects to files and directories, allowing Ceph to provide a
+POSIX-compliant replicated filesystem. This makes it easy to get a clustered,
+highly available shared filesystem if Ceph is already in use. Its Metadata
+Servers guarantee that files get balanced out over the whole Ceph cluster.
+This way even high load will not overload a single host, which can be an
+issue with traditional shared filesystem approaches, like `NFS`, for
+example.
+
+{pve} supports both using an existing xref:storage_cephfs[CephFS as storage]
+to save backups, ISO files or container templates, and creating a
+hyper-converged CephFS itself.
+
+
+[[pveceph_fs_mds]]
+Metadata Server (MDS)
+~~~~~~~~~~~~~~~~~~~~~
+
+CephFS needs at least one Metadata Server to be configured and running to be
+able to work. One can simply create one through the {pve} web GUI's `Node ->
+CephFS` panel or on the command line with:
+
+----
+pveceph mds create
+----
+
+Multiple metadata servers can be created in a cluster, but with the default
+settings only one can be active at any time. If an MDS, or its node, becomes
+unresponsive (or crashes), another `standby` MDS will get promoted to `active`.
+One can speed up the hand-over between the active and a standby MDS by using
+the 'hotstandby' parameter option on create, or if you have already created it
+you may set/add:
+
+----
+mds standby replay = true
+----
+
+in the respective MDS section of ceph.conf. With this enabled, this specific
+MDS will always poll the active one, so that it can take over faster as it is
+in a `warm' state. But naturally, the active polling will cause some additional
+performance impact on your system and the active `MDS`.
+
+Multiple Active MDS
+^^^^^^^^^^^^^^^^^^^
+
+Since Luminous (12.2.x) you can also have multiple active metadata servers
+running, but this is normally only useful for a high number of parallel
+clients, as otherwise the `MDS` is seldom the bottleneck. If you want to set
+this up, please refer to the Ceph documentation. footnote:[Configuring multiple
+active MDS daemons http://docs.ceph.com/docs/mimic/cephfs/multimds/]
+
+[[pveceph_fs_create]]
+Create a CephFS
+~~~~~~~~~~~~~~~
+
+With {pve}'s CephFS integration, you can easily create a CephFS via the
+Web GUI, the CLI or an external API interface. Some prerequisites are required
+for this to work:
+
+.Prerequisites for a successful CephFS setup:
+- xref:pve_ceph_install[Install Ceph packages]. If this was already done some
+  time ago, you might want to rerun it on an up-to-date system to ensure that
+  all CephFS related packages get installed.
+- xref:pve_ceph_monitors[Setup Monitors]
+- xref:pve_ceph_monitors[Setup your OSDs]
+- xref:pveceph_fs_mds[Setup at least one MDS]
+
+Once all of this is checked and done, you can create a CephFS through
+either the Web GUI's `Node -> CephFS` panel or the command line tool `pveceph`,
+for example with:
+
+----
+pveceph fs create --pg_num 128 --add-storage
+----
+
+This creates a CephFS named `'cephfs'' using a pool for its data named
+`'cephfs_data'' with `128` placement groups and a pool for its metadata named
+`'cephfs_metadata'' with one quarter of the data pool's placement groups (`32`).
+Check the xref:pve_ceph_pools[{pve} managed Ceph pool chapter] or visit the
+Ceph documentation for more information regarding a fitting placement group
+number (`pg_num`) for your setup footnote:[Ceph Placement Groups
+http://docs.ceph.com/docs/mimic/rados/operations/placement-groups/].
+Additionally, the `'--add-storage'' parameter will add the CephFS to the {pve}
+storage configuration after it has been created successfully.
+
+Destroy CephFS
+~~~~~~~~~~~~~~
+
+WARNING: Destroying a CephFS will render all its data unusable. This cannot be
+undone!
+
+If you really want to destroy an existing CephFS you first need to stop, or
+destroy, all metadata servers (`MDS`). You can destroy them either over the Web
+GUI or the command line interface, with:
+
+----
+pveceph mds destroy NAME
+----
+on each {pve} node hosting an MDS daemon.
+
+Then, you can remove (destroy) the CephFS by issuing:
+
+----
+ceph fs rm NAME --yes-i-really-mean-it
+----
+on a single node hosting Ceph. After this you may want to remove the created
+data and metadata pools. This can be done either over the Web GUI or the CLI
+with:
+
+----
+pveceph pool destroy NAME
+----
  
  ifdef::manvolnum[]
  include::pve-copyright.adoc[]