fix #844: document first VM/CT start-up delay

diff --git a/pveceph.adoc b/pveceph.adoc

index 32e75531df53cf5824ddb75cdbf088bdc83e7b9c..aa7a20f3cd3bb7b73efc0c20e371796bb31f7e3c 100644
--- a/pveceph.adoc
+++ b/pveceph.adoc
@@ -23,7 +23,7 @@ Deploy Hyper-Converged Ceph Cluster
  :pve-toplevel:
  endif::manvolnum[]
  
-[thumbnail="screenshot/gui-ceph-status.png"]
+[thumbnail="screenshot/gui-ceph-status-dashboard.png"]
  
  {pve} unifies your compute and storage systems, that is, you can use the same
  physical nodes within a cluster for both computing (processing VMs and
@@ -92,7 +92,7 @@ machines and containers, you must also account for having enough memory
  available for Ceph to provide excellent and stable performance.
  
  As a rule of thumb, for roughly **1 TiB of data, 1 GiB of memory** will be used
-by an OSD. Especially during recovery, rebalancing or backfilling.
+by an OSD. Especially during recovery, re-balancing or backfilling.
  
  The daemon itself will use additional memory. The Bluestore backend of the
  daemon requires by default **3-5 GiB of memory** (adjustable). In contrast, the
@@ -121,12 +121,13 @@ might take long. It is recommended that you use SSDs instead of HDDs in small
  setups to reduce recovery time, minimizing the likelihood of a subsequent
  failure event during recovery.
  
-In general SSDs will provide more IOPs than spinning disks. With this in mind,
+In general, SSDs will provide more IOPS than spinning disks. With this in mind,
  in addition to the higher cost, it may make sense to implement a
  xref:pve_ceph_device_classes[class based] separation of pools. Another way to
  speed up OSDs is to use a faster disk as a journal or
-DB/**W**rite-**A**head-**L**og device, see xref:pve_ceph_osds[creating Ceph
-OSDs]. If a faster disk is used for multiple OSDs, a proper balance between OSD
+DB/**W**rite-**A**head-**L**og device, see
+xref:pve_ceph_osds[creating Ceph OSDs].
+If a faster disk is used for multiple OSDs, a proper balance between OSD
  and WAL / DB (or journal) disk must be selected, otherwise the faster disk
  becomes the bottleneck for all linked OSDs.
  
@@ -157,6 +158,9 @@ You should test your setup and monitor health and performance continuously.
  Initial Ceph Installation & Configuration
  -----------------------------------------
  
+Using the Web-based Wizard
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
  [thumbnail="screenshot/gui-node-ceph-install.png"]
  
  With {pve} you have the benefit of an easy to use installation wizard
@@ -165,11 +169,17 @@ section in the menu tree. If Ceph is not already installed, you will see a
  prompt offering to do so.
  
  The wizard is divided into multiple sections, where each needs to
-finish successfully, in order to use Ceph. After starting the installation,
-the wizard will download and install all the required packages from {pve}'s Ceph
-repository.
+finish successfully, in order to use Ceph.
+
+First, you need to choose which Ceph version you want to install. Prefer the
+one from your other nodes, or the newest if this is the first node on which
+you install Ceph.
  
-After finishing the first step, you will need to create a configuration.
+After starting the installation, the wizard will download and install all the
+required packages from {pve}'s Ceph repository.
+[thumbnail="screenshot/gui-node-ceph-install-wizard-step0.png"]
+
+After finishing the installation step, you will need to create a configuration.
  This step is only needed once per cluster, as this configuration is distributed
  automatically to all remaining cluster members through {pve}'s clustered
  xref:chapter_pmxcfs[configuration file system (pmxcfs)].
@@ -208,10 +218,11 @@ more, such as xref:pveceph_fs[CephFS], which is a helpful addition to your
  new Ceph cluster.
  
  [[pve_ceph_install]]
-Installation of Ceph Packages
------------------------------
-Use the {pve} Ceph installation wizard (recommended) or run the following
-command on each node:
+CLI Installation of Ceph Packages
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+As an alternative to the recommended {pve} Ceph installation wizard available
+in the web-interface, you can use the following CLI command on each node:
  
  [source,bash]
  ----
@@ -222,10 +233,8 @@ This sets up an `apt` package repository in
  `/etc/apt/sources.list.d/ceph.list` and installs the required software.
  
  
-Create initial Ceph configuration
----------------------------------
-
-[thumbnail="screenshot/gui-ceph-config.png"]
+Initial Ceph configuration via CLI
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  
  Use the {pve} Ceph installation wizard (recommended) or run the
  following command on one node:
@@ -246,6 +255,9 @@ configuration file.
  [[pve_ceph_monitors]]
  Ceph Monitor
  -----------
+
+[thumbnail="screenshot/gui-ceph-monitor.png"]
+
  The Ceph Monitor (MON)
  footnote:[Ceph Monitor {cephdocs-url}/start/intro/]
  maintains a master copy of the cluster map. For high availability, you need at
@@ -254,13 +266,10 @@ used the installation wizard. You won't need more than 3 monitors, as long
  as your cluster is small to medium-sized. Only really large clusters will
  require more than this.
  
-
  [[pveceph_create_mon]]
  Create Monitors
  ~~~~~~~~~~~~~~~
  
-[thumbnail="screenshot/gui-ceph-monitor.png"]
-
  On each node where you want to place a monitor (three monitors are recommended),
  create one by using the 'Ceph -> Monitor' tab in the GUI or run:
  
@@ -335,17 +344,16 @@ telemetry and more.
  [[pve_ceph_osds]]
  Ceph OSDs
  ---------
+
+[thumbnail="screenshot/gui-ceph-osd-status.png"]
+
  Ceph **O**bject **S**torage **D**aemons store objects for Ceph over the
  network. It is recommended to use one OSD per physical disk.
  
-NOTE: By default an object is 4 MiB in size.
-
  [[pve_ceph_osd_create]]
  Create OSDs
  ~~~~~~~~~~~
  
-[thumbnail="screenshot/gui-ceph-osd-status.png"]
-
  You can create an OSD either via the {pve} web-interface or via the CLI using
  `pveceph`. For example:
  
@@ -406,7 +414,6 @@ NOTE: The DB stores BlueStore’s internal metadata, and the WAL is BlueStore’
  internal journal or write-ahead log. It is recommended to use a fast SSD or
  NVRAM for better performance.
  
-
  .Ceph Filestore
  
  Before Ceph Luminous, Filestore was used as the default storage type for Ceph OSDs.
@@ -455,6 +462,9 @@ WARNING: The above command will destroy all data on the disk!
  [[pve_ceph_pools]]
  Ceph Pools
  ----------
+
+[thumbnail="screenshot/gui-ceph-pools.png"]
+
  A pool is a logical group for storing objects. It holds a collection of objects,
  known as **P**lacement **G**roups (`PG`, `pg_num`).
  
@@ -462,10 +472,8 @@ known as **P**lacement **G**roups (`PG`, `pg_num`).
  Create and Edit Pools
  ~~~~~~~~~~~~~~~~~~~~~
  
-You can create pools from the command line or the web-interface of any {pve}
-host under **Ceph -> Pools**.
-
-[thumbnail="screenshot/gui-ceph-pools.png"]
+You can create and edit pools from the command line or the web-interface of any
+{pve} host under **Ceph -> Pools**.
  
  When no options are given, we set a default of **128 PGs**, a **size of 3
  replicas** and a **min_size of 2 replicas**, to ensure no data loss occurs if
@@ -475,16 +483,18 @@ WARNING: **Do not set a min_size of 1**. A replicated pool with min_size of 1
  allows I/O on an object when it has only 1 replica, which could lead to data
  loss, incomplete PGs or unfound objects.
  
-It is advised that you calculate the PG number based on your setup. You can
-find the formula and the PG calculator footnote:[PG calculator
-https://ceph.com/pgcalc/] online. From Ceph Nautilus onward, you can change the
-number of PGs footnoteref:[placement_groups,Placement Groups
+It is advised that you either enable the PG autoscaler or calculate the PG
+number based on your setup. You can find the formula and the PG calculator
+footnote:[PG calculator https://web.archive.org/web/20210301111112/http://ceph.com/pgcalc/] online. From Ceph Nautilus
+onward, you can change the number of PGs
+footnoteref:[placement_groups,Placement Groups
  {cephdocs-url}/rados/operations/placement-groups/] after the setup.
  
-In addition to manual adjustment, the PG autoscaler
-footnoteref:[autoscaler,Automated Scaling
+The PG autoscaler footnoteref:[autoscaler,Automated Scaling
  {cephdocs-url}/rados/operations/placement-groups/#automated-scaling] can
-automatically scale the PG count for a pool in the background.
+automatically scale the PG count for a pool in the background. Setting the
+`Target Size` or `Target Ratio` advanced parameters helps the PG autoscaler to
+make better decisions.
  
  .Example for creating a pool over the CLI
  [source,bash]
@@ -496,7 +506,14 @@ TIP: If you would also like to automatically define a storage for your
  pool, keep the `Add as Storage' checkbox checked in the web-interface, or use the
  command line option '--add_storages' at pool creation.
  
-.Base Options
+Pool Options
+^^^^^^^^^^^^
+
+[thumbnail="screenshot/gui-ceph-pool-create.png"]
+
+The following options are available on pool creation, and partially also when
+editing a pool.
+
  Name:: The name of the pool. This must be unique and can't be changed afterwards.
  Size:: The number of replicas per object. Ceph always tries to have this many
  copies of an object. Default: `3`.
@@ -515,7 +532,7 @@ xref:pve_ceph_device_classes[Ceph CRUSH & device classes] for information on
  device-based rules.
  # of PGs:: The number of placement groups footnoteref:[placement_groups] that
  the pool should have at the beginning. Default: `128`.
-Target Size Ratio:: The ratio of data that is expected in the pool. The PG
+Target Ratio:: The ratio of data that is expected in the pool. The PG
  autoscaler uses the ratio relative to other ratio sets. It takes precedence
  over the `target size` if both are set.
  Target Size:: The estimated amount of data expected in the pool. The PG
@@ -555,6 +572,7 @@ PG Autoscaler
  
  The PG autoscaler allows the cluster to consider the amount of (expected) data
  stored in each pool and to choose the appropriate pg_num values automatically.
+It is available since Ceph Nautilus.
  
  You may need to activate the PG autoscaler module before adjustments can take
  effect.
@@ -589,6 +607,9 @@ Nautilus: PG merging and autotuning].
  [[pve_ceph_device_classes]]
  Ceph CRUSH & device classes
  ---------------------------
+
+[thumbnail="screenshot/gui-ceph-config.png"]
+
  The footnote:[CRUSH
  https://ceph.com/wp-content/uploads/2016/08/weil-crush-sc06.pdf] (**C**ontrolled
  **R**eplication **U**nder **S**calable **H**ashing) algorithm is at the
@@ -602,7 +623,7 @@ NOTE: Further information can be found in the Ceph documentation, under the
  section CRUSH map footnote:[CRUSH map {cephdocs-url}/rados/operations/crush-map/].
  
  This map can be altered to reflect different replication hierarchies. The object
-replicas can be separated (eg. failure domains), while maintaining the desired
+replicas can be separated (e.g., failure domains), while maintaining the desired
  distribution.
  
  A common configuration is to use different classes of disks for different Ceph
@@ -651,7 +672,7 @@ ceph osd crush rule create-replicated <rule-name> <root> <failure-domain> <class
  |<rule-name>|name of the rule, to connect with a pool (seen in GUI & CLI)
  |<root>|which crush root it should belong to (default ceph root "default")
  |<failure-domain>|at which failure-domain the objects should be distributed (usually host)
-|<class>|what type of OSD backing store to use (eg. nvme, ssd, hdd)
+|<class>|what type of OSD backing store to use (e.g., nvme, ssd, hdd)
  |===
  
  Once the rule is in the CRUSH map, you can tell a pool to use the ruleset.
@@ -673,8 +694,8 @@ Ceph Client
  
  Following the setup from the previous sections, you can configure {pve} to use
  such pools to store VM and Container images. Simply use the GUI to add a new
-`RBD` storage (see section xref:ceph_rados_block_devices[Ceph RADOS Block
-Devices (RBD)]).
+`RBD` storage (see section
+xref:ceph_rados_block_devices[Ceph RADOS Block Devices (RBD)]).
  
  You also need to copy the keyring to a predefined location for an external Ceph
  cluster. If Ceph is installed on the Proxmox nodes itself, then this will be