X-Git-Url: https://git.proxmox.com/?p=pve-docs.git;a=blobdiff_plain;f=pveceph.adoc;h=0c0184c8d855a64816eb74091a0f7709a59aa845;hp=18582e1942d17ab233f9c016133eb07262d829e9;hb=2996c790965398a50a764acb1cf6d70d491e6729;hpb=6ff32926fc70427083c203efb1d8021af48c4cba

diff --git a/pveceph.adoc b/pveceph.adoc
index 18582e1..0c0184c 100644
--- a/pveceph.adoc
+++ b/pveceph.adoc
@@ -72,16 +72,62 @@ footnote:[Ceph glossary http://docs.ceph.com/docs/luminous/glossary].
 Precondition
 ------------
 
-To build a Proxmox Ceph Cluster there should be at least three (preferably)
-identical servers for the setup.
-
-A 10Gb network, exclusively used for Ceph, is recommended. A meshed network
-setup is also an option if there are no 10Gb switches available, see our wiki
-article footnote:[Full Mesh Network for Ceph {webwiki-url}Full_Mesh_Network_for_Ceph_Server] .
+To build a hyper-converged Proxmox + Ceph Cluster, there should be at least
+three (preferably) identical servers for the setup.
 
 Check also the recommendations from
 http://docs.ceph.com/docs/luminous/start/hardware-recommendations/[Ceph's website].
 
+.CPU
+A higher CPU core frequency reduces latency and should be preferred. As a
+simple rule of thumb, you should assign a CPU core (or thread) to each Ceph
+service to provide enough resources for stable and durable Ceph performance.
+
+.Memory
+Especially in a hyper-converged setup, the memory consumption needs to be
+carefully monitored. In addition to the intended workload from virtual
+machines and containers, Ceph needs enough memory available to provide good
+and stable performance. As a rule of thumb, for roughly 1 TiB of data, 1 GiB
+of memory will be used by an OSD. OSD caching will use additional memory.
+
+.Network
+We recommend a network bandwidth of at least 10 GbE, used exclusively for
+Ceph. A meshed network setup
+footnote:[Full Mesh Network for Ceph {webwiki-url}Full_Mesh_Network_for_Ceph_Server]
+is also an option if there are no 10 GbE switches available.
+
+The volume of traffic, especially during recovery, will interfere with other
+services on the same network and may even break the {pve} cluster stack.
+
+Further, estimate your bandwidth needs. While one HDD might not saturate a
+1 Gb link, multiple HDD OSDs per node can, and modern NVMe SSDs will quickly
+saturate 10 Gbps of bandwidth. Deploying a network capable of even more
+bandwidth will ensure that it isn't your bottleneck and won't be anytime
+soon; 25, 40 or even 100 Gbps are possible.
+
+.Disks
+When planning the size of your Ceph cluster, it is important to take the
+recovery time into consideration. Especially with small clusters, recovery
+might take a long time. It is recommended that you use SSDs instead of HDDs
+in small setups to reduce recovery time, minimizing the likelihood of a
+subsequent failure event during recovery.
+
+In general, SSDs will provide more IOPS than spinning disks. This fact and
+the higher cost may make a xref:pve_ceph_device_classes[class based]
+separation of pools appealing. Another possibility to speed up OSDs is to
+use a faster disk as journal or DB/WAL device, see
+xref:pve_ceph_osds[creating Ceph OSDs]. If a faster disk is used for multiple
+OSDs, a proper balance between the number of OSDs and the WAL/DB (or journal)
+disk must be selected, otherwise the faster disk becomes the bottleneck for
+all linked OSDs.
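+
+For illustration only (the exact command and options are described in
+xref:pve_ceph_osds[creating Ceph OSDs] and may differ between versions), an
+OSD on a spinning disk with its DB/WAL placed on a faster device could be
+created similar to the following, using example device names:
+
+[source,bash]
+----
+# /dev/sdb holds the data, /dev/nvme0n1 holds the DB/WAL (example names)
+pveceph osd create /dev/sdb -db_dev /dev/nvme0n1
+----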
+
+Aside from the disk type, Ceph performs best with an evenly sized and evenly
+distributed number of disks per node. For example, 4 x 500 GB disks in each
+node are better than a mixed setup with a single 1 TB and three 250 GB disks.
+
+You also need to balance OSD count and single OSD capacity. More capacity
+allows you to increase storage density, but it also means that a single OSD
+failure forces Ceph to recover more data at once.
+
 .Avoid RAID
 As Ceph handles data object redundancy and multiple parallel writes to disks
 (OSDs) on its own, using a RAID controller normally doesn’t improve
@@ -93,12 +139,69 @@ the ones from Ceph.
 
 WARNING: Avoid RAID controller, use host bus adapter (HBA) instead.
 
+NOTE: The above recommendations should be seen as rough guidance for choosing
+hardware. Therefore, it is still essential to adapt them to your specific
+needs, test your setup and monitor health and performance continuously.
+
+[[pve_ceph_install_wizard]]
+Initial Ceph installation & configuration
+-----------------------------------------
+
+[thumbnail="screenshot/gui-node-ceph-install.png"]
+
+With {pve} you have the benefit of an easy-to-use installation wizard
+for Ceph. Click on one of your cluster nodes and navigate to the Ceph
+section in the menu tree. If Ceph is not already installed, you will be
+offered to install it now.
+
+The wizard is divided into different sections, each of which needs to be
+finished successfully in order to use Ceph. After starting the installation,
+the wizard will download and install all required packages from {pve}'s Ceph
+repository.
+
+After finishing the first step, you will need to create a configuration.
+This step is only needed once per cluster, as this configuration is
+distributed automatically to all remaining cluster members through {pve}'s
+clustered xref:chapter_pmxcfs[configuration file system (pmxcfs)].
+
+The configuration step includes the following settings (an illustrative
+configuration example follows at the end of this section):
+
+* *Public Network:* You should set up a dedicated network for Ceph; this
+setting is required. Separating your Ceph traffic is highly recommended,
+because otherwise it could cause trouble with other latency-dependent
+services, e.g., cluster communication, and may decrease Ceph's performance.
+
+[thumbnail="screenshot/gui-node-ceph-install-wizard-step2.png"]
+
+* *Cluster Network:* As an optional step, you can go even further and
+separate the xref:pve_ceph_osds[OSD] replication & heartbeat traffic
+as well. This will relieve the public network and could lead to
+significant performance improvements, especially in large clusters.
+
+You have two more options which are considered advanced and therefore
+should only be changed if you are an expert.
+
+* *Number of replicas*: Defines how often an object is replicated.
+* *Minimum replicas*: Defines the minimum number of required replicas
+  for I/O to be marked as complete.
+
+Additionally, you need to choose your first monitor node; this is required.
+
+That's it. You should see a success page as the last step, with further
+instructions on how to proceed. Your cluster is now ready to start using
+Ceph, even though you will still need to create additional
+xref:pve_ceph_monitors[monitors], create some xref:pve_ceph_osds[OSDs] and at
+least one xref:pve_ceph_pools[pool].
+
+The rest of this chapter will guide you through getting the most out of your
+{pve}-based Ceph setup. This includes the aforementioned topics and more,
+such as xref:pveceph_fs[CephFS], which is a very handy addition to your new
+Ceph cluster.
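+
+For illustration, assuming a public network of `10.10.10.0/24` and a separate
+cluster network of `10.10.20.0/24`, the wizard's choices correspond to entries
+similar to the following in the `[global]` section of `/etc/pve/ceph.conf`
+(example values; the exact keys and formatting written by the wizard may
+differ slightly):
+
+----
+[global]
+     public_network = 10.10.10.0/24
+     cluster_network = 10.10.20.0/24
+     osd_pool_default_size = 3
+     osd_pool_default_min_size = 2
+----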
 
 [[pve_ceph_install]]
 Installation of Ceph Packages
 -----------------------------
-
-On each node run the installation script as follows:
+Use the {pve} Ceph installation wizard (recommended) or run the following
+command on each node:
 
 [source,bash]
 ----
@@ -114,20 +217,20 @@ Creating initial Ceph configuration
 
 [thumbnail="screenshot/gui-ceph-config.png"]
 
-After installation of packages, you need to create an initial Ceph
-configuration on just one node, based on your network (`10.10.10.0/24`
-in the following example) dedicated for Ceph:
+Use the {pve} Ceph installation wizard (recommended) or run the
+following command on one node:
 
 [source,bash]
 ----
 pveceph init --network 10.10.10.0/24
 ----
 
-This creates an initial configuration at `/etc/pve/ceph.conf`. That file is
-automatically distributed to all {pve} nodes by using
-xref:chapter_pmxcfs[pmxcfs]. The command also creates a symbolic link
-from `/etc/ceph/ceph.conf` pointing to that file. So you can simply run
-Ceph commands without the need to specify a configuration file.
+This creates an initial configuration at `/etc/pve/ceph.conf` with a
+dedicated network for Ceph. That file is automatically distributed to
+all {pve} nodes by using xref:chapter_pmxcfs[pmxcfs]. The command also
+creates a symbolic link from `/etc/ceph/ceph.conf` pointing to that file,
+so you can simply run Ceph commands without the need to specify a
+configuration file.
 
 
 [[pve_ceph_monitors]]
@@ -139,7 +242,10 @@ Creating Ceph Monitors
 The Ceph Monitor (MON)
 footnote:[Ceph Monitor http://docs.ceph.com/docs/luminous/start/intro/]
 maintains a master copy of the cluster map. For high availability you need to
-have at least 3 monitors.
+have at least 3 monitors. One monitor will already be installed if you
+used the installation wizard. You won't need more than 3 monitors as long
+as your cluster is small to mid-sized; only really large clusters will
+need more than that.
 
 On each node where you want to place a monitor (three monitors are recommended),
 create it by using the 'Ceph -> Monitor' tab in the GUI or run.
@@ -316,6 +422,7 @@ operation footnote:[Ceph pool operation
 http://docs.ceph.com/docs/luminous/rados/operations/pools/]
 manual.
 
+[[pve_ceph_device_classes]]
 Ceph CRUSH & device classes
 ---------------------------
 The foundation of Ceph is its algorithm, **C**ontrolled **R**eplication
@@ -432,7 +539,7 @@ cluster, this way even high load will not overload a single host, which can be
 an issue with traditional shared filesystem approaches, like `NFS`, for
 example.
 
-{pve} supports both, using an existing xref:storage_cephfs[CephFS as storage])
+{pve} supports both, using an existing xref:storage_cephfs[CephFS as storage]
 to save backups, ISO files or container templates and creating a
 hyper-converged CephFS itself.
 
@@ -537,18 +644,22 @@ pveceph pool destroy NAME
 ----
 
 
-Ceph monitoring & troubleshooting
----------------------------------
-A good start is to monitor the ceph health to begin with. Either through the
-ceph tools itself or also by accessing the status through the {pve}
-link:api-viewer/index.html[API].
+Ceph monitoring and troubleshooting
+-----------------------------------
+A good start is to continuously monitor the Ceph health from the very start
+of the initial deployment, either through the Ceph tools themselves or by
+accessing the status through the {pve} link:api-viewer/index.html[API].
 
-If the cluster is in an unhealthy state the commands below will give an
-overview on the current events.
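+For example, the cluster status can also be queried through the API with
+`pvesh` (the node name below is a placeholder; the exact path and return
+format can be checked in the link:api-viewer/index.html[API viewer]):
+
+----
+pvesh get /nodes/<nodename>/ceph/status
+----
+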
+The following ceph commands can be used to see if the cluster is healthy
+('HEALTH_OK'), if there are warnings ('HEALTH_WARN'), or even errors
+('HEALTH_ERR'). If the cluster is in an unhealthy state, the status commands
+below will also give you an overview of the current events and actions to
+take.
 
 ----
-ceph -s
-ceph -w
+# print the cluster status once
+pve# ceph -s
+# continuously output status changes (press CTRL+C to stop)
+pve# ceph -w
 ----
 
 To get a more detailed view, every ceph service has a log file under