X-Git-Url: https://git.proxmox.com/?p=pve-docs.git;a=blobdiff_plain;f=pveceph.adoc;h=0c0184c8d855a64816eb74091a0f7709a59aa845;hp=18582e1942d17ab233f9c016133eb07262d829e9;hb=2996c790965398a50a764acb1cf6d70d491e6729;hpb=6ff32926fc70427083c203efb1d8021af48c4cba

diff --git a/pveceph.adoc b/pveceph.adoc
index 18582e1..0c0184c 100644
--- a/pveceph.adoc
+++ b/pveceph.adoc
@@ -72,16 +72,62 @@ footnote:[Ceph glossary http://docs.ceph.com/docs/luminous/glossary].
 Precondition
 ------------
 
-To build a Proxmox Ceph Cluster there should be at least three (preferably)
-identical servers for the setup.
-
-A 10Gb network, exclusively used for Ceph, is recommended. A meshed network
-setup is also an option if there are no 10Gb switches available, see our wiki
-article footnote:[Full Mesh Network for Ceph {webwiki-url}Full_Mesh_Network_for_Ceph_Server] .
+To build a hyper-converged Proxmox + Ceph Cluster, there should be at least
+three (preferably) identical servers for the setup.
 
 Check also the recommendations from
 http://docs.ceph.com/docs/luminous/start/hardware-recommendations/[Ceph's website].
 
+.CPU
+A higher CPU core frequency reduces latency and should be preferred. As a
+simple rule of thumb, you should assign a CPU core (or thread) to each Ceph
+service to provide enough resources for stable and durable Ceph performance.
+
+.Memory
+Especially in a hyper-converged setup, the memory consumption needs to be
+carefully monitored. In addition to the intended workload from virtual
+machines and containers, Ceph needs enough memory available to provide good
+and stable performance. As a rule of thumb, for roughly 1 TiB of data, 1 GiB
+of memory will be used by an OSD. OSD caching will use additional memory.
+
+.Network
+We recommend a network bandwidth of at least 10 GbE, used exclusively for
+Ceph. A meshed network setup
+footnote:[Full Mesh Network for Ceph {webwiki-url}Full_Mesh_Network_for_Ceph_Server]
+is also an option if there are no 10 GbE switches available.
+
+The volume of traffic, especially during recovery, will interfere with other
+services on the same network and may even break the {pve} cluster stack.
+
+Further, estimate your bandwidth needs. While one HDD might not saturate a
+1 Gb link, multiple HDD OSDs per node can, and modern NVMe SSDs will quickly
+saturate 10 Gbps of bandwidth. Deploying a network capable of even more
+bandwidth will ensure that it isn't your bottleneck and won't be anytime
+soon; 25, 40 or even 100 Gbps are possible.
+
+.Disks
+When planning the size of your Ceph cluster, it is important to take the
+recovery time into consideration. Especially with small clusters, recovery
+might take a long time. It is recommended that you use SSDs instead of HDDs
+in small setups to reduce recovery time, minimizing the likelihood of a
+subsequent failure event during recovery.
+
+In general, SSDs will provide more IOPS than spinning disks. This fact and
+the higher cost may make a xref:pve_ceph_device_classes[class based]
+separation of pools appealing. Another possibility to speed up OSDs is to
+use a faster disk as journal or DB/WAL device, see
+xref:pve_ceph_osds[creating Ceph OSDs]. If a faster disk is used for multiple
+OSDs, a proper balance between the number of OSDs and the WAL/DB (or journal)
+disk must be selected, otherwise the faster disk becomes the bottleneck for
+all linked OSDs.
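+
+For illustration only (the exact command and options are described in
+xref:pve_ceph_osds[creating Ceph OSDs] and may differ between versions), an
+OSD on a spinning disk with its DB/WAL placed on a faster device could be
+created similar to the following, using example device names:
+
+[source,bash]
+----
+# /dev/sdb holds the data, /dev/nvme0n1 holds the DB/WAL (example names)
+pveceph osd create /dev/sdb -db_dev /dev/nvme0n1
+----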
+
+Aside from the disk type, Ceph performs best with an evenly sized and evenly
+distributed number of disks per node. For example, 4 x 500 GB disks in each
+node are better than a mixed setup with a single 1 TB and three 250 GB disks.
+
+You also need to balance OSD count and single OSD capacity. More capacity
+allows you to increase storage density, but it also means that a single OSD
+failure forces Ceph to recover more data at once.
+
 .Avoid RAID
 As Ceph handles data object redundancy and multiple parallel writes to disks
 (OSDs) on its own, using a RAID controller normally doesn’t improve
@@ -93,12 +139,69 @@ the ones from Ceph.
 
 WARNING: Avoid RAID controller, use host bus adapter (HBA) instead.
 
+NOTE: The above recommendations should be seen as rough guidance for choosing
+hardware. Therefore, it is still essential to adapt them to your specific
+needs, test your setup and monitor health and performance continuously.
+
+[[pve_ceph_install_wizard]]
+Initial Ceph installation & configuration
+-----------------------------------------
+
+[thumbnail="screenshot/gui-node-ceph-install.png"]
+
+With {pve} you have the benefit of an easy-to-use installation wizard
+for Ceph. Click on one of your cluster nodes and navigate to the Ceph
+section in the menu tree. If Ceph is not already installed, you will be
+offered to install it now.
+
+The wizard is divided into different sections, each of which needs to be
+finished successfully in order to use Ceph. After starting the installation,
+the wizard will download and install all required packages from {pve}'s Ceph
+repository.
+
+After finishing the first step, you will need to create a configuration.
+This step is only needed once per cluster, as this configuration is
+distributed automatically to all remaining cluster members through {pve}'s
+clustered xref:chapter_pmxcfs[configuration file system (pmxcfs)].
+
+The configuration step includes the following settings (an illustrative
+configuration example follows at the end of this section):
+
+* *Public Network:* You should set up a dedicated network for Ceph; this
+setting is required. Separating your Ceph traffic is highly recommended,
+because otherwise it could cause trouble with other latency-dependent
+services, e.g., cluster communication, and may decrease Ceph's performance.
+
+[thumbnail="screenshot/gui-node-ceph-install-wizard-step2.png"]
+
+* *Cluster Network:* As an optional step, you can go even further and
+separate the xref:pve_ceph_osds[OSD] replication & heartbeat traffic
+as well. This will relieve the public network and could lead to
+significant performance improvements, especially in large clusters.
+
+You have two more options which are considered advanced and therefore
+should only be changed if you are an expert.
+
+* *Number of replicas*: Defines how often an object is replicated.
+* *Minimum replicas*: Defines the minimum number of required replicas
+  for I/O to be marked as complete.
+
+Additionally, you need to choose your first monitor node; this is required.
+
+That's it. You should see a success page as the last step, with further
+instructions on how to proceed. Your cluster is now ready to start using
+Ceph, even though you will still need to create additional
+xref:pve_ceph_monitors[monitors], create some xref:pve_ceph_osds[OSDs] and at
+least one xref:pve_ceph_pools[pool].
+
+The rest of this chapter will guide you through getting the most out of your
+{pve}-based Ceph setup. This includes the aforementioned topics and more,
+such as xref:pveceph_fs[CephFS], which is a very handy addition to your new
+Ceph cluster.
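+
+For illustration, assuming a public network of `10.10.10.0/24` and a separate
+cluster network of `10.10.20.0/24`, the wizard's choices correspond to entries
+similar to the following in the `[global]` section of `/etc/pve/ceph.conf`
+(example values; the exact keys and formatting written by the wizard may
+differ slightly):
+
+----
+[global]
+     public_network = 10.10.10.0/24
+     cluster_network = 10.10.20.0/24
+     osd_pool_default_size = 3
+     osd_pool_default_min_size = 2
+----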
 
 [[pve_ceph_install]]
 Installation of Ceph Packages
 -----------------------------
-
-On each node run the installation script as follows:
+Use the {pve} Ceph installation wizard (recommended) or run the following
+command on each node:
 
 [source,bash]
 ----
@@ -114,20 +217,20 @@ Creating initial Ceph configuration
 
 [thumbnail="screenshot/gui-ceph-config.png"]
 
-After installation of packages, you need to create an initial Ceph
-configuration on just one node, based on your network (`10.10.10.0/24`
-in the following example) dedicated for Ceph:
+Use the {pve} Ceph installation wizard (recommended) or run the
+following command on one node:
 
 [source,bash]
 ----
 pveceph init --network 10.10.10.0/24
 ----
 
-This creates an initial configuration at `/etc/pve/ceph.conf`. That file is
-automatically distributed to all {pve} nodes by using
-xref:chapter_pmxcfs[pmxcfs]. The command also creates a symbolic link
-from `/etc/ceph/ceph.conf` pointing to that file. So you can simply run
-Ceph commands without the need to specify a configuration file.
+This creates an initial configuration at `/etc/pve/ceph.conf` with a
+dedicated network for Ceph. That file is automatically distributed to
+all {pve} nodes by using xref:chapter_pmxcfs[pmxcfs]. The command also
+creates a symbolic link from `/etc/ceph/ceph.conf` pointing to that file,
+so you can simply run Ceph commands without the need to specify a
+configuration file.
 
 
 [[pve_ceph_monitors]]
@@ -139,7 +242,10 @@ Creating Ceph Monitors
 The Ceph Monitor (MON)
 footnote:[Ceph Monitor http://docs.ceph.com/docs/luminous/start/intro/]
 maintains a master copy of the cluster map. For high availability you need to
-have at least 3 monitors.
+have at least 3 monitors. One monitor will already be installed if you
+used the installation wizard. You won't need more than 3 monitors as long
+as your cluster is small to mid-sized; only really large clusters will
+need more than that.
 
 On each node where you want to place a monitor (three monitors are recommended),
 create it by using the 'Ceph -> Monitor' tab in the GUI or run.
@@ -316,6 +422,7 @@ operation footnote:[Ceph pool operation
 http://docs.ceph.com/docs/luminous/rados/operations/pools/]
 manual.
 
+[[pve_ceph_device_classes]]
 Ceph CRUSH & device classes
 ---------------------------
 The foundation of Ceph is its algorithm, **C**ontrolled **R**eplication
@@ -432,7 +539,7 @@ cluster, this way even high load will not overload a single host, which can be
 an issue with traditional shared filesystem approaches, like `NFS`, for
 example.
 
-{pve} supports both, using an existing xref:storage_cephfs[CephFS as storage])
+{pve} supports both, using an existing xref:storage_cephfs[CephFS as storage]
 to save backups, ISO files or container templates and creating a
 hyper-converged CephFS itself.
 
@@ -537,18 +644,22 @@ pveceph pool destroy NAME
 ----
 
 
-Ceph monitoring & troubleshooting
----------------------------------
-A good start is to monitor the ceph health to begin with. Either through the
-ceph tools itself or also by accessing the status through the {pve}
-link:api-viewer/index.html[API].
+Ceph monitoring and troubleshooting
+-----------------------------------
+A good start is to continuously monitor the Ceph health from the very start
+of the initial deployment, either through the Ceph tools themselves or by
+accessing the status through the {pve} link:api-viewer/index.html[API].
 
-If the cluster is in an unhealthy state the commands below will give an
-overview on the current events.
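+For example, the cluster status can also be queried through the API with
+`pvesh` (the node name below is a placeholder; the exact path and return
+format can be checked in the link:api-viewer/index.html[API viewer]):
+
+----
+pvesh get /nodes/<nodename>/ceph/status
+----
+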
+The following ceph commands can be used to see if the cluster is healthy
+('HEALTH_OK'), if there are warnings ('HEALTH_WARN'), or even errors
+('HEALTH_ERR'). If the cluster is in an unhealthy state, the status commands
+below will also give you an overview of the current events and actions to
+take.
 
 ----
-ceph -s
-ceph -w
+# print the cluster status once
+pve# ceph -s
+# continuously output status changes (press CTRL+C to stop)
+pve# ceph -w
 ----
 
 To get a more detailed view, every ceph service has a log file under