-----------
endif::manvolnum[]
ifndef::manvolnum[]
-Manage Ceph Services on Proxmox VE Nodes
-========================================
+Deploy Hyper-Converged Ceph Cluster
+===================================
:pve-toplevel:
endif::manvolnum[]
To simplify management, we provide 'pveceph' - a tool to install and
manage {ceph} services on {pve} nodes.
-.Ceph consists of a couple of Daemons footnote:[Ceph intro https://docs.ceph.com/docs/{ceph_codename}/start/intro/], for use as a RBD storage:
+.Ceph consists of a couple of Daemons, for use as a RBD storage:
- Ceph Monitor (ceph-mon)
- Ceph Manager (ceph-mgr)
- Ceph OSD (ceph-osd; Object Storage Daemon)
-TIP: We highly recommend to get familiar with Ceph's architecture
-footnote:[Ceph architecture https://docs.ceph.com/docs/{ceph_codename}/architecture/]
+TIP: We highly recommend to get familiar with Ceph
+footnote:[Ceph intro {cephdocs-url}/start/intro/],
+its architecture
+footnote:[Ceph architecture {cephdocs-url}/architecture/]
and vocabulary
-footnote:[Ceph glossary https://docs.ceph.com/docs/{ceph_codename}/glossary].
+footnote:[Ceph glossary {cephdocs-url}/glossary].
Precondition
three (preferably) identical servers for the setup.
Check also the recommendations from
-https://docs.ceph.com/docs/{ceph_codename}/start/hardware-recommendations/[Ceph's website].
+{cephdocs-url}/start/hardware-recommendations/[Ceph's website].
.CPU
Higher CPU core frequency reduce latency and should be preferred. As a simple
.Memory
Especially in a hyper-converged setup, the memory consumption needs to be
carefully monitored. In addition to the intended workload from virtual machines
-and container, Ceph needs enough memory available to provide good and stable
-performance. As a rule of thumb, for roughly 1 TiB of data, 1 GiB of memory
-will be used by an OSD. OSD caching will use additional memory.
+and containers, Ceph needs enough memory available to provide excellent and
+stable performance.
+
+As a rule of thumb, for roughly **1 TiB of data, 1 GiB of memory** will be used
+by an OSD. Especially during recovery, rebalancing or backfilling.
+
+The daemon itself will use additional memory. The Bluestore backend of the
+daemon requires by default **3-5 GiB of memory** (adjustable). In contrast, the
+legacy Filestore backend uses the OS page cache and the memory consumption is
+generally related to PGs of an OSD daemon.
.Network
We recommend a network bandwidth of at least 10 GbE or more, which is used
Further, estimate your bandwidth needs. While one HDD might not saturate a 1 Gb
link, multiple HDD OSDs per node can, and modern NVMe SSDs will even saturate
-10 Gbps of bandwidth quickly. Deploying a network capable of even more bandwith
+10 Gbps of bandwidth quickly. Deploying a network capable of even more bandwidth
will ensure that it isn't your bottleneck and won't be anytime soon, 25, 40 or
even 100 GBps are possible.
Ceph Monitor
-----------
The Ceph Monitor (MON)
-footnote:[Ceph Monitor https://docs.ceph.com/docs/{ceph_codename}/start/intro/]
+footnote:[Ceph Monitor {cephdocs-url}/start/intro/]
maintains a master copy of the cluster map. For high availability you need to
have at least 3 monitors. One monitor will already be installed if you
used the installation wizard. You won't need more than 3 monitors as long
------------
The Manager daemon runs alongside the monitors. It provides an interface to
monitor the cluster. Since the Ceph luminous release at least one ceph-mgr
-footnote:[Ceph Manager https://docs.ceph.com/docs/{ceph_codename}/mgr/] daemon is
+footnote:[Ceph Manager {cephdocs-url}/mgr/] daemon is
required.
-[i[pveceph_create_mgr]]
+[[pveceph_create_mgr]]
Create Manager
~~~~~~~~~~~~~~
----
You can directly choose the size for those with the '-db_size' and '-wal_size'
-paremeters respectively. If they are not given the following values (in order)
+parameters respectively. If they are not given the following values (in order)
will be used:
* bluestore_block_{db,wal}_size from ceph configuration...
NOTE: The default number of PGs works for 2-5 disks. Ceph throws a
'HEALTH_WARNING' if you have too few or too many PGs in your cluster.
+WARNING: **Do not set a min_size of 1**. A replicated pool with min_size of 1
+allows I/O on an object when it has only 1 replica which could lead to data
+loss, incomplete PGs or unfound objects.
+
It is advised to calculate the PG number depending on your setup, you can find
the formula and the PG calculator footnote:[PG calculator
-https://ceph.com/pgcalc/] online. While PGs can be increased later on, they can
-never be decreased.
+https://ceph.com/pgcalc/] online. From Ceph Nautilus onwards it is possible to
+increase and decrease the number of PGs later on footnote:[Placement Groups
+{cephdocs-url}/rados/operations/placement-groups/].
You can create pools through command line or on the GUI on each PVE host under
Further information on Ceph pool handling can be found in the Ceph pool
operation footnote:[Ceph pool operation
-https://docs.ceph.com/docs/{ceph_codename}/rados/operations/pools/]
+{cephdocs-url}/rados/operations/pools/]
manual.
OSDs, buckets (device locations) and rulesets (data replication) for pools.
NOTE: Further information can be found in the Ceph documentation, under the
-section CRUSH map footnote:[CRUSH map https://docs.ceph.com/docs/{ceph_codename}/rados/operations/crush-map/].
+section CRUSH map footnote:[CRUSH map {cephdocs-url}/rados/operations/crush-map/].
This map can be altered to reflect different replication hierarchies. The object
replicas can be separated (eg. failure domains), while maintaining the desired
running, but this is normally only useful for a high count on parallel clients,
as else the `MDS` seldom is the bottleneck. If you want to set this up please
refer to the ceph documentation. footnote:[Configuring multiple active MDS
-daemons https://docs.ceph.com/docs/{ceph_codename}/cephfs/multimds/]
+daemons {cephdocs-url}/cephfs/multimds/]
[[pveceph_fs_create]]
Create CephFS
Check the xref:pve_ceph_pools[{pve} managed Ceph pool chapter] or visit the
Ceph documentation for more information regarding a fitting placement group
number (`pg_num`) for your setup footnote:[Ceph Placement Groups
-https://docs.ceph.com/docs/{ceph_codename}/rados/operations/placement-groups/].
+{cephdocs-url}/rados/operations/placement-groups/].
Additionally, the `'--add-storage'' parameter will add the CephFS to the {pve}
storage configuration after it was created successfully.
~~~~~~~~~~~~
It is a good measure to run 'fstrim' (discard) regularly on VMs or containers.
This releases data blocks that the filesystem isn’t using anymore. It reduces
-data usage and the resource load. Most modern operating systems issue such
-discard commands to their disks regurarly. You only need to ensure that the
-Virtual Machines enable the xref:qm_hard_disk_discard[disk discard option].
+data usage and resource load. Most modern operating systems issue such discard
+commands to their disks regularly. You only need to ensure that the Virtual
+Machines enable the xref:qm_hard_disk_discard[disk discard option].
[[pveceph_scrub]]
Scrub & Deep Scrub
cheap metadata checks and weekly deep data checks. The weekly deep scrub reads
the objects and uses checksums to ensure data integrity. If a running scrub
interferes with business (performance) needs, you can adjust the time when
-scrubs footnote:[Ceph scrubbing https://docs.ceph.com/docs/{ceph_codename}/rados/configuration/osd-config-ref/#scrubbing]
+scrubs footnote:[Ceph scrubbing {cephdocs-url}/rados/configuration/osd-config-ref/#scrubbing]
are executed.
Ceph monitoring and troubleshooting
-----------------------------------
-A good start is to continuosly monitor the ceph health from the start of
+A good start is to continuously monitor the ceph health from the start of
initial deployment. Either through the ceph tools itself, but also by accessing
the status through the {pve} link:api-viewer/index.html[API].
To get a more detailed view, every ceph service has a log file under
`/var/log/ceph/` and if there is not enough detail, the log level can be
-adjusted footnote:[Ceph log and debugging https://docs.ceph.com/docs/{ceph_codename}/rados/troubleshooting/log-and-debug/].
+adjusted footnote:[Ceph log and debugging {cephdocs-url}/rados/troubleshooting/log-and-debug/].
You can find more information about troubleshooting
-footnote:[Ceph troubleshooting https://docs.ceph.com/docs/{ceph_codename}/rados/troubleshooting/]
+footnote:[Ceph troubleshooting {cephdocs-url}/rados/troubleshooting/]
a Ceph cluster on the official website.