rework SDN docs a bit

[pve-docs.git] / pveceph.adoc
diff --git a/pveceph.adoc b/pveceph.adoc

index f6fe3fa38e17a62be8a6baf444d28c136aed5954..baf0988dd93c788337ed2dab26d898b35a606b15 100644 (file)
--- a/pveceph.adoc
+++ b/pveceph.adoc
@@ -18,8 +18,8 @@ DESCRIPTION
  -----------
  endif::manvolnum[]
  ifndef::manvolnum[]
-Manage Ceph Services on Proxmox VE Nodes
-========================================
+Deploy Hyper-Converged Ceph Cluster
+===================================
  :pve-toplevel:
  endif::manvolnum[]
  
@@ -58,15 +58,15 @@ and VMs on the same node is possible.
  To simplify management, we provide 'pveceph' - a tool to install and
  manage {ceph} services on {pve} nodes.
  
-.Ceph consists of a couple of Daemons footnote:[Ceph intro http://docs.ceph.com/docs/luminous/start/intro/], for use as a RBD storage:
+.Ceph consists of a couple of Daemons footnote:[Ceph intro https://docs.ceph.com/docs/{ceph_codename}/start/intro/], for use as a RBD storage:
  - Ceph Monitor (ceph-mon)
  - Ceph Manager (ceph-mgr)
  - Ceph OSD (ceph-osd; Object Storage Daemon)
  
  TIP: We highly recommend to get familiar with Ceph's architecture
-footnote:[Ceph architecture http://docs.ceph.com/docs/luminous/architecture/]
+footnote:[Ceph architecture https://docs.ceph.com/docs/{ceph_codename}/architecture/]
  and vocabulary
-footnote:[Ceph glossary http://docs.ceph.com/docs/luminous/glossary].
+footnote:[Ceph glossary https://docs.ceph.com/docs/{ceph_codename}/glossary].
  
  
  Precondition
@@ -76,7 +76,7 @@ To build a hyper-converged Proxmox + Ceph Cluster there should be at least
  three (preferably) identical servers for the setup.
  
  Check also the recommendations from
-http://docs.ceph.com/docs/luminous/start/hardware-recommendations/[Ceph's website].
+https://docs.ceph.com/docs/{ceph_codename}/start/hardware-recommendations/[Ceph's website].
  
  .CPU
  Higher CPU core frequency reduce latency and should be preferred. As a simple
@@ -86,9 +86,16 @@ provide enough resources for stable and durable Ceph performance.
  .Memory
  Especially in a hyper-converged setup, the memory consumption needs to be
  carefully monitored. In addition to the intended workload from virtual machines
-and container, Ceph needs enough memory available to provide good and stable
-performance. As a rule of thumb, for roughly 1 TiB of data, 1 GiB of memory
-will be used by an OSD. OSD caching will use additional memory.
+and containers, Ceph needs enough memory available to provide excellent and
+stable performance.
+
+As a rule of thumb, for roughly **1 TiB of data, 1 GiB of memory** will be used
+by an OSD, especially during recovery, rebalancing or backfilling.
+
+The daemon itself will use additional memory. The Bluestore backend of the
+daemon requires by default **3-5 GiB of memory** (adjustable). In contrast, the
+legacy Filestore backend uses the OS page cache and the memory consumption is
+generally related to PGs of an OSD daemon.
  
  .Network
  We recommend a network bandwidth of at least 10 GbE or more, which is used
@@ -101,7 +108,7 @@ services on the same network and may even break the {pve} cluster stack.
  
  Further, estimate your bandwidth needs. While one HDD might not saturate a 1 Gb
  link, multiple HDD OSDs per node can, and modern NVMe SSDs will even saturate
-10 Gbps of bandwidth quickly. Deploying a network capable of even more bandwith
+10 Gbps of bandwidth quickly. Deploying a network capable of even more bandwidth
  will ensure that it isn't your bottleneck and won't be anytime soon, 25, 40 or
  even 100 GBps are possible.
  
@@ -237,7 +244,7 @@ configuration file.
  Ceph Monitor
  -----------
  The Ceph Monitor (MON)
-footnote:[Ceph Monitor http://docs.ceph.com/docs/luminous/start/intro/]
+footnote:[Ceph Monitor https://docs.ceph.com/docs/{ceph_codename}/start/intro/]
  maintains a master copy of the cluster map. For high availability you need to
  have at least 3 monitors. One monitor will already be installed if you
  used the installation wizard. You won't need more than 3 monitors as long
@@ -245,6 +252,7 @@ as your cluster is small to midsize, only really large clusters will
  need more than that.
  
  
+[[pveceph_create_mon]]
  Create Monitors
  ~~~~~~~~~~~~~~~
  
@@ -259,7 +267,7 @@ create it by using the 'Ceph -> Monitor' tab in the GUI or run.
  pveceph mon create
  ----
  
-
+[[pveceph_destroy_mon]]
  Destroy Monitors
  ~~~~~~~~~~~~~~~~
  
@@ -282,9 +290,10 @@ Ceph Manager
  ------------
  The Manager daemon runs alongside the monitors. It provides an interface to
  monitor the cluster. Since the Ceph luminous release at least one ceph-mgr
-footnote:[Ceph Manager http://docs.ceph.com/docs/luminous/mgr/] daemon is
+footnote:[Ceph Manager https://docs.ceph.com/docs/{ceph_codename}/mgr/] daemon is
  required.
  
+[[pveceph_create_mgr]]
  Create Manager
  ~~~~~~~~~~~~~~
  
@@ -299,6 +308,7 @@ NOTE: It is recommended to install the Ceph Manager on the monitor nodes. For
  high availability install more then one manager.
  
  
+[[pveceph_destroy_mgr]]
  Destroy Manager
  ~~~~~~~~~~~~~~~
  
@@ -355,7 +365,7 @@ WARNING: The above command will destroy data on the disk!
  
  Starting with the Ceph Kraken release, a new Ceph OSD storage type was
  introduced, the so called Bluestore
-footnote:[Ceph Bluestore http://ceph.com/community/new-luminous-bluestore/].
+footnote:[Ceph Bluestore https://ceph.com/community/new-luminous-bluestore/].
  This is the default when creating OSDs since Ceph Luminous.
  
  [source,bash]
@@ -375,7 +385,7 @@ pveceph osd create /dev/sd[X] -db_dev /dev/sd[Y] -wal_dev /dev/sd[Z]
  ----
  
  You can directly choose the size for those with the '-db_size' and '-wal_size'
-paremeters respectively. If they are not given the following values (in order)
+parameters respectively. If they are not given the following values (in order)
  will be used:
  
  * bluestore_block_{db,wal}_size from ceph configuration...
@@ -452,8 +462,9 @@ NOTE: The default number of PGs works for 2-5 disks. Ceph throws a
  
  It is advised to calculate the PG number depending on your setup, you can find
  the formula and the PG calculator footnote:[PG calculator
-http://ceph.com/pgcalc/] online. While PGs can be increased later on, they can
-never be decreased.
+https://ceph.com/pgcalc/] online. From Ceph Nautilus onwards it is possible to
+increase and decrease the number of PGs later on footnote:[Placement Groups
+https://docs.ceph.com/docs/{ceph_codename}/rados/operations/placement-groups/].
  
  
  You can create pools through command line or on the GUI on each PVE host under
@@ -470,7 +481,7 @@ mark the checkbox "Add storages" in the GUI or use the command line option
  
  Further information on Ceph pool handling can be found in the Ceph pool
  operation footnote:[Ceph pool operation
-http://docs.ceph.com/docs/luminous/rados/operations/pools/]
+https://docs.ceph.com/docs/{ceph_codename}/rados/operations/pools/]
  manual.
  
  
@@ -503,7 +514,7 @@ advantage that no central index service is needed. CRUSH works with a map of
  OSDs, buckets (device locations) and rulesets (data replication) for pools.
  
  NOTE: Further information can be found in the Ceph documentation, under the
-section CRUSH map footnote:[CRUSH map http://docs.ceph.com/docs/luminous/rados/operations/crush-map/].
+section CRUSH map footnote:[CRUSH map https://docs.ceph.com/docs/{ceph_codename}/rados/operations/crush-map/].
  
  This map can be altered to reflect different replication hierarchies. The object
  replicas can be separated (eg. failure domains), while maintaining the desired
@@ -649,7 +660,7 @@ Since Luminous (12.2.x) you can also have multiple active metadata servers
  running, but this is normally only useful for a high count on parallel clients,
  as else the `MDS` seldom is the bottleneck. If you want to set this up please
  refer to the ceph documentation. footnote:[Configuring multiple active MDS
-daemons http://docs.ceph.com/docs/luminous/cephfs/multimds/]
+daemons https://docs.ceph.com/docs/{ceph_codename}/cephfs/multimds/]
  
  [[pveceph_fs_create]]
  Create CephFS
@@ -681,7 +692,7 @@ This creates a CephFS named `'cephfs'' using a pool for its data named
  Check the xref:pve_ceph_pools[{pve} managed Ceph pool chapter] or visit the
  Ceph documentation for more information regarding a fitting placement group
  number (`pg_num`) for your setup footnote:[Ceph Placement Groups
-http://docs.ceph.com/docs/luminous/rados/operations/placement-groups/].
+https://docs.ceph.com/docs/{ceph_codename}/rados/operations/placement-groups/].
  Additionally, the `'--add-storage'' parameter will add the CephFS to the {pve}
  storage configuration after it was created successfully.
  
@@ -716,12 +727,20 @@ pveceph pool destroy NAME
  
  Ceph maintenance
  ----------------
+
  Replace OSDs
  ~~~~~~~~~~~~
+
  One of the common maintenance tasks in Ceph is to replace a disk of an OSD. If
  a disk is already in a failed state, then you can go ahead and run through the
  steps in xref:pve_ceph_osd_destroy[Destroy OSDs]. Ceph will recreate those
-copies on the remaining OSDs if possible.
+copies on the remaining OSDs if possible. This rebalancing will start as soon
+as an OSD failure is detected or an OSD was actively stopped.
+
+NOTE: With the default size/min_size (3/2) of a pool, recovery only starts when
+`size + 1` nodes are available. The reason for this is that the Ceph object
+balancer xref:pve_ceph_device_classes[CRUSH] defaults to a full node as
+`failure domain'.
  
  To replace a still functioning disk, on the GUI go through the steps in
  xref:pve_ceph_osd_destroy[Destroy OSDs]. The only addition is to wait until
@@ -747,23 +766,23 @@ pveceph osd destroy <id>
  Replace the old disk with the new one and use the same procedure as described
  in xref:pve_ceph_osd_create[Create OSDs].
  
-NOTE: With the default size/min_size (3/2) of a pool, recovery only starts when
-`size + 1` nodes are available.
-
-Run fstrim (discard)
-~~~~~~~~~~~~~~~~~~~~
+Trim/Discard
+~~~~~~~~~~~~
  It is a good measure to run 'fstrim' (discard) regularly on VMs or containers.
  This releases data blocks that the filesystem isn’t using anymore. It reduces
-data usage and the resource load.
+data usage and resource load. Most modern operating systems issue such discard
+commands to their disks regularly. You only need to ensure that the Virtual
+Machines enable the xref:qm_hard_disk_discard[disk discard option].
  
+[[pveceph_scrub]]
  Scrub & Deep Scrub
  ~~~~~~~~~~~~~~~~~~
  Ceph ensures data integrity by 'scrubbing' placement groups. Ceph checks every
  object in a PG for its health. There are two forms of Scrubbing, daily
-(metadata compare) and weekly. The weekly reads the objects and uses checksums
-to ensure data integrity. If a running scrub interferes with business needs,
-you can adjust the time when scrubs footnote:[Ceph scrubbing
-https://docs.ceph.com/docs/nautilus/rados/configuration/osd-config-ref/#scrubbing]
+cheap metadata checks and weekly deep data checks. The weekly deep scrub reads
+the objects and uses checksums to ensure data integrity. If a running scrub
+interferes with business (performance) needs, you can adjust the time when
+scrubs footnote:[Ceph scrubbing https://docs.ceph.com/docs/{ceph_codename}/rados/configuration/osd-config-ref/#scrubbing]
  are executed.
  
  
@@ -787,10 +806,10 @@ pve# ceph -w
  
  To get a more detailed view, every ceph service has a log file under
  `/var/log/ceph/` and if there is not enough detail, the log level can be
-adjusted footnote:[Ceph log and debugging http://docs.ceph.com/docs/luminous/rados/troubleshooting/log-and-debug/].
+adjusted footnote:[Ceph log and debugging https://docs.ceph.com/docs/{ceph_codename}/rados/troubleshooting/log-and-debug/].
  
  You can find more information about troubleshooting
-footnote:[Ceph troubleshooting http://docs.ceph.com/docs/luminous/rados/troubleshooting/]
+footnote:[Ceph troubleshooting https://docs.ceph.com/docs/{ceph_codename}/rados/troubleshooting/]
  a Ceph cluster on the official website.
  
  
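
The memory rule of thumb added in the `.Memory` hunk above (roughly 1 GiB per 1 TiB of data per OSD, plus 3-5 GiB for the Bluestore backend) can be turned into a quick sizing estimate. The following is only a rough sketch: the function names are hypothetical, and the 4 GiB Bluestore default is an assumed midpoint of the 3-5 GiB range stated in the patch, not a value taken from it.

```python
def estimate_osd_memory_gib(data_tib, bluestore_gib=4.0):
    """Rough memory estimate for one OSD, in GiB.

    data_tib:      data stored on the OSD in TiB (rule of thumb: 1 GiB per TiB).
    bluestore_gib: assumed Bluestore overhead; 4.0 is a hypothetical midpoint
                   of the 3-5 GiB range and should be adjusted to your setup.
    """
    if data_tib < 0 or bluestore_gib < 0:
        raise ValueError("sizes must be non-negative")
    return data_tib * 1.0 + bluestore_gib


def estimate_node_memory_gib(osd_sizes_tib, bluestore_gib=4.0):
    """Sum the per-OSD estimates for all OSDs on one node."""
    return sum(estimate_osd_memory_gib(size, bluestore_gib)
               for size in osd_sizes_tib)


if __name__ == "__main__":
    # Example: a node with two 4 TiB OSDs and one 2 TiB OSD.
    print(estimate_node_memory_gib([4, 4, 2]))
```

The result covers only the Ceph OSD daemons. Memory for virtual machines, containers and the other Ceph services on a hyper-converged node has to be added on top.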