In general SSDs will provide more IOPs than spinning disks. This fact and the
higher cost may make a xref:pve_ceph_device_classes[class based] separation of
pools appealing. Another possibility to speedup OSDs is to use a faster disk
-as journal or DB/WAL device, see xref:pve_ceph_osds[creating Ceph OSDs]. If a
-faster disk is used for multiple OSDs, a proper balance between OSD and WAL /
-DB (or journal) disk must be selected, otherwise the faster disk becomes the
-bottleneck for all linked OSDs.
+as journal or DB/**W**rite-**A**head-**L**og device, see
+xref:pve_ceph_osds[creating Ceph OSDs]. If a faster disk is used for multiple
+OSDs, a proper balance between OSD and WAL / DB (or journal) disk must be
+selected, otherwise the faster disk becomes the bottleneck for all linked OSDs.
Aside from the disk type, Ceph best performs with an even sized and distributed
amount of disks per node. For example, 4 x 500 GB disks with in each node is
footnote:[Ceph Monitor http://docs.ceph.com/docs/luminous/start/intro/]
maintains a master copy of the cluster map. For high availability you need to
have at least 3 monitors. One monitor will already be installed if you
-used the installation wizard. You wont need more than 3 monitors as long
+used the installation wizard. You won't need more than 3 monitors as long
as your cluster is small to midsize, only really large clusters will
need more than that.
among your, at least three nodes (4 OSDs on each node).
If the disk was used before (eg. ZFS/RAID/OSD), to remove partition table, boot
-sector and any OSD leftover the following commands should be sufficient.
+sector and any OSD leftover the following command should be sufficient.
[source,bash]
----
-dd if=/dev/zero of=/dev/sd[X] bs=1M count=200
-ceph-disk zap /dev/sd[X]
+ceph-volume lvm zap /dev/sd[X] --destroy
----
-WARNING: The above commands will destroy data on the disk!
+WARNING: The above command will destroy data on the disk!
Ceph Bluestore
~~~~~~~~~~~~~~
Starting with the Ceph Kraken release, a new Ceph OSD storage type was
introduced, the so called Bluestore
footnote:[Ceph Bluestore http://ceph.com/community/new-luminous-bluestore/].
-This is the default when creating OSDs in Ceph luminous.
+This is the default when creating OSDs since Ceph Luminous.
[source,bash]
----
pveceph createosd /dev/sd[X]
----
-NOTE: In order to select a disk in the GUI, to be more fail-safe, the disk needs
-to have a GPT footnoteref:[GPT, GPT partition table
-https://en.wikipedia.org/wiki/GUID_Partition_Table] partition table. You can
-create this with `gdisk /dev/sd(x)`. If there is no GPT, you cannot select the
-disk as DB/WAL.
+.Block.db and block.wal
If you want to use a separate DB/WAL device for your OSDs, you can specify it
-through the '-journal_dev' option. The WAL is placed with the DB, if not
+through the '-db_dev' and '-wal_dev' options. The WAL is placed with the DB, if not
specified separately.
[source,bash]
----
-pveceph createosd /dev/sd[X] -journal_dev /dev/sd[Y]
+pveceph createosd /dev/sd[X] -db_dev /dev/sd[Y] -wal_dev /dev/sd[Z]
----
+You can directly choose the size for those with the '-db_size' and '-wal_size'
+parameters respectively. If they are not given, the following values (in order)
+will be used:
+
+* bluestore_block_{db,wal}_size from ceph configuration...
+** ... database, section 'osd'
+** ... database, section 'global'
+** ... file, section 'osd'
+** ... file, section 'global'
+* 10% (DB)/1% (WAL) of OSD size
+
NOTE: The DB stores BlueStore’s internal metadata and the WAL is BlueStore’s
internal journal or write-ahead log. It is recommended to use a fast SSD or
NVRAM for better performance.
Ceph Filestore
-~~~~~~~~~~~~~
-Till Ceph luminous, Filestore was used as storage type for Ceph OSDs. It can
-still be used and might give better performance in small setups, when backed by
-an NVMe SSD or similar.
-
-[source,bash]
-----
-pveceph createosd /dev/sd[X] -bluestore 0
-----
-
-NOTE: In order to select a disk in the GUI, the disk needs to have a
-GPT footnoteref:[GPT] partition table. You can
-create this with `gdisk /dev/sd(x)`. If there is no GPT, you cannot select the
-disk as journal. Currently the journal size is fixed to 5 GB.
-
-If you want to use a dedicated SSD journal disk:
-
-[source,bash]
-----
-pveceph createosd /dev/sd[X] -journal_dev /dev/sd[Y] -bluestore 0
-----
+~~~~~~~~~~~~~~
-Example: Use /dev/sdf as data disk (4TB) and /dev/sdb is the dedicated SSD
-journal disk.
+Before Ceph Luminous, Filestore was used as default storage type for Ceph OSDs.
+Starting with Ceph Nautilus, {pve} does not support creating such OSDs with
+'pveceph' anymore. If you still want to create filestore OSDs, use
+'ceph-volume' directly.
[source,bash]
----
-pveceph createosd /dev/sdf -journal_dev /dev/sdb -bluestore 0
+ceph-volume lvm create --filestore --data /dev/sd[X] --journal /dev/sd[Y]
----
-This partitions the disk (data and journal partition), creates
-filesystems and starts the OSD, afterwards it is running and fully
-functional.
-
-NOTE: This command refuses to initialize disk when it detects existing data. So
-if you want to overwrite a disk you should remove existing data first. You can
-do that using: 'ceph-disk zap /dev/sd[X]'
-
-You can create OSDs containing both journal and data partitions or you
-can place the journal on a dedicated SSD. Using a SSD journal disk is
-highly recommended to achieve good performance.
-
-
[[pve_ceph_pools]]
Creating Ceph Pools
-------------------
pveceph createpool <name>
----
-If you would like to automatically get also a storage definition for your pool,
-active the checkbox "Add storages" on the GUI or use the command line option
-'--add_storages' on pool creation.
+If you would like to automatically also get a storage definition for your pool,
+mark the checkbox "Add storages" in the GUI or use the command line option
+'--add_storages' at pool creation.
Further information on Ceph pool handling can be found in the Ceph pool
operation footnote:[Ceph pool operation
Container images. Simply use the GUI too add a new `RBD` storage (see
section xref:ceph_rados_block_devices[Ceph RADOS Block Devices (RBD)]).
-You also need to copy the keyring to a predefined location for a external Ceph
+You also need to copy the keyring to a predefined location for an external Ceph
cluster. If Ceph is installed on the Proxmox nodes itself, then this will be
done automatically.
an issue with traditional shared filesystem approaches, like `NFS`, for
example.
+[thumbnail="screenshot/gui-node-ceph-cephfs-panel.png"]
+
{pve} supports both, using an existing xref:storage_cephfs[CephFS as storage]
to save backups, ISO files or container templates and creating a
hyper-converged CephFS itself.
`warm` state. But naturally, the active polling will cause some additional
performance impact on your system and active `MDS`.
-Multiple Active MDS
-^^^^^^^^^^^^^^^^^^^
+.Multiple Active MDS
Since Luminous (12.2.x) you can also have multiple active metadata servers
running, but this is normally only useful for a high count on parallel clients,
undone!
If you really want to destroy an existing CephFS you first need to stop, or
-destroy, all metadata server (`M̀DS`). You can destroy them either over the Web
+destroy, all metadata servers (`MDS`). You can destroy them either over the Web
GUI or the command line interface, with:
----
The following ceph commands below can be used to see if the cluster is healthy
('HEALTH_OK'), if there are warnings ('HEALTH_WARN'), or even errors
('HEALTH_ERR'). If the cluster is in an unhealthy state the status commands
-below will also give you an overview on the current events and actions take.
+below will also give you an overview of the current events and actions to take.
----
# single time output
You can find more information about troubleshooting
footnote:[Ceph troubleshooting http://docs.ceph.com/docs/luminous/rados/troubleshooting/]
-a Ceph cluster on its website.
+a Ceph cluster on the official website.
ifdef::manvolnum[]