X-Git-Url: https://git.proxmox.com/?a=blobdiff_plain;f=pveceph.adoc;h=f050b1b8de3075157c4171ab980af9d74dbb76f5;hb=94fd8ea59c669d0cd113fda0429543c395921cb1;hp=67a0dba248a0fcfcb2240dc72e95377e9d7ff1c6;hpb=07fef357a9f83feb8be6c5f5f067cedfdb87cf6f;p=pve-docs.git

diff --git a/pveceph.adoc b/pveceph.adoc
index 67a0dba..f050b1b 100644
--- a/pveceph.adoc
+++ b/pveceph.adoc
@@ -221,7 +221,7 @@ If you want to use a dedicated SSD journal disk:
 
 [source,bash]
 ----
-pveceph createosd /dev/sd[X] -journal_dev /dev/sd[Y]
+pveceph createosd /dev/sd[X] -journal_dev /dev/sd[Y] -bluestore 0
 ----
 
 Example: Use /dev/sdf as data disk (4TB) and /dev/sdb is the dedicated SSD
@@ -229,7 +229,7 @@ journal disk.
 
 [source,bash]
 ----
-pveceph createosd /dev/sdf -journal_dev /dev/sdb
+pveceph createosd /dev/sdf -journal_dev /dev/sdb -bluestore 0
 ----
 
 This partitions the disk (data and journal partition), creates
@@ -284,6 +284,85 @@ operation footnote:[Ceph pool operation
 http://docs.ceph.com/docs/luminous/rados/operations/pools/]
 manual.
 
+Ceph CRUSH & device classes
+---------------------------
+The foundation of Ceph is its algorithm, **C**ontrolled **R**eplication
+**U**nder **S**calable **H**ashing
+(CRUSH footnote:[CRUSH https://ceph.com/wp-content/uploads/2016/08/weil-crush-sc06.pdf]).
+
+CRUSH calculates where to store and retrieve data from; this has the
+advantage that no central index service is needed. CRUSH works with a map of
+OSDs, buckets (device locations) and rulesets (data replication) for pools.
+
+NOTE: Further information can be found in the Ceph documentation, under the
+section CRUSH map footnote:[CRUSH map http://docs.ceph.com/docs/luminous/rados/operations/crush-map/].
+
+This map can be altered to reflect different replication hierarchies. The
+object replicas can be separated across failure domains (eg. hosts), while
+maintaining the desired distribution.
+
+A common use case is to use different classes of disks for different Ceph
+pools. For this reason, Ceph introduced device classes with Luminous, to make
+generating rulesets for such setups easy.
+
+The device classes can be seen in the 'ceph osd tree' output. Each class is
+represented by its own shadow root bucket, which can be shown with the command
+below.
+
+[source, bash]
+----
+ceph osd crush tree --show-shadow
+----
+
+Example output from the above command:
+
+[source, bash]
+----
+ID  CLASS WEIGHT  TYPE NAME
+-16  nvme 2.18307 root default~nvme
+-13  nvme 0.72769     host sumi1~nvme
+ 12  nvme 0.72769         osd.12
+-14  nvme 0.72769     host sumi2~nvme
+ 13  nvme 0.72769         osd.13
+-15  nvme 0.72769     host sumi3~nvme
+ 14  nvme 0.72769         osd.14
+ -1        7.70544 root default
+ -3        2.56848     host sumi1
+ 12  nvme 0.72769         osd.12
+ -5        2.56848     host sumi2
+ 13  nvme 0.72769         osd.13
+ -7        2.56848     host sumi3
+ 14  nvme 0.72769         osd.14
+----
+
+To let a pool distribute its objects only over a specific device class, you
+first need to create a ruleset for that class.
+
+[source, bash]
+----
+ceph osd crush rule create-replicated <rule-name> <root> <failure-domain> <class>
+----
+
+[frame="none",grid="none", align="left", cols="30%,70%"]
+|===
+|<rule-name>|name of the rule, to connect with a pool (seen in GUI & CLI)
+|<root>|which CRUSH root it should belong to (default Ceph root "default")
+|<failure-domain>|at which failure domain the objects should be distributed (usually host)
+|<class>|what type of OSD backing store to use (eg. nvme, ssd, hdd)
+|===
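+
+For example, assuming the tree shown above, a rule that restricts replication
+to NVMe-backed OSDs under the default root, with 'host' as the failure domain,
+could be created as follows (the rule name 'nvme_replicated' is only an
+illustration, any name can be used):
+
+[source, bash]
+----
+ceph osd crush rule create-replicated nvme_replicated default host nvme
+----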
+
+Once the rule is in the CRUSH map, you can tell a pool to use the ruleset.
+
+[source, bash]
+----
+ceph osd pool set <pool-name> crush_rule <rule-name>
+----
+
+TIP: If the pool already contains objects, all of these have to be moved
+accordingly. Depending on your setup, this may introduce a big performance hit
+on your cluster. As an alternative, you can create a new pool and move the
+disks over separately.
+
+
 Ceph Client
 -----------