[[chapter_pveceph]]
ifdef::manvolnum[]
pveceph(1)
==========
:pve-toplevel:

NAME
----

pveceph - Manage Ceph Services on Proxmox VE Nodes

SYNOPSIS
--------

include::pveceph.1-synopsis.adoc[]

DESCRIPTION
-----------
endif::manvolnum[]
ifndef::manvolnum[]
Manage Ceph Services on Proxmox VE Nodes
========================================
:pve-toplevel:
endif::manvolnum[]

[thumbnail="gui-ceph-status.png"]

{pve} unifies your compute and storage systems, i.e. you can use the
same physical nodes within a cluster for both computing (processing
VMs and containers) and replicated storage. The traditional silos of
compute and storage resources can be wrapped up into a single
hyper-converged appliance. Separate storage networks (SANs) and
connections via network attached storage (NAS) disappear. With the
integration of Ceph, an open source software-defined storage platform, {pve}
has the ability to run and manage Ceph storage directly on the hypervisor
nodes.

Ceph is a distributed object store and file system designed to provide
excellent performance, reliability and scalability.

For small to mid-sized deployments, it is possible to install a Ceph server for
RADOS Block Devices (RBD) directly on your {pve} cluster nodes, see
xref:ceph_rados_block_devices[Ceph RADOS Block Devices (RBD)]. Recent
hardware has plenty of CPU power and RAM, so running storage services
and VMs on the same node is possible.

To simplify management, we provide 'pveceph' - a tool to install and
manage {ceph} services on {pve} nodes.

Ceph consists of a couple of daemons
footnote:[Ceph intro http://docs.ceph.com/docs/master/start/intro/], for use as
an RBD storage:

- Ceph Monitor (ceph-mon)
- Ceph Manager (ceph-mgr)
- Ceph OSD (ceph-osd; Object Storage Daemon)

TIP: We recommend that you get familiar with the Ceph vocabulary.
footnote:[Ceph glossary http://docs.ceph.com/docs/luminous/glossary]


Precondition
------------

To build a Proxmox Ceph Cluster, there should be at least three (preferably
identical) servers for the setup.

A 10Gb network, exclusively used for Ceph, is recommended. A meshed
network setup is also an option if there are no 10Gb switches
available, see {webwiki-url}Full_Mesh_Network_for_Ceph_Server[wiki].

Check also the recommendations from
http://docs.ceph.com/docs/luminous/start/hardware-recommendations/[Ceph's website].


Installation of Ceph Packages
-----------------------------

On each node run the installation script as follows:

[source,bash]
----
pveceph install
----

This sets up an `apt` package repository in
`/etc/apt/sources.list.d/ceph.list` and installs the required software.
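
As a quick sanity check (purely illustrative; the exact repository line and
version string depend on your setup), you can look at the generated repository
file and the installed Ceph version afterwards:

[source,bash]
----
cat /etc/apt/sources.list.d/ceph.list
ceph --version
----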


Creating initial Ceph configuration
-----------------------------------

[thumbnail="gui-ceph-config.png"]

After installation of packages, you need to create an initial Ceph
configuration on just one node, based on your network (`10.10.10.0/24`
in the following example) dedicated to Ceph:

[source,bash]
----
pveceph init --network 10.10.10.0/24
----

This creates an initial config at `/etc/pve/ceph.conf`. That file is
automatically distributed to all {pve} nodes by using
xref:chapter_pmxcfs[pmxcfs]. The command also creates a symbolic link
from `/etc/ceph/ceph.conf` pointing to that file. So you can simply run
Ceph commands without the need to specify a configuration file.
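
To confirm that the configuration and the symbolic link are in place, you can,
for example, inspect both files (the contents will of course reflect your own
network settings):

[source,bash]
----
ls -l /etc/ceph/ceph.conf
cat /etc/pve/ceph.conf
----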


[[pve_ceph_monitors]]
Creating Ceph Monitors
----------------------

[thumbnail="gui-ceph-monitor.png"]

The Ceph Monitor (MON)
footnote:[Ceph Monitor http://docs.ceph.com/docs/luminous/start/intro/]
maintains a master copy of the cluster map. For HA you need to have at least 3
monitors.

On each node where you want to place a monitor (three monitors are recommended),
create it by using the 'Ceph -> Monitor' tab in the GUI or run:

[source,bash]
----
pveceph createmon
----

This will also install the needed Ceph Manager ('ceph-mgr') by default. If you
do not want to install a manager, specify the '-exclude-manager' option.
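
Once monitors are running on all intended nodes, you can check that they have
formed a quorum, for example with:

[source,bash]
----
ceph mon stat
----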


[[pve_ceph_manager]]
Creating Ceph Manager
---------------------

The Manager daemon runs alongside the monitors. It provides interfaces for
monitoring the cluster. Since the Ceph luminous release the
ceph-mgr footnote:[Ceph Manager http://docs.ceph.com/docs/luminous/mgr/] daemon
is required. During monitor installation the Ceph Manager will be installed as
well.

NOTE: It is recommended to install the Ceph Manager on the monitor nodes. For
high availability install more than one manager.

[source,bash]
----
pveceph createmgr
----
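
The active manager is then listed in the cluster status output (shown here only
as a suggested verification step; the `mgr:` line will name your own node):

[source,bash]
----
ceph -s
----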


[[pve_ceph_osds]]
Creating Ceph OSDs
------------------

[thumbnail="gui-ceph-osd-status.png"]

You can create an OSD either via the GUI or via the CLI as follows:

[source,bash]
----
pveceph createosd /dev/sd[X]
----

TIP: We recommend a Ceph cluster with at least three nodes and at least 12
OSDs, distributed evenly among the nodes (4 OSDs on each node).
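
After creating OSDs on all nodes, you can verify that they are all `up` and
`in`, for example with:

[source,bash]
----
ceph osd stat
ceph osd tree
----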


Ceph Bluestore
~~~~~~~~~~~~~~

Starting with the Ceph Kraken release, a new Ceph OSD storage type was
introduced, the so-called Bluestore
footnote:[Ceph Bluestore http://ceph.com/community/new-luminous-bluestore/]. In
Ceph luminous this store is the default when creating OSDs.

[source,bash]
----
pveceph createosd /dev/sd[X]
----

NOTE: In order to select a disk in the GUI, to be more failsafe, the disk needs
to have a
GPT footnoteref:[GPT,
GPT partition table https://en.wikipedia.org/wiki/GUID_Partition_Table]
partition table. You can create this with `gdisk /dev/sd(x)`. If there is no
GPT, you cannot select the disk as DB/WAL.

If you want to use a separate DB/WAL device for your OSDs, you can specify it
through the '-wal_dev' option.

[source,bash]
----
pveceph createosd /dev/sd[X] -wal_dev /dev/sd[Y]
----

NOTE: The DB stores BlueStore’s internal metadata and the WAL is BlueStore’s
internal journal or write-ahead log. It is recommended to use fast SSDs or
NVRAM for better performance.


Ceph Filestore
~~~~~~~~~~~~~~

Until Ceph luminous, Filestore was used as the storage type for Ceph OSDs. It can
still be used and might give better performance in small setups, when backed by
an NVMe SSD or similar.

[source,bash]
----
pveceph createosd /dev/sd[X] -bluestore 0
----

NOTE: In order to select a disk in the GUI, the disk needs to have a
GPT footnoteref:[GPT] partition table. You can
create this with `gdisk /dev/sd(x)`. If there is no GPT, you cannot select the
disk as journal. Currently the journal size is fixed to 5 GB.

If you want to use a dedicated SSD journal disk:

[source,bash]
----
pveceph createosd /dev/sd[X] -journal_dev /dev/sd[Y] -bluestore 0
----

Example: Use /dev/sdf as data disk (4TB) and /dev/sdb as the dedicated SSD
journal disk.

[source,bash]
----
pveceph createosd /dev/sdf -journal_dev /dev/sdb -bluestore 0
----

This partitions the disk (data and journal partition), creates
filesystems and starts the OSD, afterwards it is running and fully
functional.
21394e70 | 238 | |
1d54c3b4 AA |
239 | NOTE: This command refuses to initialize disk when it detects existing data. So |
240 | if you want to overwrite a disk you should remove existing data first. You can | |
241 | do that using: 'ceph-disk zap /dev/sd[X]' | |
21394e70 DM |
242 | |
243 | You can create OSDs containing both journal and data partitions or you | |
244 | can place the journal on a dedicated SSD. Using a SSD journal disk is | |
1d54c3b4 | 245 | highly recommended to achieve good performance. |


[[pve_ceph_pools]]
Creating Ceph Pools
-------------------

[thumbnail="gui-ceph-pools.png"]

A pool is a logical group for storing objects. It holds **P**lacement
**G**roups (PG), a collection of objects.

When no options are given, we set a
default of **64 PGs**, a **size of 3 replicas** and a **min_size of 2 replicas**
for serving objects in a degraded state.

NOTE: The default number of PGs works for 2-6 disks. Ceph throws a
"HEALTH_WARNING" if you have too few or too many PGs in your cluster.

It is advised to calculate the PG number depending on your setup. You can find
the formula and the PG
calculator footnote:[PG calculator http://ceph.com/pgcalc/] online. While PGs
can be increased later on, they can never be decreased.
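
As a rough illustration of the commonly used rule of thumb (about 100 PGs per
OSD, divided by the pool size, rounded up to the next power of two; use the PG
calculator above for authoritative numbers), a cluster with 12 OSDs and a pool
size of 3 ends up at 512 PGs:

[source,bash]
----
# (OSDs * 100) / size = (12 * 100) / 3 = 400, next power of two -> 512 PGs
echo $(( 12 * 100 / 3 ))
----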


You can create pools through the command line or on the GUI on each PVE host under
**Ceph -> Pools**.

[source,bash]
----
pveceph createpool <name>
----

If you would also like to automatically get a storage definition for your pool,
activate the checkbox "Add storages" on the GUI or use the command line option
'--add_storages' on pool creation.
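
For example, the following creates a pool named `mypool` (an illustrative name)
and registers it as a {pve} storage in one step:

[source,bash]
----
pveceph createpool mypool --add_storages
----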

Further information on Ceph pool handling can be found in the Ceph pool
operation footnote:[Ceph pool operation
http://docs.ceph.com/docs/luminous/rados/operations/pools/]
manual.


Ceph CRUSH & device classes
---------------------------

The foundation of Ceph is its algorithm, **C**ontrolled **R**eplication
**U**nder **S**calable **H**ashing
(CRUSH footnote:[CRUSH https://ceph.com/wp-content/uploads/2016/08/weil-crush-sc06.pdf]).

CRUSH calculates where to store data to and retrieve it from; this has the
advantage that no central index service is needed. CRUSH works with a map of
OSDs, buckets (device locations) and rulesets (data replication) for pools.

NOTE: Further information can be found in the Ceph documentation, under the
section CRUSH map footnote:[CRUSH map http://docs.ceph.com/docs/luminous/rados/operations/crush-map/].

This map can be altered to reflect different replication hierarchies. The object
replicas can be separated (e.g. failure domains), while maintaining the desired
distribution.

A common use case is to use different classes of disks for different Ceph pools.
For this reason, Ceph introduced device classes with luminous, to
accommodate the need for easy ruleset generation.

The device classes can be seen in the 'ceph osd tree' output. These classes
represent their own root bucket, which can be seen with the command below.

[source, bash]
----
ceph osd crush tree --show-shadow
----

Example output from the above command:

[source, bash]
----
ID  CLASS WEIGHT  TYPE NAME
-16 nvme  2.18307 root default~nvme
-13 nvme  0.72769     host sumi1~nvme
 12 nvme  0.72769         osd.12
-14 nvme  0.72769     host sumi2~nvme
 13 nvme  0.72769         osd.13
-15 nvme  0.72769     host sumi3~nvme
 14 nvme  0.72769         osd.14
 -1       7.70544 root default
 -3       2.56848     host sumi1
 12 nvme  0.72769         osd.12
 -5       2.56848     host sumi2
 13 nvme  0.72769         osd.13
 -7       2.56848     host sumi3
 14 nvme  0.72769         osd.14
----

To let a pool distribute its objects only on a specific device class, you need
to create a ruleset with the specific class first.

[source, bash]
----
ceph osd crush rule create-replicated <rule-name> <root> <failure-domain> <class>
----

[frame="none",grid="none", align="left", cols="30%,70%"]
|===
|<rule-name>|name of the rule, to connect with a pool (seen in GUI & CLI)
|<root>|which crush root it should belong to (default ceph root "default")
|<failure-domain>|at which failure-domain the objects should be distributed (usually host)
|<class>|what type of OSD backing store to use (e.g. nvme, ssd, hdd)
|===

Once the rule is in the CRUSH map, you can tell a pool to use the ruleset.

[source, bash]
----
ceph osd pool set <pool-name> crush_rule <rule-name>
----
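
For example, to restrict a hypothetical pool named `fastpool` to the NVMe OSDs
from the tree output above (rule and pool names are only placeholders):

[source, bash]
----
ceph osd crush rule create-replicated nvme-only default host nvme
ceph osd pool set fastpool crush_rule nvme-only
----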

TIP: If the pool already contains objects, all of these have to be moved
accordingly. Depending on your setup this may introduce a big performance hit on
your cluster. As an alternative, you can create a new pool and move disks
separately.


Ceph Client
-----------

[thumbnail="gui-ceph-log.png"]

You can then configure {pve} to use such pools to store VM or
Container images. Simply use the GUI to add a new `RBD` storage (see
section xref:ceph_rados_block_devices[Ceph RADOS Block Devices (RBD)]).

You also need to copy the keyring to a predefined location for an external Ceph
cluster. If Ceph is installed on the Proxmox nodes itself, then this will be
done automatically.

NOTE: The file name needs to be `<storage_id> + .keyring` - `<storage_id>` is
the expression after 'rbd:' in `/etc/pve/storage.cfg`, which is
`my-ceph-storage` in the following example:

[source,bash]
----
mkdir /etc/pve/priv/ceph
cp /etc/ceph/ceph.client.admin.keyring /etc/pve/priv/ceph/my-ceph-storage.keyring
----
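
For reference, the corresponding entry in `/etc/pve/storage.cfg` could look
roughly like the following sketch; pool name and monitor addresses are
placeholders for your own external cluster:

----
rbd: my-ceph-storage
	monhost 10.10.10.11 10.10.10.12 10.10.10.13
	pool rbd
	content images
	krbd 0
----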


ifdef::manvolnum[]
include::pve-copyright.adoc[]
endif::manvolnum[]