[[chapter_pveceph]]
ifdef::manvolnum[]
pveceph(1)
==========
:pve-toplevel:

NAME
----

pveceph - Manage Ceph Services on Proxmox VE Nodes

SYNOPSIS
--------

include::pveceph.1-synopsis.adoc[]

DESCRIPTION
-----------
endif::manvolnum[]
ifndef::manvolnum[]
Deploy Hyper-Converged Ceph Cluster
===================================
:pve-toplevel:

Introduction
------------
endif::manvolnum[]

[thumbnail="screenshot/gui-ceph-status-dashboard.png"]

{pve} unifies your compute and storage systems, that is, you can use the same
physical nodes within a cluster for both computing (processing VMs and
containers) and replicated storage. The traditional silos of compute and
storage resources can be wrapped up into a single hyper-converged appliance.
Separate storage networks (SANs) and connections via network attached storage
(NAS) disappear. With the integration of Ceph, an open source software-defined
storage platform, {pve} has the ability to run and manage Ceph storage directly
on the hypervisor nodes.

Ceph is a distributed object store and file system designed to provide
excellent performance, reliability and scalability.

.Some advantages of Ceph on {pve} are:
- Easy setup and management via CLI and GUI
- Thin provisioning
- Snapshot support
- Self healing
- Scalable to the exabyte level
- Provides block, file system, and object storage
- Setup pools with different performance and redundancy characteristics
- Data is replicated, making it fault tolerant
- Runs on commodity hardware
- No need for hardware RAID controllers
- Open source

For small to medium-sized deployments, it is possible to install a Ceph server
for using RADOS Block Devices (RBD) or CephFS directly on your {pve} cluster
nodes (see xref:ceph_rados_block_devices[Ceph RADOS Block Devices (RBD)]).
Recent hardware has a lot of CPU power and RAM, so running storage services and
virtual guests on the same node is possible.

To simplify management, {pve} provides native integration to install and
manage {ceph} services on {pve} nodes, either via the built-in web interface or
using the 'pveceph' command line tool.


Terminology
-----------

// TODO: extend and also describe basic architecture here.
.Ceph consists of multiple Daemons, for use as RBD storage:
- Ceph Monitor (ceph-mon, or MON)
- Ceph Manager (ceph-mgr, or MGR)
- Ceph Metadata Service (ceph-mds, or MDS)
- Ceph Object Storage Daemon (ceph-osd, or OSD)

TIP: We highly recommend getting familiar with Ceph
footnote:[Ceph intro {cephdocs-url}/start/intro/],
its architecture
footnote:[Ceph architecture {cephdocs-url}/architecture/]
and vocabulary
footnote:[Ceph glossary {cephdocs-url}/glossary].


Recommendations for a Healthy Ceph Cluster
------------------------------------------

To build a hyper-converged Proxmox + Ceph Cluster, you must use at least three
(preferably identical) servers for the setup.

Check also the recommendations from
{cephdocs-url}/start/hardware-recommendations/[Ceph's website].

NOTE: The recommendations below should be seen as rough guidance for choosing
hardware. Therefore, it is still essential to adapt them to your specific needs.
You should test your setup and monitor health and performance continuously.

.CPU
Ceph services can be classified into two categories:

* Intensive CPU usage, benefiting from high CPU base frequencies and multiple
  cores. Members of that category are:
** Object Storage Daemon (OSD) services
** Meta Data Service (MDS) used for CephFS
* Moderate CPU usage, not needing multiple CPU cores. These are:
** Monitor (MON) services
** Manager (MGR) services

As a simple rule of thumb, you should assign at least one CPU core (or thread)
to each Ceph service to provide the minimum resources required for stable and
durable Ceph performance.

For example, if you plan to run a Ceph monitor, a Ceph manager and 6 Ceph OSD
services on a node, you should reserve 8 CPU cores purely for Ceph when
targeting basic and stable performance.

Note that OSD CPU usage depends mostly on the performance of the disks. The
higher the possible IOPS (**IO** **O**perations per **S**econd) of a disk, the
more CPU can be utilized by an OSD service.
For modern enterprise SSDs, like NVMe drives that can permanently sustain a
high IOPS load of over 100'000 with sub-millisecond latency, each OSD can use
multiple CPU threads, e.g., four to six CPU threads utilized per NVMe-backed
OSD is likely for very high performance disks.

.Memory
Especially in a hyper-converged setup, the memory consumption needs to be
carefully planned out and monitored. In addition to the predicted memory usage
of virtual machines and containers, you must also account for having enough
memory available for Ceph to provide excellent and stable performance.

As a rule of thumb, for roughly **1 TiB of data, 1 GiB of memory** will be used
by an OSD. While the usage might be less under normal conditions, it will use
the most during critical operations like recovery, re-balancing or backfilling.
That means that you should avoid maxing out your available memory already
during normal operation, but rather leave some headroom to cope with outages.

The OSD service itself will use additional memory. The Ceph BlueStore backend
of the daemon requires **3-5 GiB of memory** by default (adjustable).
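
The per-OSD memory target can be tuned through the `osd_memory_target` option
in Ceph's configuration database. A minimal sketch, assuming you want roughly
6 GiB per OSD (the value is in bytes and only an example to adapt to your
hardware):

[source,bash]
----
# example only: raise the per-OSD memory target to ~6 GiB
ceph config set osd osd_memory_target 6442450944
----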

.Network
We recommend a network bandwidth of at least 10 Gbps, or more, to be used
exclusively for Ceph traffic. A meshed network setup
footnote:[Full Mesh Network for Ceph {webwiki-url}Full_Mesh_Network_for_Ceph_Server]
is also an option for three to five node clusters, if there are no 10+ Gbps
switches available.

[IMPORTANT]
The volume of traffic, especially during recovery, will interfere with other
services on the same network. In particular, the latency-sensitive {pve}
corosync cluster stack can be affected, resulting in possible loss of cluster
quorum. Moving the Ceph traffic to dedicated and physically separated networks
will avoid such interference, not only for corosync, but also for the
networking services provided by any virtual guests.

For estimating your bandwidth needs, you need to take the performance of your
disks into account. While a single HDD might not saturate a 1 Gbps link,
multiple HDD OSDs per node can already saturate 10 Gbps.
If modern NVMe-attached SSDs are used, a single one can already saturate 10
Gbps of bandwidth, or more. For such high-performance setups we recommend at
least 25 Gbps, while even 40 Gbps or 100+ Gbps might be required to utilize
the full performance potential of the underlying disks.

If unsure, we recommend using three (physically) separate networks for
high-performance setups (see the example configuration after this list):

* one very high bandwidth (25+ Gbps) network for Ceph (internal) cluster
  traffic.
* one high bandwidth (10+ Gbps) network for the Ceph (public) traffic between
  Ceph servers and Ceph clients. Depending on your needs, this can also be used
  to host the virtual guest traffic and the VM live-migration traffic.
* one medium bandwidth (1 Gbps) network exclusively for the latency-sensitive
  corosync cluster communication.
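
As a rough sketch, such a separation shows up as two different subnets in the
`[global]` section of `/etc/pve/ceph.conf`. The option names are standard Ceph
options, but the subnets below are placeholders; the installation wizard or
`pveceph init` (see below) normally sets them for you:

----
[global]
     # storage client traffic, e.g. VM disks and CephFS (example subnet)
     public_network = 10.10.10.0/24
     # OSD replication and heartbeat traffic (example subnet)
     cluster_network = 10.10.20.0/24
----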

.Disks
When planning the size of your Ceph cluster, it is important to take the
recovery time into consideration. Especially with small clusters, recovery
might take a long time. It is recommended that you use SSDs instead of HDDs in
small setups to reduce recovery time, minimizing the likelihood of a subsequent
failure event during recovery.

In general, SSDs will provide more IOPS than spinning disks. With this in mind,
in addition to the higher cost, it may make sense to implement a
xref:pve_ceph_device_classes[class based] separation of pools. Another way to
speed up OSDs is to use a faster disk as a journal or
DB/**W**rite-**A**head-**L**og device, see
xref:pve_ceph_osds[creating Ceph OSDs].
If a faster disk is used for multiple OSDs, a proper balance between OSD
and WAL / DB (or journal) disk must be selected, otherwise the faster disk
becomes the bottleneck for all linked OSDs.

Aside from the disk type, Ceph performs best with an evenly sized and evenly
distributed amount of disks per node. For example, 4 x 500 GB disks within each
node is better than a mixed setup with a single 1 TB and three 250 GB disks.

You also need to balance OSD count and single OSD capacity. More capacity
allows you to increase storage density, but it also means that a single OSD
failure forces Ceph to recover more data at once.

.Avoid RAID
As Ceph handles data object redundancy and multiple parallel writes to disks
(OSDs) on its own, using a RAID controller normally doesn’t improve
performance or availability. On the contrary, Ceph is designed to handle whole
disks on its own, without any abstraction in between. RAID controllers are not
designed for the Ceph workload and may complicate things and sometimes even
reduce performance, as their write and caching algorithms may interfere with
the ones from Ceph.

WARNING: Avoid RAID controllers. Use a host bus adapter (HBA) instead.

[[pve_ceph_install_wizard]]
Initial Ceph Installation & Configuration
-----------------------------------------

Using the Web-based Wizard
~~~~~~~~~~~~~~~~~~~~~~~~~~

[thumbnail="screenshot/gui-node-ceph-install.png"]

With {pve} you have the benefit of an easy-to-use installation wizard
for Ceph. Click on one of your cluster nodes and navigate to the Ceph
section in the menu tree. If Ceph is not already installed, you will see a
prompt offering to do so.

The wizard is divided into multiple sections, each of which needs to finish
successfully in order to use Ceph.

First you need to choose which Ceph version you want to install. Prefer the one
from your other nodes, or the newest if this is the first node on which you
install Ceph.

After starting the installation, the wizard will download and install all the
required packages from {pve}'s Ceph repository.

[thumbnail="screenshot/gui-node-ceph-install-wizard-step0.png"]

After finishing the installation step, you will need to create a configuration.
This step is only needed once per cluster, as this configuration is distributed
automatically to all remaining cluster members through {pve}'s clustered
xref:chapter_pmxcfs[configuration file system (pmxcfs)].

The configuration step includes the following settings:

[[pve_ceph_wizard_networks]]

* *Public Network:* This network will be used for public storage communication
  (e.g., for virtual machines using a Ceph RBD backed disk, or a CephFS mount).
  This setting is required.
  +
  Separating your Ceph traffic from cluster communication, and possibly the
  front-facing (public) networks of your virtual guests, is highly recommended.
  Otherwise, Ceph's high-bandwidth IO-traffic could cause interference with
  other low-latency dependent services.

[thumbnail="screenshot/gui-node-ceph-install-wizard-step2.png"]

* *Cluster Network:* Specify this to separate the xref:pve_ceph_osds[OSD]
  replication and heartbeat traffic as well.
  +
  Using a physically separated network is recommended, as it will relieve the
  Ceph public and the virtual guest networks, while also providing significant
  Ceph performance improvements.

You have two more options which are considered advanced and therefore should
only be changed if you know what you are doing.

* *Number of replicas*: Defines how often an object is replicated.
* *Minimum replicas*: Defines the minimum number of required replicas for I/O to
  be marked as complete.

Additionally, you need to choose your first monitor node. This step is required.

That's it. You should now see a success page as the last step, with further
instructions on how to proceed. Your system is now ready to start using Ceph.
To get started, you will need to create some additional xref:pve_ceph_monitors[monitors],
xref:pve_ceph_osds[OSDs] and at least one xref:pve_ceph_pools[pool].

The rest of this chapter will guide you through getting the most out of
your {pve}-based Ceph setup. This includes the aforementioned tips and
more, such as xref:pveceph_fs[CephFS], which is a helpful addition to your
new Ceph cluster.

[[pve_ceph_install]]
CLI Installation of Ceph Packages
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

As an alternative to the recommended {pve} Ceph installation wizard available
in the web-interface, you can use the following CLI command on each node:

[source,bash]
----
pveceph install
----

This sets up an `apt` package repository in
`/etc/apt/sources.list.d/ceph.list` and installs the required software.
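
To verify which repository was set up and which Ceph release got installed,
you can, for example, check:

[source,bash]
----
cat /etc/apt/sources.list.d/ceph.list
ceph --version
----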


Initial Ceph configuration via CLI
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Use the {pve} Ceph installation wizard (recommended) or run the
following command on one node:

[source,bash]
----
pveceph init --network 10.10.10.0/24
----

This creates an initial configuration at `/etc/pve/ceph.conf` with a
dedicated network for Ceph. This file is automatically distributed to
all {pve} nodes, using xref:chapter_pmxcfs[pmxcfs]. The command also
creates a symbolic link at `/etc/ceph/ceph.conf`, which points to that file.
Thus, you can simply run Ceph commands without the need to specify a
configuration file.
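
If you also want a dedicated cluster network for OSD replication, as suggested
in the network recommendations above, you can pass it at initialization time.
This sketch assumes your `pveceph` version offers the `--cluster-network`
option and uses placeholder subnets:

[source,bash]
----
pveceph init --network 10.10.10.0/24 --cluster-network 10.10.20.0/24
----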


[[pve_ceph_monitors]]
Ceph Monitor
------------

[thumbnail="screenshot/gui-ceph-monitor.png"]

The Ceph Monitor (MON)
footnote:[Ceph Monitor {cephdocs-url}/start/intro/]
maintains a master copy of the cluster map. For high availability, you need at
least 3 monitors. One monitor will already be installed if you
used the installation wizard. You won't need more than 3 monitors, as long
as your cluster is small to medium-sized. Only really large clusters will
require more than this.

[[pveceph_create_mon]]
Create Monitors
~~~~~~~~~~~~~~~

On each node where you want to place a monitor (three monitors are recommended),
create one by using the 'Ceph -> Monitor' tab in the GUI or run:

[source,bash]
----
pveceph mon create
----
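
Afterwards, you can verify that the new monitor has joined the quorum, for
example with:

[source,bash]
----
ceph mon stat
----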

[[pveceph_destroy_mon]]
Destroy Monitors
~~~~~~~~~~~~~~~~

To remove a Ceph Monitor via the GUI, first select a node in the tree view and
go to the **Ceph -> Monitor** panel. Select the MON and click the **Destroy**
button.

To remove a Ceph Monitor via the CLI, first connect to the node on which the MON
is running. Then execute the following command:

[source,bash]
----
pveceph mon destroy
----

NOTE: At least three Monitors are needed for quorum.


[[pve_ceph_manager]]
Ceph Manager
------------

The Manager daemon runs alongside the monitors. It provides an interface to
monitor the cluster. Since the release of Ceph Luminous, at least one ceph-mgr
footnote:[Ceph Manager {cephdocs-url}/mgr/] daemon is
required.

[[pveceph_create_mgr]]
Create Manager
~~~~~~~~~~~~~~

Multiple Managers can be installed, but only one Manager is active at any given
time.

[source,bash]
----
pveceph mgr create
----

NOTE: It is recommended to install the Ceph Manager on the monitor nodes. For
high availability, install more than one manager.


[[pveceph_destroy_mgr]]
Destroy Manager
~~~~~~~~~~~~~~~

To remove a Ceph Manager via the GUI, first select a node in the tree view and
go to the **Ceph -> Monitor** panel. Select the Manager and click the
**Destroy** button.

To remove a Ceph Manager via the CLI, first connect to the node on which the
Manager is running. Then execute the following command:

[source,bash]
----
pveceph mgr destroy
----

NOTE: While a manager is not a hard dependency, it is crucial for a Ceph
cluster, as it handles important features like PG-autoscaling, device health
monitoring, telemetry and more.

[[pve_ceph_osds]]
Ceph OSDs
---------

[thumbnail="screenshot/gui-ceph-osd-status.png"]

Ceph **O**bject **S**torage **D**aemons store objects for Ceph over the
network. It is recommended to use one OSD per physical disk.

[[pve_ceph_osd_create]]
Create OSDs
~~~~~~~~~~~

You can create an OSD either via the {pve} web-interface or via the CLI using
`pveceph`. For example:

[source,bash]
----
pveceph osd create /dev/sd[X]
----

TIP: We recommend a Ceph cluster with at least three nodes and at least 12
OSDs, evenly distributed among the nodes.
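
After creating OSDs, you can check how they are distributed across the nodes
and how full they are, for example with:

[source,bash]
----
ceph osd tree
ceph osd df tree
----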

If the disk was in use before (for example, for ZFS or as an OSD), you first
need to zap all traces of that usage. To remove the partition table, boot
sector and any other OSD leftovers, you can use the following command:

[source,bash]
----
ceph-volume lvm zap /dev/sd[X] --destroy
----

WARNING: The above command will destroy all data on the disk!

.Ceph Bluestore

Starting with the Ceph Kraken release, a new Ceph OSD storage type was
introduced called Bluestore
footnote:[Ceph Bluestore https://ceph.com/community/new-luminous-bluestore/].
This is the default when creating OSDs since Ceph Luminous.

[source,bash]
----
pveceph osd create /dev/sd[X]
----

.Block.db and block.wal

If you want to use a separate DB/WAL device for your OSDs, you can specify it
through the '-db_dev' and '-wal_dev' options. The WAL is placed with the DB, if
not specified separately.

[source,bash]
----
pveceph osd create /dev/sd[X] -db_dev /dev/sd[Y] -wal_dev /dev/sd[Z]
----

You can directly choose the size of those with the '-db_size' and '-wal_size'
parameters respectively. If they are not given, the following values (in order)
will be used:

* bluestore_block_{db,wal}_size from Ceph configuration...
** ... database, section 'osd'
** ... database, section 'global'
** ... file, section 'osd'
** ... file, section 'global'
* 10% (DB)/1% (WAL) of OSD size

NOTE: The DB stores BlueStore’s internal metadata, and the WAL is BlueStore’s
internal journal or write-ahead log. It is recommended to use a fast SSD or
NVRAM for better performance.
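
As an example of the first fallback in the list above, you could predefine a
DB size for all OSDs in Ceph's configuration database. The value is given in
bytes; the 60 GiB shown here is only an assumption to adapt to your disks:

[source,bash]
----
ceph config set osd bluestore_block_db_size 64424509440
----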

.Ceph Filestore

Before Ceph Luminous, Filestore was used as the default storage type for Ceph
OSDs. Starting with Ceph Nautilus, {pve} does not support creating such OSDs
with 'pveceph' anymore. If you still want to create filestore OSDs, use
'ceph-volume' directly.

[source,bash]
----
ceph-volume lvm create --filestore --data /dev/sd[X] --journal /dev/sd[Y]
----

[[pve_ceph_osd_destroy]]
Destroy OSDs
~~~~~~~~~~~~

To remove an OSD via the GUI, first select a {PVE} node in the tree view and go
to the **Ceph -> OSD** panel. Then select the OSD to destroy and click the **OUT**
button. Once the OSD status has changed from `in` to `out`, click the **STOP**
button. Finally, after the status has changed from `up` to `down`, select
**Destroy** from the `More` drop-down menu.

To remove an OSD via the CLI, run the following commands.

[source,bash]
----
ceph osd out <ID>
systemctl stop ceph-osd@<ID>.service
----

NOTE: The first command instructs Ceph not to include the OSD in the data
distribution. The second command stops the OSD service. Until this time, no
data is lost.

The following command destroys the OSD. Specify the '-cleanup' option to
additionally destroy the partition table.

[source,bash]
----
pveceph osd destroy <ID>
----

WARNING: The above command will destroy all data on the disk!


[[pve_ceph_pools]]
Ceph Pools
----------

[thumbnail="screenshot/gui-ceph-pools.png"]

A pool is a logical group for storing objects. It holds a collection of objects,
known as **P**lacement **G**roups (`PG`, `pg_num`).


Create and Edit Pools
~~~~~~~~~~~~~~~~~~~~~

You can create and edit pools from the command line or the web-interface of any
{pve} host under **Ceph -> Pools**.

When no options are given, we set a default of **128 PGs**, a **size of 3
replicas** and a **min_size of 2 replicas**, to ensure no data loss occurs if
any OSD fails.

WARNING: **Do not set a min_size of 1**. A replicated pool with min_size of 1
allows I/O on an object when it has only 1 replica, which could lead to data
loss, incomplete PGs or unfound objects.

It is advised that you either enable the PG-Autoscaler or calculate the PG
number based on your setup. You can find the formula and the PG calculator
footnote:[PG calculator https://web.archive.org/web/20210301111112/http://ceph.com/pgcalc/] online. From Ceph Nautilus
onward, you can change the number of PGs
footnoteref:[placement_groups,Placement Groups
{cephdocs-url}/rados/operations/placement-groups/] after the setup.

The PG autoscaler footnoteref:[autoscaler,Automated Scaling
{cephdocs-url}/rados/operations/placement-groups/#automated-scaling] can
automatically scale the PG count for a pool in the background. Setting the
`Target Size` or `Target Ratio` advanced parameters helps the PG-Autoscaler to
make better decisions.

.Example for creating a pool over the CLI
[source,bash]
----
pveceph pool create <pool-name> --add_storages
----

TIP: If you would also like to automatically define a storage for your
pool, keep the `Add as Storage' checkbox checked in the web-interface, or use the
command line option '--add_storages' at pool creation.

Pool Options
^^^^^^^^^^^^

[thumbnail="screenshot/gui-ceph-pool-create.png"]

The following options are available on pool creation, and partially also when
editing a pool.

Name:: The name of the pool. This must be unique and can't be changed afterwards.
Size:: The number of replicas per object. Ceph always tries to have this many
copies of an object. Default: `3`.
PG Autoscale Mode:: The automatic PG scaling mode footnoteref:[autoscaler] of
the pool. If set to `warn`, it produces a warning message when a pool
has a non-optimal PG count. Default: `warn`.
Add as Storage:: Configure a VM or container storage using the new pool.
Default: `true` (only visible on creation).

.Advanced Options
Min. Size:: The minimum number of replicas per object. Ceph will reject I/O on
the pool if a PG has less than this many replicas. Default: `2`.
Crush Rule:: The rule to use for mapping object placement in the cluster. These
rules define how data is placed within the cluster. See
xref:pve_ceph_device_classes[Ceph CRUSH & device classes] for information on
device-based rules.
# of PGs:: The number of placement groups footnoteref:[placement_groups] that
the pool should have at the beginning. Default: `128`.
Target Ratio:: The ratio of data that is expected in the pool. The PG
autoscaler uses the ratio relative to other ratio sets. It takes precedence
over the `target size` if both are set.
Target Size:: The estimated amount of data expected in the pool. The PG
autoscaler uses this size to estimate the optimal PG count.
Min. # of PGs:: The minimum number of placement groups. This setting is used to
fine-tune the lower bound of the PG count for that pool. The PG autoscaler
will not merge PGs below this threshold.

Further information on Ceph pool handling can be found in the Ceph pool
operation footnote:[Ceph pool operation
{cephdocs-url}/rados/operations/pools/]
manual.


[[pve_ceph_ec_pools]]
Erasure Coded Pools
~~~~~~~~~~~~~~~~~~~

Erasure coding (EC) is a form of `forward error correction' code that allows
recovery from a certain amount of data loss. Erasure coded pools can offer
more usable space compared to replicated pools, but they do that at the price
of performance.

For comparison: in classic, replicated pools, multiple replicas of the data
are stored (`size`), while in an erasure coded pool, data is split into `k` data
chunks with additional `m` coding (checking) chunks. Those coding chunks can be
used to recreate data should data chunks be missing.

The number of coding chunks, `m`, defines how many OSDs can be lost without
losing any data. The total number of chunks stored is `k + m`.

Creating EC Pools
^^^^^^^^^^^^^^^^^

Erasure coded (EC) pools can be created with the `pveceph` CLI tooling.
Planning an EC pool needs to account for the fact that they work differently
than replicated pools.

The default `min_size` of an EC pool depends on the `m` parameter. If `m = 1`,
the `min_size` of the EC pool will be `k`. The `min_size` will be `k + 1` if
`m > 1`. The Ceph documentation recommends a conservative `min_size` of `k + 2`
footnote:[Ceph Erasure Coded Pool Recovery
{cephdocs-url}/rados/operations/erasure-code/#erasure-coded-pool-recovery].

If there are fewer than `min_size` OSDs available, any IO to the pool will be
blocked until there are enough OSDs available again.

NOTE: When planning an erasure coded pool, keep an eye on the `min_size`, as it
defines how many OSDs need to be available. Otherwise, IO will be blocked.

For example, an EC pool with `k = 2` and `m = 1` will have `size = 3`,
`min_size = 2` and will stay operational if one OSD fails. If the pool is
configured with `k = 2`, `m = 2`, it will have a `size = 4` and `min_size = 3`
and stay operational if one OSD is lost.

To create a new EC pool, run the following command:

[source,bash]
----
pveceph pool create <pool-name> --erasure-coding k=2,m=1
----

Optional parameters are `failure-domain` and `device-class`. If you
need to change any EC profile settings used by the pool, you will have to
create a new pool with a new profile.
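
As a sketch of how these optional parameters can be combined (the values for
`k`, `m`, the device class and the failure domain are assumptions to adapt to
your cluster):

[source,bash]
----
pveceph pool create <pool-name> --erasure-coding k=4,m=2,device-class=ssd,failure-domain=host
----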

This will create a new EC pool plus the needed replicated pool to store the RBD
omap and other metadata. In the end, there will be a `<pool name>-data` and
`<pool name>-metadata` pool. The default behavior is to create a matching
storage configuration as well. If that behavior is not wanted, you can disable
it by providing the `--add_storages 0` parameter. When configuring the storage
configuration manually, keep in mind that the `data-pool` parameter needs to be
set. Only then will the EC pool be used to store the data objects.

NOTE: The optional parameters `--size`, `--min_size` and `--crush_rule` will be
used for the replicated metadata pool, but not for the erasure coded data pool.
If you need to change the `min_size` on the data pool, you can do it later.
The `size` and `crush_rule` parameters cannot be changed on erasure coded
pools.

If there is a need to further customize the EC profile, you can do so by
creating it with the Ceph tools directly footnote:[Ceph Erasure Code Profile
{cephdocs-url}/rados/operations/erasure-code/#erasure-code-profiles], and
specifying the profile to use with the `profile` parameter.

For example:

[source,bash]
----
pveceph pool create <pool-name> --erasure-coding profile=<profile-name>
----

Adding EC Pools as Storage
^^^^^^^^^^^^^^^^^^^^^^^^^^

You can add an already existing EC pool as storage to {pve}. It works the same
way as adding an `RBD` pool but requires the extra `data-pool` option.

[source,bash]
----
pvesm add rbd <storage-name> --pool <replicated-pool> --data-pool <ec-pool>
----

TIP: Do not forget to add the `keyring` and `monhost` options for any external
Ceph cluster not managed by the local {pve} cluster.

Destroy Pools
~~~~~~~~~~~~~

To destroy a pool via the GUI, select a node in the tree view and go to the
**Ceph -> Pools** panel. Select the pool to destroy and click the **Destroy**
button. To confirm the destruction of the pool, you need to enter the pool name.

Run the following command to destroy a pool. Specify the '-remove_storages'
option to also remove the associated storage.

[source,bash]
----
pveceph pool destroy <name>
----

NOTE: Pool deletion runs in the background and can take some time.
You will notice the data usage in the cluster decreasing throughout this
process.


PG Autoscaler
~~~~~~~~~~~~~

The PG autoscaler allows the cluster to consider the amount of (expected) data
stored in each pool and to choose the appropriate pg_num values automatically.
It is available since Ceph Nautilus.

You may need to activate the PG autoscaler module before adjustments can take
effect.

[source,bash]
----
ceph mgr module enable pg_autoscaler
----

The autoscaler is configured on a per pool basis and has the following modes:

[horizontal]
warn:: A health warning is issued if the suggested `pg_num` value differs too
much from the current value.
on:: The `pg_num` is adjusted automatically with no need for any manual
interaction.
off:: No automatic `pg_num` adjustments are made, and no warning will be issued
if the PG count is not optimal.

The scaling factor can be adjusted to facilitate future data storage with the
`target_size`, `target_size_ratio` and the `pg_num_min` options.
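
Both the mode and these hints are set per pool, for example (the pool name and
the ratio are placeholders):

[source,bash]
----
ceph osd pool set <pool-name> pg_autoscale_mode on
# hint that this pool is expected to hold about half of the cluster's data
ceph osd pool set <pool-name> target_size_ratio 0.5
----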

WARNING: By default, the autoscaler considers tuning the PG count of a pool if
it is off by a factor of 3. This will lead to a considerable shift in data
placement and might introduce a high load on the cluster.

You can find a more in-depth introduction to the PG autoscaler on Ceph's Blog -
https://ceph.io/rados/new-in-nautilus-pg-merging-and-autotuning/[New in
Nautilus: PG merging and autotuning].


[[pve_ceph_device_classes]]
Ceph CRUSH & device classes
---------------------------

[thumbnail="screenshot/gui-ceph-config.png"]

The CRUSH (**C**ontrolled **R**eplication **U**nder **S**calable **H**ashing)
algorithm footnote:[CRUSH
https://ceph.com/wp-content/uploads/2016/08/weil-crush-sc06.pdf] is at the
foundation of Ceph.

CRUSH calculates where to store and retrieve data from. This has the
advantage that no central indexing service is needed. CRUSH works using a map of
OSDs, buckets (device locations) and rulesets (data replication) for pools.

NOTE: Further information can be found in the Ceph documentation, under the
section CRUSH map footnote:[CRUSH map {cephdocs-url}/rados/operations/crush-map/].

This map can be altered to reflect different replication hierarchies. The object
replicas can be separated (e.g., failure domains), while maintaining the desired
distribution.

A common configuration is to use different classes of disks for different Ceph
pools. For this reason, Ceph introduced device classes with Luminous, to
accommodate the need for easy ruleset generation.

The device classes can be seen in the 'ceph osd tree' output. These classes
represent their own root bucket, which can be seen with the below command.

[source, bash]
----
ceph osd crush tree --show-shadow
----

Example output from the above command:

[source, bash]
----
 ID CLASS   WEIGHT  TYPE NAME
-16 nvme   2.18307  root default~nvme
-13 nvme   0.72769      host sumi1~nvme
 12 nvme   0.72769          osd.12
-14 nvme   0.72769      host sumi2~nvme
 13 nvme   0.72769          osd.13
-15 nvme   0.72769      host sumi3~nvme
 14 nvme   0.72769          osd.14
 -1        7.70544  root default
 -3        2.56848      host sumi1
 12 nvme   0.72769          osd.12
 -5        2.56848      host sumi2
 13 nvme   0.72769          osd.13
 -7        2.56848      host sumi3
 14 nvme   0.72769          osd.14
----

To instruct a pool to only distribute objects on a specific device class, you
first need to create a ruleset for the device class:

[source, bash]
----
ceph osd crush rule create-replicated <rule-name> <root> <failure-domain> <class>
----

[frame="none",grid="none", align="left", cols="30%,70%"]
|===
|<rule-name>|name of the rule, to connect with a pool (seen in GUI & CLI)
|<root>|which crush root it should belong to (default Ceph root "default")
|<failure-domain>|at which failure-domain the objects should be distributed (usually host)
|<class>|what type of OSD backing store to use (e.g., nvme, ssd, hdd)
|===

Once the rule is in the CRUSH map, you can tell a pool to use the ruleset.

[source, bash]
----
ceph osd pool set <pool-name> crush_rule <rule-name>
----
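
Putting both steps together, a hypothetical rule that keeps all replicas of a
pool on SSD-backed OSDs, with `host` as the failure domain, could look like
this (the rule name is only an example):

[source, bash]
----
ceph osd crush rule create-replicated ssd-only default host ssd
ceph osd pool set <pool-name> crush_rule ssd-only
----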

TIP: If the pool already contains objects, these must be moved accordingly.
Depending on your setup, this may introduce a big performance impact on your
cluster. As an alternative, you can create a new pool and move disks separately.


Ceph Client
-----------

[thumbnail="screenshot/gui-ceph-log.png"]

Following the setup from the previous sections, you can configure {pve} to use
such pools to store VM and Container images. Simply use the GUI to add a new
`RBD` storage (see section
xref:ceph_rados_block_devices[Ceph RADOS Block Devices (RBD)]).

You also need to copy the keyring to a predefined location for an external Ceph
cluster. If Ceph is installed on the Proxmox nodes themselves, then this will be
done automatically.

NOTE: The filename needs to be `<storage_id>` + `.keyring`, where `<storage_id>`
is the expression after 'rbd:' in `/etc/pve/storage.cfg`. In the following
example, `my-ceph-storage` is the `<storage_id>`:

[source,bash]
----
mkdir /etc/pve/priv/ceph
cp /etc/ceph/ceph.client.admin.keyring /etc/pve/priv/ceph/my-ceph-storage.keyring
----

[[pveceph_fs]]
CephFS
------

Ceph also provides a filesystem, which runs on top of the same object storage as
RADOS block devices do. A **M**eta**d**ata **S**erver (`MDS`) is used to map the
RADOS backed objects to files and directories, allowing Ceph to provide a
POSIX-compliant, replicated filesystem. This allows you to easily configure a
clustered, highly available, shared filesystem. Ceph's Metadata Servers
guarantee that files are evenly distributed over the entire Ceph cluster. As a
result, even cases of high load will not overwhelm a single host, which can be
an issue with traditional shared filesystem approaches, for example `NFS`.

[thumbnail="screenshot/gui-node-ceph-cephfs-panel.png"]

{pve} supports both creating a hyper-converged CephFS and using an existing
xref:storage_cephfs[CephFS as storage] to save backups, ISO files, and container
templates.


[[pveceph_fs_mds]]
Metadata Server (MDS)
~~~~~~~~~~~~~~~~~~~~~

CephFS needs at least one Metadata Server to be configured and running, in order
to function. You can create an MDS through the {pve} web GUI's `Node
-> CephFS` panel or from the command line with:

----
pveceph mds create
----

Multiple metadata servers can be created in a cluster, but with the default
settings, only one can be active at a time. If an MDS or its node becomes
unresponsive (or crashes), another `standby` MDS will get promoted to `active`.
You can speed up the handover between the active and standby MDS by using
the 'hotstandby' parameter option on creation, or, if you have already created
it, you may set/add:

----
mds standby replay = true
----

in the respective MDS section of `/etc/pve/ceph.conf`. With this enabled, the
specified MDS will remain in a `warm` state, polling the active one, so that it
can take over faster in case of any issues.
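
For example, to create an MDS that starts out as such a hot standby directly,
assuming your `pveceph` version supports the 'hotstandby' flag mentioned above:

----
pveceph mds create --hotstandby
----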

NOTE: This active polling will have an additional performance impact on your
system and the active `MDS`.

.Multiple Active MDS

Since Luminous (12.2.x) you can have multiple active metadata servers
running at once, but this is normally only useful if you have a high number of
clients running in parallel. Otherwise the `MDS` is rarely the bottleneck in a
system. If you want to set this up, please refer to the Ceph documentation.
footnote:[Configuring multiple active MDS daemons
{cephdocs-url}/cephfs/multimds/]

[[pveceph_fs_create]]
Create CephFS
~~~~~~~~~~~~~

With {pve}'s integration of CephFS, you can easily create a CephFS using the
web interface, CLI or an external API interface. Some prerequisites are required
for this to work:

.Prerequisites for a successful CephFS setup:
- xref:pve_ceph_install[Install Ceph packages] - if this was already done some
time ago, you may want to rerun it on an up-to-date system to
ensure that all CephFS related packages get installed.
- xref:pve_ceph_monitors[Setup Monitors]
- xref:pve_ceph_osds[Setup your OSDs]
- xref:pveceph_fs_mds[Setup at least one MDS]

After this is complete, you can simply create a CephFS through
either the Web GUI's `Node -> CephFS` panel or the command line tool `pveceph`,
for example:

----
pveceph fs create --pg_num 128 --add-storage
----

This creates a CephFS named 'cephfs', using a pool for its data named
'cephfs_data' with '128' placement groups and a pool for its metadata named
'cephfs_metadata' with one quarter of the data pool's placement groups (`32`).
Check the xref:pve_ceph_pools[{pve} managed Ceph pool chapter] or visit the
Ceph documentation for more information regarding an appropriate placement group
number (`pg_num`) for your setup footnoteref:[placement_groups].
Additionally, the '--add-storage' parameter will add the CephFS to the {pve}
storage configuration after it has been created successfully.

Destroy CephFS
~~~~~~~~~~~~~~

WARNING: Destroying a CephFS will render all of its data unusable. This cannot be
undone!

To completely and gracefully remove a CephFS, the following steps are
necessary:

* Disconnect every non-{PVE} client (e.g. unmount the CephFS in guests).
* Disable all related CephFS {PVE} storage entries (to prevent it from being
  automatically mounted).
* Remove all used resources from guests (e.g. ISOs) that are on the CephFS you
  want to destroy.
* Unmount the CephFS storages on all cluster nodes manually with
+
----
umount /mnt/pve/<STORAGE-NAME>
----
+
Where `<STORAGE-NAME>` is the name of the CephFS storage in your {PVE}.

* Now make sure that no metadata server (`MDS`) is running for that CephFS,
  either by stopping or destroying them. This can be done through the web
  interface or via the command line interface; for the latter, you would issue
  the following command:
+
----
pveceph stop --service mds.NAME
----
+
to stop them, or
+
----
pveceph mds destroy NAME
----
+
to destroy them.
+
Note that standby servers will automatically be promoted to active when an
active `MDS` is stopped or removed, so it is best to first stop all standby
servers.

* Now you can destroy the CephFS with
+
----
pveceph fs destroy NAME --remove-storages --remove-pools
----
+
This will automatically destroy the underlying Ceph pools as well as remove
the storages from the {pve} configuration.

After these steps, the CephFS should be completely removed and, if you have
other CephFS instances, the stopped metadata servers can be started again
to act as standbys.

Ceph maintenance
----------------

Replace OSDs
~~~~~~~~~~~~

One of the most common maintenance tasks in Ceph is to replace the disk of an
OSD. If a disk is already in a failed state, then you can go ahead and run
through the steps in xref:pve_ceph_osd_destroy[Destroy OSDs]. Ceph will recreate
those copies on the remaining OSDs if possible. This rebalancing will start as
soon as an OSD failure is detected or an OSD was actively stopped.

NOTE: With the default size/min_size (3/2) of a pool, recovery only starts when
`size + 1` nodes are available. The reason for this is that the Ceph object
balancer xref:pve_ceph_device_classes[CRUSH] defaults to a full node as
`failure domain'.

To replace a functioning disk from the GUI, go through the steps in
xref:pve_ceph_osd_destroy[Destroy OSDs]. The only addition is to wait until
the cluster shows 'HEALTH_OK' before stopping the OSD to destroy it.

On the command line, use the following commands:

----
ceph osd out osd.<id>
----

You can check with the command below if the OSD can be safely removed.

----
ceph osd safe-to-destroy osd.<id>
----

Once the above check tells you that it is safe to remove the OSD, you can
continue with the following commands:

----
systemctl stop ceph-osd@<id>.service
pveceph osd destroy <id>
----

Replace the old disk with the new one and use the same procedure as described
in xref:pve_ceph_osd_create[Create OSDs].

Trim/Discard
~~~~~~~~~~~~

It is good practice to run 'fstrim' (discard) regularly on VMs and containers.
This releases data blocks that the filesystem isn’t using anymore. It reduces
data usage and resource load. Most modern operating systems issue such discard
commands to their disks regularly. You only need to ensure that the Virtual
Machines enable the xref:qm_hard_disk_discard[disk discard option].
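
For example, inside a Linux guest you can trigger a manual trim of all mounted
filesystems that support it with:

----
fstrim -av
----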

[[pveceph_scrub]]
Scrub & Deep Scrub
~~~~~~~~~~~~~~~~~~

Ceph ensures data integrity by 'scrubbing' placement groups. Ceph checks every
object in a PG for its health. There are two forms of scrubbing: daily
cheap metadata checks and weekly deep data checks. The weekly deep scrub reads
the objects and uses checksums to ensure data integrity. If a running scrub
interferes with business (performance) needs, you can adjust the time when
scrubs footnote:[Ceph scrubbing {cephdocs-url}/rados/configuration/osd-config-ref/#scrubbing]
are executed.
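
For example, to restrict scrubbing to a nightly window, you could set the
corresponding OSD options cluster-wide. The hours below are only an assumption
to adapt to your workload:

----
ceph config set osd osd_scrub_begin_hour 22
ceph config set osd osd_scrub_end_hour 6
----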


Ceph Monitoring and Troubleshooting
-----------------------------------

It is important to continuously monitor the health of a Ceph deployment from the
beginning, either by using the Ceph tools or by accessing
the status through the {pve} link:api-viewer/index.html[API].

The following Ceph commands can be used to see if the cluster is healthy
('HEALTH_OK'), if there are warnings ('HEALTH_WARN'), or even errors
('HEALTH_ERR'). If the cluster is in an unhealthy state, the status commands
below will also give you an overview of the current events and actions to take.

----
# single time output
pve# ceph -s
# continuously output status changes (press CTRL+C to stop)
pve# ceph -w
----
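
If the cluster is not healthy, `ceph health detail` lists each individual
warning or error with a short explanation:

----
pve# ceph health detail
----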

To get a more detailed view, every Ceph service has a log file under
`/var/log/ceph/`. If more detail is required, the log level can be
adjusted footnote:[Ceph log and debugging {cephdocs-url}/rados/troubleshooting/log-and-debug/].

You can find more information about troubleshooting
footnote:[Ceph troubleshooting {cephdocs-url}/rados/troubleshooting/]
a Ceph cluster on the official website.


ifdef::manvolnum[]
include::pve-copyright.adoc[]
endif::manvolnum[]