=================
 Storage Devices
=================

There are two Ceph daemons that store data on disk:

* **Ceph OSDs** (or Object Storage Daemons) are where most of the
  data is stored in Ceph. Generally speaking, each OSD is backed by
  a single storage device, like a traditional hard disk (HDD) or
  solid state disk (SSD). OSDs can also be backed by a combination
  of devices: for example, an HDD for most data and an SSD (or a
  partition of an SSD) for some metadata (see the example after this
  list). The number of OSDs in a cluster is generally a function of
  how much data will be stored, how big each storage device will be,
  and the level and type of redundancy (replication or erasure
  coding).
* **Ceph Monitor** daemons manage critical cluster state like cluster
  membership and authentication information. For smaller clusters, a
  few gigabytes is all that is needed, although for larger clusters
  the monitor database can reach tens or possibly hundreds of
  gigabytes.

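
For example, an OSD that keeps its data on an HDD and its internal
metadata database on an SSD partition can be created with
``ceph-volume``. This is only a sketch: the device paths are
placeholders, and depending on the ``ceph-volume`` version the
``--data`` argument may need to be a logical volume rather than a
raw device::

    # BlueStore OSD: object data on an HDD, RocksDB metadata on an SSD partition
    ceph-volume lvm create --bluestore --data /dev/sdb --block.db /dev/sdc1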

OSD Backends
============

There are two ways that OSDs can manage the data they store. Starting
with the Luminous 12.2.z release, the new default (and recommended)
backend is *BlueStore*. Prior to Luminous, the default (and only
option) was *FileStore*.

BlueStore
---------

BlueStore is a special-purpose storage backend designed specifically
for managing data on disk for Ceph OSD workloads. It is motivated by
experience supporting and managing OSDs using FileStore over the
last ten years. Key BlueStore features include:

* Direct management of storage devices. BlueStore consumes raw block
  devices or partitions. This avoids any intervening layers of
  abstraction (such as local file systems like XFS) that may limit
  performance or add complexity.
* Metadata management with RocksDB. We embed RocksDB's key/value
  database in order to manage internal metadata, such as the mapping
  from object names to block locations on disk.
* Full data and metadata checksumming. By default all data and
  metadata written to BlueStore is protected by one or more
  checksums. No data or metadata will be read from disk or returned
  to the user without being verified.
* Inline compression. Data can optionally be compressed before being
  written to disk (see the example below).
* Multi-device metadata tiering. BlueStore allows its internal
  journal (write-ahead log) to be written to a separate, high-speed
  device (like an SSD, NVMe, or NVDIMM) to increase performance. If
  a significant amount of faster storage is available, internal
  metadata can also be stored on the faster device.
* Efficient copy-on-write. RBD and CephFS snapshots rely on a
  copy-on-write *clone* mechanism that is implemented efficiently in
  BlueStore. This results in efficient IO both for regular snapshots
  and for erasure coded pools (which rely on cloning to implement
  efficient two-phase commits).

For more information, see :doc:`bluestore-config-ref`.
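
As a brief illustration of the checksumming and inline compression
features described above, the following ``ceph.conf`` snippet keeps
the default checksum algorithm and turns compression on. The values
shown are examples only; :doc:`bluestore-config-ref` documents the
full set of options and their defaults::

    [osd]
        # protect data and metadata with crc32c checksums (the default)
        bluestore csum type = crc32c
        # compress newly written data with snappy unless a client hints
        # that it is incompressible
        bluestore compression algorithm = snappy
        bluestore compression mode = aggressive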

FileStore
---------

FileStore is the legacy approach to storing objects in Ceph. It
relies on a standard file system (normally XFS) in combination with a
key/value database (traditionally LevelDB, now RocksDB) for some
metadata.

FileStore is well-tested and widely used in production but suffers
from many performance deficiencies due to its overall design and
reliance on a traditional file system for storing object data.

Although FileStore is generally capable of functioning on most
POSIX-compatible file systems (including btrfs and ext4), we
recommend that only XFS be used. Both btrfs and ext4 have known bugs
and deficiencies and their use may lead to data loss. By default all
Ceph provisioning tools will use XFS.
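
For example, the file system type and mount options that provisioning
tools apply to FileStore data devices can be adjusted in
``ceph.conf``. The values below mirror the usual defaults and are
shown only for illustration::

    [osd]
        # file system created on FileStore data devices
        osd mkfs type = xfs
        # mount options used for XFS-backed FileStore data devices
        osd mount options xfs = rw,noatime,inode64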

For more information, see :doc:`filestore-config-ref`.