=================
 Storage Devices
=================

There are several Ceph daemons in a storage cluster:

* **Ceph OSDs** (Object Storage Daemons) store most of the data in
  Ceph. Usually each OSD is backed by a single storage device. This
  can be a traditional hard disk drive (HDD) or a solid-state drive
  (SSD). OSDs can also be backed by a combination of devices: for
  example, an HDD for most data and an SSD (or a partition of an
  SSD) for some metadata. The number of OSDs in a cluster is usually
  a function of the amount of data to be stored, the size of each
  storage device, and the level and type of redundancy specified
  (replication or erasure coding).
* **Ceph Monitor** daemons manage critical cluster state. This
  includes cluster membership and authentication information. Small
  clusters require only a few gigabytes of storage to hold the
  monitor database. In large clusters, however, the monitor database
  can grow to tens or even hundreds of gigabytes.
* **Ceph Manager** daemons run alongside the monitor daemons and
  provide additional monitoring as well as interfaces to external
  monitoring and management systems.
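
The daemons present in a running cluster can be listed with the standard
``ceph`` status commands: ``ceph -s`` summarizes the monitors, managers, and
OSDs; ``ceph osd tree`` shows each OSD's place in the CRUSH hierarchy; and
``ceph mon stat`` / ``ceph mgr stat`` report the monitor quorum and the
active manager. A minimal sketch, run from any host with an admin keyring:

.. code-block:: console

   # ceph -s
   # ceph osd tree
   # ceph mon stat
   # ceph mgr stat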

OSD Backends
============

There are two ways that OSDs manage the data they store.
As of the Luminous 12.2.z release, the default (and recommended) backend is
*BlueStore*. Prior to the Luminous release, the default (and only) option was
*FileStore*.
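
To check which backend a given OSD is using, inspect that OSD's reported
metadata. A minimal sketch (OSD ``0`` is only an example, and the output is
abbreviated):

.. code-block:: console

   # ceph osd metadata 0 | grep osd_objectstore
       "osd_objectstore": "bluestore",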

BlueStore
---------

BlueStore is a special-purpose storage backend designed specifically for
managing data on disk for Ceph OSD workloads. BlueStore's design is based on
a decade of experience of supporting and managing FileStore OSDs.

Key BlueStore features include:

* Direct management of storage devices. BlueStore consumes raw block devices
  or partitions. This avoids intervening layers of abstraction (such as local
  file systems like XFS) that can limit performance or add complexity.
* Metadata management with RocksDB. BlueStore embeds RocksDB's key/value
  database in order to manage internal metadata, including the mapping of
  object names to block locations on disk.
* Full data and metadata checksumming. By default, all data and metadata
  written to BlueStore is protected by one or more checksums. No data or
  metadata is read from disk or returned to the user without being verified.
* Inline compression. Data can optionally be compressed before being written
  to disk (see the configuration sketch after this list).
* Multi-device metadata tiering. BlueStore allows its internal journal
  (write-ahead log) to be written to a separate, high-speed device (such as
  an SSD, NVMe drive, or NVDIMM) for increased performance. If a significant
  amount of faster storage is available, internal metadata can also be stored
  on the faster device (see the provisioning sketch after this list).
* Efficient copy-on-write. RBD and CephFS snapshots rely on a copy-on-write
  *clone* mechanism that is implemented efficiently in BlueStore. This
  results in efficient I/O both for regular snapshots and for erasure-coded
  pools (which rely on cloning to implement efficient two-phase commits).
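
Because BlueStore consumes raw devices directly, a new OSD is normally
provisioned with ``ceph-volume``. The first command below creates an OSD that
keeps everything on a single device; the second keeps object data on an HDD
while placing the RocksDB metadata (and write-ahead log) on a partition of a
faster NVMe device. The device paths are only examples; adapt them to the
hardware at hand:

.. code-block:: console

   # ceph-volume lvm create --bluestore --data /dev/sdb
   # ceph-volume lvm create --bluestore --data /dev/sdc --block.db /dev/nvme0n1p1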
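
Inline compression and checksumming are controlled by BlueStore configuration
options. The values below are examples only; see :doc:`bluestore-config-ref`
for the available modes, algorithms, and checksum types:

.. code-block:: console

   # ceph config set osd bluestore_compression_mode aggressive
   # ceph config set osd bluestore_compression_algorithm snappy
   # ceph config set osd bluestore_csum_type crc32c

Compression can also be enabled on a per-pool basis, which overrides the
OSD-wide settings for data written to that pool.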

For more information, see :doc:`bluestore-config-ref` and :doc:`/rados/operations/bluestore-migration`.

FileStore
---------

FileStore is the legacy approach to storing objects in Ceph. It relies on a
standard file system (normally XFS) in combination with a key/value database
(traditionally LevelDB, now RocksDB) for some metadata.

FileStore is well-tested and widely used in production. However, it suffers
from many performance deficiencies due to its overall design and its reliance
on a traditional file system for object data storage.

Although FileStore is capable of functioning on most POSIX-compatible file
systems (including btrfs and ext4), we recommend that only the XFS file
system be used with Ceph. Both btrfs and ext4 have known bugs and
deficiencies, and their use may lead to data loss. By default, all Ceph
provisioning tools use XFS.
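
For completeness, a FileStore OSD can still be provisioned with
``ceph-volume``, which creates the XFS file system and attaches a journal.
The device paths below are only examples:

.. code-block:: console

   # ceph-volume lvm create --filestore --data /dev/sdd --journal /dev/sde1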

For more information, see :doc:`filestore-config-ref`.