=================
 Storage Devices
=================

There are two Ceph daemons that store data on disk:

* **Ceph OSDs** (or Object Storage Daemons) are where most of the
  data in Ceph is stored. Generally speaking, each OSD is backed by
  a single storage device, like a traditional hard disk (HDD) or
  solid state disk (SSD). OSDs can also be backed by a combination
  of devices, like an HDD for most data and an SSD (or partition of an
  SSD) for some metadata. The number of OSDs in a cluster is
  generally a function of how much data will be stored, how big each
  storage device will be, and the level and type of redundancy
  (replication or erasure coding); a rough worked example follows
  this list.
* **Ceph Monitor** daemons manage critical cluster state like cluster
  membership and authentication information. For smaller clusters a
  few gigabytes is all that is needed, although for larger clusters
  the monitor database can reach tens or possibly hundreds of
  gigabytes.
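
As a rough illustration of that sizing arithmetic (the figures are
invented for the example): storing 100 TiB of data with 3x replication
needs about 300 TiB of raw capacity, so with one OSD per 4 TiB drive
that is roughly 300 / 4 = 75 OSDs, before leaving headroom for
rebalancing and failures.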

OSD Backends
============

There are two ways that OSDs can manage the data they store. Starting
with the Luminous 12.2.z release, the new default (and recommended) backend is
*BlueStore*. Prior to Luminous, the default (and only option) was
*FileStore*.
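
The backend used when new OSDs are provisioned can be selected in
``ceph.conf``. A minimal sketch::

    # ceph.conf: backend for newly created OSDs
    [osd]
    osd objectstore = bluestore

The backend of an existing OSD is reported in its metadata
(``osd.0`` is just an example id)::

    ceph osd metadata 0 | grep osd_objectstore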

BlueStore
---------

BlueStore is a special-purpose storage backend designed specifically
for managing data on disk for Ceph OSD workloads. It is motivated by
experience supporting and managing OSDs using FileStore over the
last ten years. Key BlueStore features include:

* Direct management of storage devices. BlueStore consumes raw block
  devices or partitions. This avoids any intervening layers of
  abstraction (such as local file systems like XFS) that may limit
  performance or add complexity.
* Metadata management with RocksDB. We embed RocksDB's key/value database
  in order to manage internal metadata, such as the mapping from object
  names to block locations on disk.
* Full data and metadata checksumming. By default all data and
  metadata written to BlueStore is protected by one or more
  checksums. No data or metadata will be read from disk or returned
  to the user without being verified.
* Inline compression. Data written may be optionally compressed
  before being written to disk.
* Multi-device metadata tiering. BlueStore allows its internal
  journal (write-ahead log) to be written to a separate, high-speed
  device (like an SSD, NVMe, or NVDIMM) to increase performance. If
  a significant amount of faster storage is available, internal
  metadata can also be stored on the faster device; see the sketch
  after this list.
* Efficient copy-on-write. RBD and CephFS snapshots rely on a
  copy-on-write *clone* mechanism that is implemented efficiently in
  BlueStore. This results in efficient IO both for regular snapshots
  and for erasure coded pools (which rely on cloning to implement
  efficient two-phase commits).
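
As an illustration of multi-device tiering, ``ceph-volume`` can place
the RocksDB metadata on a faster device when an OSD is created. This
is a minimal sketch; the device paths are examples only::

    # /dev/sdb is a slow HDD for object data; /dev/nvme0n1p1 is a fast
    # NVMe partition for the RocksDB metadata (the write-ahead log is
    # colocated with the DB device unless --block.wal is given)
    ceph-volume lvm create --bluestore --data /dev/sdb --block.db /dev/nvme0n1p1

Similarly, checksumming and inline compression are controlled by
BlueStore options in ``ceph.conf``. The values below are illustrative:
``crc32c`` is the default checksum type, and compression is off unless
a mode is set::

    [osd]
    # checksum algorithm applied to all data and metadata
    bluestore csum type = crc32c
    # compress writes when it saves space; snappy is the default algorithm
    bluestore compression mode = aggressive
    bluestore compression algorithm = snappy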

For more information, see :doc:`bluestore-config-ref`.

FileStore
---------

FileStore is the legacy approach to storing objects in Ceph. It
relies on a standard file system (normally XFS) in combination with a
key/value database (traditionally LevelDB, now RocksDB) for some
metadata.
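
For comparison, a FileStore OSD can still be provisioned explicitly,
with the data device formatted as XFS and the journal placed on a
separate device or partition. A sketch with example device paths::

    ceph-volume lvm create --filestore --data /dev/sdb --journal /dev/sdc1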

FileStore is well-tested and widely used in production but suffers
from many performance deficiencies due to its overall design and
reliance on a traditional file system for storing object data.

Although FileStore is generally capable of functioning on most
POSIX-compatible file systems (including btrfs and ext4), we
recommend that only XFS be used. Both btrfs and ext4 have known bugs
and deficiencies and their use may lead to data loss. By default all
Ceph provisioning tools will use XFS.

For more information, see :doc:`filestore-config-ref`.