===========================================
 Hard Disk and File System Recommendations
===========================================

.. index:: hard drive preparation

Hard Drive Prep
===============

Ceph aims for data safety, which means that when the :term:`Ceph Client`
receives notice that data was written to a storage drive, that data was
actually written to the storage drive. On old kernels (before 2.6.33),
disable the drive's write cache if the journal is on a raw drive; newer
kernels should work fine.

Use ``hdparm`` to disable write caching on the hard disk::

    sudo hdparm -W 0 /dev/hda

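To check the current state, run ``hdparm -W`` with no value argument; it
reports whether write caching is enabled on the drive (shown here for the
same example device)::

    sudo hdparm -W /dev/hda
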
In production environments, we recommend running a :term:`Ceph OSD Daemon` with
separate drives for the operating system and the data. If you run data and an
operating system on a single disk, we recommend creating a separate partition
for your data.

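For example, one way to carve out a dedicated data partition on a second disk
(a sketch only; ``/dev/sdb`` is a hypothetical device name, so adjust it for
your hardware)::

    sudo parted --script /dev/sdb mklabel gpt
    sudo parted --script /dev/sdb mkpart primary xfs 0% 100%
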
.. index:: filesystems

Filesystems
===========

Ceph OSD Daemons rely heavily upon the stability and performance of the
underlying filesystem.

Recommended
-----------

We currently recommend ``XFS`` for production deployments.

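For example, XFS formatting and mount behavior for OSD data disks can be set
in ``ceph.conf`` (a sketch with commonly used values from this era of Ceph;
tune them for your environment)::

    [osd]
    osd mkfs options xfs = -f -i size=2048
    osd mount options xfs = rw,noatime,inode64
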
We used to recommend ``btrfs`` for testing, development, and any non-critical
deployments because it has the most promising set of features. However, we
now plan to avoid using a kernel file system entirely with the new BlueStore
backend. ``btrfs`` is still supported and has a comparatively compelling
set of features, but be mindful of its stability and support status in your
Linux distribution.

Not recommended
---------------

We recommend *against* using ``ext4`` due to limitations in the size
of xattrs it can store, and the problems this causes with the way Ceph
handles long RADOS object names. Although these issues will generally
not surface with Ceph clusters using only short object names (e.g., an
RBD workload that does not include long RBD image names), other users
like RGW make extensive use of long object names and can break.

Starting with the Jewel release, the ``ceph-osd`` daemon will refuse
to start if the configured max object name cannot be safely stored on
``ext4``. If the cluster is only being used with short object names
(e.g., RBD only), you can continue using ``ext4`` by setting the
following configuration options::

    osd max object name len = 256
    osd max object namespace len = 64

.. note:: This may result in difficult-to-diagnose errors if you try
          to use RGW or other librados clients that do not properly
          handle or politely surface any resulting ENAMETOOLONG
          errors.

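You can confirm the values a running OSD is actually using via its admin
socket (a sketch, assuming ``osd.0`` runs on the local host)::

    sudo ceph daemon osd.0 config get osd_max_object_name_len
    sudo ceph daemon osd.0 config get osd_max_object_namespace_len
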
Filesystem Background Info
==========================

The ``XFS``, ``btrfs`` and ``ext4`` file systems provide numerous
advantages in highly scaled data storage environments when `compared`_
to ``ext3``.

``XFS``, ``btrfs`` and ``ext4`` are `journaling file systems`_, which means that
they are more robust when recovering from crashes, power outages, etc. These
filesystems journal all of the changes they will make before performing writes.

``XFS`` was developed by Silicon Graphics, and is a mature and stable
filesystem. By contrast, ``btrfs`` is a relatively new file system that aims
to address the long-standing wishes of system administrators working with
large scale data storage environments. ``btrfs`` has some unique features
and advantages compared to other Linux filesystems.

``btrfs`` is a `copy-on-write`_ filesystem. It supports file creation
timestamps and checksums that verify metadata integrity, so it can detect
bad copies of data and fix them with the good copies. The copy-on-write
capability means that ``btrfs`` can support snapshots that are writable.
``btrfs`` supports transparent compression and other features.

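For instance, creating a writable snapshot of a ``btrfs`` subvolume is a
single command (a sketch; ``/srv/osd-data`` and ``/srv/osd-snap`` are
hypothetical paths)::

    sudo btrfs subvolume snapshot /srv/osd-data /srv/osd-snap

Pass ``-r`` to make the snapshot read-only instead of writable.
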
``btrfs`` also incorporates multi-device management into the file system,
which enables it to support heterogeneous disk storage infrastructures and
data allocation policies. The community also aims to provide ``fsck``,
deduplication, and data encryption support in the future.

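For example, a single ``btrfs`` file system can span multiple devices, with
separate redundancy profiles for data (``-d``) and metadata (``-m``); the
device names here are hypothetical::

    sudo mkfs.btrfs -d raid1 -m raid1 /dev/sdb /dev/sdc
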
.. _copy-on-write: http://en.wikipedia.org/wiki/Copy-on-write
.. _compared: http://en.wikipedia.org/wiki/Comparison_of_file_systems
.. _journaling file systems: http://en.wikipedia.org/wiki/Journaling_file_system