.. _ceph-volume-overview:

Overview
--------
The ``ceph-volume`` tool aims to be a single-purpose command line tool to
deploy logical volumes as OSDs, trying to maintain a similar API to
``ceph-disk`` when preparing, activating, and creating OSDs.

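For illustration, a typical workflow on an unused device could look like the
following sketch (``/dev/sdb`` and the OSD id/fsid are placeholders; the
sub-command documentation covers the full set of flags)::

    # prepare the device as an OSD without starting it
    ceph-volume lvm prepare --data /dev/sdb

    # enable and start the OSD; "prepare" reports the id and fsid to use here
    ceph-volume lvm activate <osd-id> <osd-fsid>

    # or perform both steps in a single call
    ceph-volume lvm create --data /dev/sdb
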
It deviates from ``ceph-disk`` by not interacting with, or relying on, the udev
rules that come installed for Ceph. These rules allow automatic detection of
previously set up devices, which are in turn fed into ``ceph-disk`` to activate
them.

.. _ceph-disk-replaced:

Replacing ``ceph-disk``
-----------------------
The ``ceph-disk`` tool was created at a time when the project was required to
support many different types of init systems (upstart, sysvinit, etc.) while
also being able to discover devices. This caused the tool to concentrate
initially (and later exclusively) on GPT partitions, specifically on GPT
GUIDs, which were used to label devices in a unique way to answer questions
like:

* is this device a journal?
* an encrypted data partition?
* was the device left partially prepared?

To answer these questions, ``ceph-disk`` used ``UDEV`` rules to match the
GUIDs; those rules would call back into ``ceph-disk``, resulting in a
back-and-forth between the ``ceph-disk`` systemd unit and the ``ceph-disk``
executable. The process was very unreliable and time consuming (a timeout of
close to three hours **per OSD** had to be put in place), and would cause OSDs
not to come up at all during the boot process of a node.

These problems were hard to debug, or even to replicate, given the
asynchronous behavior of ``UDEV``.

Since the world view of ``ceph-disk`` was restricted to GPT partitions, it
could not work with other technologies like LVM or similar device mapper
devices. It was ultimately decided to create something modular, starting with
LVM support, with the ability to expand to other technologies as needed.


GPT partitions are simple?
--------------------------
Although partitions in general are simple to reason about, ``ceph-disk``
partitions were not simple by any means. They required a tremendous number of
special flags in order to work correctly with the device discovery workflow.
Here is an example call to create a data partition::

        /sbin/sgdisk --largest-new=1 --change-name=1:ceph data --partition-guid=1:f0fc39fd-eeb2-49f1-b922-a11939cf8a0f --typecode=1:89c57f98-2fe5-4dc0-89c1-f3ad0ceff2be --mbrtogpt -- /dev/sdb

Not only was creating these partitions hard, they also required devices to be
exclusively owned by Ceph. For example, in some cases a special partition would
be created when devices were encrypted, which would contain unencrypted keys.
This was ``ceph-disk`` domain knowledge, which would not translate to a "GPT
partitions are simple" understanding. Here is an example of that special
partition being created::

        /sbin/sgdisk --new=5:0:+10M --change-name=5:ceph lockbox --partition-guid=5:None --typecode=5:fb3aabf9-d25f-47cc-bf5e-721d181642be --mbrtogpt -- /dev/sdad


Modularity
----------
``ceph-volume`` was designed to be a modular tool because we anticipate that
there are going to be lots of ways that people provision the hardware devices
that we need to consider. There are already two: legacy ``ceph-disk`` devices
that are still in use and have GPT partitions (handled by
:ref:`ceph-volume-simple`), and LVM. SPDK devices, where we manage NVMe devices
directly from userspace, are on the immediate horizon; LVM won't work there
since the kernel isn't involved at all.

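As a rough sketch of how the two existing backends are used (device names are
placeholders), legacy GPT-based OSDs are taken over by the ``simple``
sub-command, while new OSDs are provisioned with the ``lvm`` sub-command::

    # capture the metadata of an existing ceph-disk (GPT) OSD, then activate it
    ceph-volume simple scan /dev/sdb1
    ceph-volume simple activate --all

    # provision a new OSD on top of LVM
    ceph-volume lvm create --data /dev/sdc
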
``ceph-volume lvm``
-------------------
By making use of :term:`LVM tags`, the :ref:`ceph-volume-lvm` sub-command is
able to store and later re-discover and query devices associated with OSDs so
that they can later be activated.

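For example, the stored metadata can be inspected either through the tool
itself or directly with LVM (the tags are kept in a ``ceph.``-prefixed
namespace)::

    # report the logical volumes (and their tags) that belong to OSDs
    ceph-volume lvm list

    # the same metadata is visible as plain LVM tags
    lvs -o lv_name,lv_tags
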
LVM performance penalty
-----------------------
In short: we have not been able to notice any significant performance penalties
associated with the change to LVM. Because ``ceph-volume`` works closely with
LVM, the ability to work with other device mapper technologies came as a given:
there is no technical difficulty in working with anything that can sit below
a Logical Volume.