.. _ceph-volume-lvm-prepare:

``prepare``
===========
This subcommand allows a :term:`filestore` or :term:`bluestore` setup. It is
recommended to pre-provision a logical volume before using it with
``ceph-volume lvm``.

Logical volumes are not altered except for adding extra metadata.
.. note:: This is part of a two-step process to deploy an OSD. If looking for
          a single-call way, please see :ref:`ceph-volume-lvm-create`.
To help identify volumes, the process of preparing a volume (or volumes) to
work with Ceph will assign a few pieces of metadata using :term:`LVM tags`.

:term:`LVM tags` make volumes easy to discover later, and help identify them as
part of a Ceph system and what role they have (journal, filestore, bluestore,
etc.).
Although :term:`bluestore` is the default, the back end can be specified with:
* :ref:`--filestore <ceph-volume-lvm-prepare_filestore>`
* :ref:`--bluestore <ceph-volume-lvm-prepare_bluestore>`
.. _ceph-volume-lvm-prepare_bluestore:

``bluestore``
-------------
The :term:`bluestore` objectstore is the default for new OSDs. It offers a bit
more flexibility for devices compared to :term:`filestore`.
Bluestore supports the following configurations:
* A block device, a block.wal, and a block.db device
* A block device and a block.wal device
* A block device and a block.db device
* A single block device
The bluestore subcommand accepts physical block devices, partitions on
physical block devices, or logical volumes as arguments for the various device
parameters. If a physical device is provided, a logical volume will be created
on it. A volume group will either be created or reused if its name begins with
``ceph``. This allows a simpler approach to using LVM, at the cost of
flexibility: there are no options or configurations to change how the LV is
created.
The ``block`` is specified with the ``--data`` flag, and in its simplest use
case looks like::

    ceph-volume lvm prepare --bluestore --data vg/lv
A raw device can be specified in the same way::

    ceph-volume lvm prepare --bluestore --data /path/to/device
To enable :ref:`encryption <ceph-volume-lvm-encryption>`, the ``--dmcrypt`` flag is required::

    ceph-volume lvm prepare --bluestore --dmcrypt --data vg/lv
If a ``block.db`` or a ``block.wal`` is needed (they are optional for
bluestore), they can be specified with ``--block.db`` and ``--block.wal``,
respectively. Each can be a physical device, a partition, or a logical volume.

Partitions used for ``block.db`` or ``block.wal`` are not turned into logical
volumes, because they can be used as-is.
While creating the OSD directory, the process will use a ``tmpfs`` mount to
place all the files needed for the OSD. These files are initially created by
``ceph-osd --mkfs`` and are fully ephemeral.
A symlink is always created for the ``block`` device, and optionally for
``block.db`` and ``block.wal``. For a cluster with a default name, and an OSD
id of 0, the directory could look like::

    # ls -l /var/lib/ceph/osd/ceph-0
    lrwxrwxrwx. 1 ceph ceph 93 Oct 20 13:05 block -> /dev/ceph-be2b6fbd-bcf2-4c51-b35d-a35a162a02f0/osd-block-25cf0a05-2bc6-44ef-9137-79d65bd7ad62
    lrwxrwxrwx. 1 ceph ceph 93 Oct 20 13:05 block.db -> /dev/sda1
    lrwxrwxrwx. 1 ceph ceph 93 Oct 20 13:05 block.wal -> /dev/ceph/osd-wal-0
    -rw-------. 1 ceph ceph 37 Oct 20 13:05 ceph_fsid
    -rw-------. 1 ceph ceph 37 Oct 20 13:05 fsid
    -rw-------. 1 ceph ceph 55 Oct 20 13:05 keyring
    -rw-------. 1 ceph ceph  6 Oct 20 13:05 ready
    -rw-------. 1 ceph ceph 10 Oct 20 13:05 type
    -rw-------. 1 ceph ceph  2 Oct 20 13:05 whoami
In the above case, a device was used for ``block``, so ``ceph-volume`` created
a volume group and a logical volume using the following conventions:

* volume group name: ``ceph-{cluster fsid}``, or ``ceph-{random uuid}`` if the
  vg already exists
* logical volume name: ``osd-block-{osd_fsid}``
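That naming convention can be sketched in a few lines of Python (a hypothetical helper for illustration, not actual ``ceph-volume`` code):

```python
import uuid

def select_vg_name(cluster_fsid, existing_vgs):
    """Pick a volume group name: ceph-{cluster fsid}, falling back to
    ceph-{random uuid} when that vg already exists."""
    candidate = "ceph-{}".format(cluster_fsid)
    if candidate in existing_vgs:
        candidate = "ceph-{}".format(uuid.uuid4())
    return candidate

def lv_name_for(osd_fsid):
    """Logical volume names follow osd-block-{osd_fsid}."""
    return "osd-block-{}".format(osd_fsid)
```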
.. _ceph-volume-lvm-prepare_filestore:

``filestore``
-------------
This is the OSD backend that allows preparation of logical volumes for
a :term:`filestore` objectstore OSD.

It can use a logical volume for the OSD data and a physical device, a partition,
or a logical volume for the journal. A physical device will have a logical volume
created on it. A volume group will either be created or reused if its name begins
with ``ceph``. No special preparation is needed for these volumes other than
following the minimum size requirements for data and journal.
The CLI call for a basic standalone filestore OSD looks like this::

    ceph-volume lvm prepare --filestore --data <data block device>
To deploy filestore with an external journal::

    ceph-volume lvm prepare --filestore --data <data block device> --journal <journal block device>
To enable :ref:`encryption <ceph-volume-lvm-encryption>`, the ``--dmcrypt`` flag is required::

    ceph-volume lvm prepare --filestore --dmcrypt --data <data block device> --journal <journal block device>
Both the journal and data block device can take three forms:

* a physical block device
* a partition on a physical block device
* a logical volume
When using logical volumes the value *must* be of the format
``volume_group/logical_volume``. Since logical volume names
are not enforced for uniqueness, this prevents accidentally
choosing the wrong volume.
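A minimal sketch of that validation in Python (illustrative only, not the actual ``ceph-volume`` argument parser):

```python
def parse_lv_argument(value):
    """Split a 'volume_group/logical_volume' argument, rejecting anything
    that does not have exactly one '/' separating two non-empty names."""
    parts = value.split("/")
    if len(parts) != 2 or not all(parts):
        raise ValueError("expected volume_group/logical_volume, got %r" % value)
    return parts[0], parts[1]
```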
When using a partition, it *must* contain a ``PARTUUID``, which can be
discovered by ``blkid``. This ensures it can later be identified correctly
regardless of the device name (or path).
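As a sketch of that discovery step, the ``PARTUUID`` could be pulled out of ``blkid`` output like this (in practice one would run ``blkid /dev/sdc1`` and parse its stdout):

```python
import re

def partuuid_from_blkid(blkid_output):
    """Extract the PARTUUID value from a line of `blkid` output."""
    match = re.search(r'PARTUUID="([^"]+)"', blkid_output)
    return match.group(1) if match else None
```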
For example, passing a logical volume for data and a partition ``/dev/sdc1`` for
the journal::

    ceph-volume lvm prepare --filestore --data volume_group/lv_name --journal /dev/sdc1
Passing a bare device for data and a logical volume as the journal::

    ceph-volume lvm prepare --filestore --data /dev/sdc --journal volume_group/journal_lv
A generated UUID is used to ask the cluster for a new OSD. These two pieces of
information (the OSD id and the UUID) are crucial for identifying an OSD and
will later be used throughout the :ref:`ceph-volume-lvm-activate` process.
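The UUID itself is simply a random (version 4) UUID; this runnable sketch shows how such an identifier can be generated:

```python
import uuid

# The OSD fsid: a random UUID that identifies this OSD from prepare
# through activation
osd_fsid = str(uuid.uuid4())
print(osd_fsid)
```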
The OSD data directory is created using the following convention::

    /var/lib/ceph/osd/<cluster name>-<osd id>
At this point the data volume is mounted at this location, and the journal
is symlinked into it::

    ln -s /path/to/journal /var/lib/ceph/osd/<cluster_name>-<osd-id>/journal
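The resulting layout can be reproduced with plain ``os`` calls; this runnable sketch uses a temporary directory instead of ``/var/lib/ceph/osd`` and a hypothetical journal device path:

```python
import os
import tempfile

root = tempfile.mkdtemp()                 # stand-in for /var/lib/ceph/osd
osd_dir = os.path.join(root, "ceph-0")    # <cluster name>-<osd id>
os.makedirs(osd_dir)

journal_device = "/dev/sdc1"              # hypothetical journal partition
os.symlink(journal_device, os.path.join(osd_dir, "journal"))
```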
The monmap is fetched using the bootstrap key from the OSD::

    /usr/bin/ceph --cluster ceph --name client.bootstrap-osd
    --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring
    mon getmap -o /var/lib/ceph/osd/<cluster name>-<osd id>/activate.monmap
``ceph-osd`` will be called to populate the OSD directory, which is already
mounted, re-using all the pieces of information from the initial steps::

    ceph-osd --cluster ceph --mkfs --mkkey -i <osd id> \
    --monmap /var/lib/ceph/osd/<cluster name>-<osd id>/activate.monmap --osd-data \
    /var/lib/ceph/osd/<cluster name>-<osd id> --osd-journal /var/lib/ceph/osd/<cluster name>-<osd id>/journal \
    --osd-uuid <osd uuid> --keyring /var/lib/ceph/osd/<cluster name>-<osd id>/keyring \
    --setuser ceph --setgroup ceph
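The same invocation, assembled programmatically (an illustrative sketch of how the argument list fits together, not ``ceph-volume`` source):

```python
def mkfs_argv(cluster, osd_id, osd_uuid):
    """Build the ceph-osd --mkfs argument list for a filestore OSD."""
    osd_path = "/var/lib/ceph/osd/{}-{}".format(cluster, osd_id)
    return [
        "ceph-osd", "--cluster", cluster, "--mkfs", "--mkkey",
        "-i", str(osd_id),
        "--monmap", osd_path + "/activate.monmap",
        "--osd-data", osd_path,
        "--osd-journal", osd_path + "/journal",
        "--osd-uuid", osd_uuid,
        "--keyring", osd_path + "/keyring",
        "--setuser", "ceph", "--setgroup", "ceph",
    ]
```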
.. _ceph-volume-lvm-partitions:

Partitioning
------------
``ceph-volume lvm`` does not currently create partitions from a whole device.
If using device partitions the only requirement is that they contain the
``PARTUUID`` and that it is discoverable by ``blkid``. Both ``fdisk`` and
``parted`` will create that automatically for a new partition.
For example, using a new, unformatted drive (``/dev/sdd`` in this case) we can
use ``parted`` to create a new partition. First we list the device
information::

    $ parted --script /dev/sdd print
    Model: VBOX HARDDISK (scsi)
    Disk /dev/sdd: 11.5GB
    Sector size (logical/physical): 512B/512B
This device is not even labeled yet, so we can use ``parted`` to create
a ``gpt`` label before we create a partition, and verify again with ``parted
print``::

    $ parted --script /dev/sdd mklabel gpt
    $ parted --script /dev/sdd print
    Model: VBOX HARDDISK (scsi)
    Disk /dev/sdd: 11.5GB
    Sector size (logical/physical): 512B/512B
Now let's create a single partition, and verify later that ``blkid`` can find
the ``PARTUUID`` that is needed by ``ceph-volume``::

    $ parted --script /dev/sdd mkpart primary 1 100%
    $ blkid /dev/sdd1
    /dev/sdd1: PARTLABEL="primary" PARTUUID="16399d72-1e1f-467d-96ee-6fe371a7d0d4"
.. _ceph-volume-lvm-existing-osds:

Existing OSDs
-------------
For existing clusters that want to use this new system and have OSDs that are
already running, there are a few things to take into account:

.. warning:: This process will forcefully format the data device, destroying
             any existing data.
* OSD paths should follow this convention::

      /var/lib/ceph/osd/<cluster name>-<osd id>

* Preferably, no other mechanism should mount the volume; any such mounts
  (like ``fstab`` entries) should be removed.
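A quick pre-flight check for leftover ``fstab`` entries might look like this (a sketch; it only inspects the classic ``/etc/fstab`` format, not systemd mount units):

```python
def fstab_entries_for(mount_point, fstab_text):
    """Return fstab lines whose mount point matches the given path."""
    hits = []
    for line in fstab_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blanks and comments
        fields = stripped.split()
        if len(fields) >= 2 and fields[1] == mount_point:
            hits.append(stripped)
    return hits
```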
The one-time process for an existing OSD, with an ID of 0 and using
the ``ceph`` cluster name, would look like this (the following command will
**destroy any data** in the OSD)::

    ceph-volume lvm prepare --filestore --osd-id 0 --osd-fsid E3D291C1-E7BF-4984-9794-B60D9FA139CB
The command line tool will not contact the monitor to generate an OSD ID and
will format the LVM device in addition to storing the metadata on it so that it
can be started later (for a detailed metadata description see
:ref:`ceph-volume-lvm-tags`).
Crush device class
------------------

To set the crush device class for the OSD, use the ``--crush-device-class`` flag. This
works for both bluestore and filestore OSDs::

    ceph-volume lvm prepare --bluestore --data vg/lv --crush-device-class foo
.. _ceph-volume-lvm-multipath:

``multipath`` support
---------------------

``multipath`` devices are supported if ``lvm`` is configured properly.
Most Linux distributions should ship their LVM2 package with
``multipath_component_detection = 1`` in the default configuration. With this
setting, ``LVM`` ignores any device that is a multipath component, and
``ceph-volume`` will accordingly not touch these devices.
Should this setting be unavailable, a correct ``filter`` expression must be
provided in ``lvm.conf``. ``ceph-volume`` must not be able to use both the
multipath device and its multipath components.
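A small sketch of how one might verify the setting in an ``lvm.conf`` snippet (it only handles the simple ``key = value`` form):

```python
import re

def multipath_detection_enabled(lvm_conf_text):
    """Look for an uncommented multipath_component_detection = 1 line."""
    match = re.search(
        r'^\s*multipath_component_detection\s*=\s*(\d+)',
        lvm_conf_text, re.MULTILINE)
    return match is not None and match.group(1) == "1"
```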
Storing metadata
----------------

The following tags will get applied as part of the preparation process
regardless of the type of volume (journal or data) or OSD objectstore:

* ``cluster_fsid``
* ``encrypted``
* ``osd_fsid``
* ``osd_id``
* ``crush_device_class``
For :term:`filestore` these tags will be added:

* ``journal_device``
* ``journal_uuid``
For :term:`bluestore` these tags will be added:

* ``block_device``
* ``block_uuid``
* ``db_device``
* ``db_uuid``
* ``wal_device``
* ``wal_uuid``
.. note:: For the complete lvm tag conventions see :ref:`ceph-volume-lvm-tag-api`.
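As an illustration of how such tags end up on a volume, each key is namespaced with a ``ceph.`` prefix and attached with ``lvchange --addtag``; this hypothetical sketch builds that command line:

```python
def lvchange_argv(lv_path, tags):
    """Build an lvchange command that attaches ceph.* tags to a volume."""
    argv = ["lvchange"]
    for key in sorted(tags):
        argv += ["--addtag", "ceph.{}={}".format(key, tags[key])]
    return argv + [lv_path]
```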
Summary
-------

To recap the ``prepare`` process for :term:`bluestore`:
#. Accepts raw physical devices, partitions on physical devices, or logical volumes as arguments.
#. Creates logical volumes on any raw physical devices.
#. Generates a UUID for the OSD.
#. Asks the monitor for an OSD ID, reusing the generated UUID.
#. Creates the OSD data directory on a ``tmpfs`` mount.
#. Symlinks ``block``, ``block.wal``, and ``block.db`` if defined.
#. Fetches the monmap for activation.
#. Populates the data directory via ``ceph-osd``.
#. Assigns all the Ceph metadata to the logical volumes using LVM tags.
And the ``prepare`` process for :term:`filestore`:
#. Accepts raw physical devices, partitions on physical devices, or logical volumes as arguments.
#. Generates a UUID for the OSD.
#. Asks the monitor for an OSD ID, reusing the generated UUID.
#. Creates the OSD data directory and mounts the data volume.
#. Symlinks the journal from the data volume to the journal location.
#. Fetches the monmap for activation.
#. Populates the mounted data directory via ``ceph-osd``.
#. Assigns all the Ceph metadata to the data and journal volumes using LVM tags.