.. _ceph-volume-lvm-prepare:

``prepare``
===========
This subcommand allows a :term:`filestore` or :term:`bluestore` setup. It is
recommended to pre-provision a logical volume before using it with
``ceph-volume lvm``.

Logical volumes are not altered except for adding extra metadata.
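
For example, a volume group and a logical volume could be pre-provisioned with
standard LVM tools before handing them to ``ceph-volume lvm`` (the device and
the names used here are illustrative)::

    pvcreate /dev/sdb
    vgcreate ceph-vg /dev/sdb
    lvcreate -n osd-data -l 100%FREE ceph-vg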

.. note:: This is part of a two step process to deploy an OSD. If looking for
   a single-call way, please see :ref:`ceph-volume-lvm-create`

To help identify volumes, the process of preparing a volume (or volumes) to
work with Ceph will assign a few pieces of metadata using :term:`LVM tags`.

:term:`LVM tags` make volumes easy to discover later, help identify them as
part of a Ceph system, and indicate what role they have (journal, filestore,
bluestore, etc...).

Although both objectstores are supported, :term:`bluestore` is the default for
new OSDs. The backend can be specified with:

* :ref:`--bluestore <ceph-volume-lvm-prepare_bluestore>`
* :ref:`--filestore <ceph-volume-lvm-prepare_filestore>`

.. _ceph-volume-lvm-prepare_bluestore:

``bluestore``
-------------
The :term:`bluestore` objectstore is the default for new OSDs. It offers a bit
more flexibility for devices compared to :term:`filestore`.
Bluestore supports the following configurations:

* A block device, a block.wal, and a block.db device
* A block device and a block.wal device
* A block device and a block.db device
* A single block device

The bluestore subcommand accepts physical block devices, partitions on
physical block devices, or logical volumes as arguments for the various device
parameters. If a physical device is provided, a logical volume will be
created. A volume group will either be created or reused if its name begins
with ``ceph``. This allows a simpler approach to using LVM, but at the cost of
flexibility: there are no options or configurations to change how the LV is
created.

The ``block`` is specified with the ``--data`` flag, and in its simplest use
case it looks like::

    ceph-volume lvm prepare --bluestore --data vg/lv

A raw device can be specified in the same way::

    ceph-volume lvm prepare --bluestore --data /path/to/device

For enabling :ref:`encryption <ceph-volume-lvm-encryption>`, the ``--dmcrypt`` flag is required::

    ceph-volume lvm prepare --bluestore --dmcrypt --data vg/lv

If a ``block.db`` or a ``block.wal`` is needed (they are optional for
bluestore) they can be specified with ``--block.db`` and ``--block.wal``,
respectively. These can be a physical device, a partition or a logical volume.
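
For example, a call combining a logical volume for data with a partition for
``block.db`` and a logical volume for ``block.wal`` (device and volume names
here are illustrative) could look like::

    ceph-volume lvm prepare --bluestore --data vg/lv \
        --block.db /dev/sdb1 --block.wal vg/wal-lv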

For both ``block.db`` and ``block.wal``, partitions are not made into logical
volumes because they can be used as-is.

While creating the OSD directory, the process will use a ``tmpfs`` mount to
place all the files needed for the OSD. These files are initially created by
``ceph-osd --mkfs`` and are fully ephemeral.
73
74A symlink is always created for the ``block`` device, and optionally for
75``block.db`` and ``block.wal``. For a cluster with a default name, and an OSD
76id of 0, the directory could look like::
77
78 # ls -l /var/lib/ceph/osd/ceph-0
79 lrwxrwxrwx. 1 ceph ceph 93 Oct 20 13:05 block -> /dev/ceph-be2b6fbd-bcf2-4c51-b35d-a35a162a02f0/osd-block-25cf0a05-2bc6-44ef-9137-79d65bd7ad62
80 lrwxrwxrwx. 1 ceph ceph 93 Oct 20 13:05 block.db -> /dev/sda1
81 lrwxrwxrwx. 1 ceph ceph 93 Oct 20 13:05 block.wal -> /dev/ceph/osd-wal-0
82 -rw-------. 1 ceph ceph 37 Oct 20 13:05 ceph_fsid
83 -rw-------. 1 ceph ceph 37 Oct 20 13:05 fsid
84 -rw-------. 1 ceph ceph 55 Oct 20 13:05 keyring
85 -rw-------. 1 ceph ceph 6 Oct 20 13:05 ready
86 -rw-------. 1 ceph ceph 10 Oct 20 13:05 type
87 -rw-------. 1 ceph ceph 2 Oct 20 13:05 whoami
88

In the above case, a device was used for ``block``, so ``ceph-volume`` created
a volume group and a logical volume using the following convention:

* volume group name: ``ceph-{cluster fsid}``, or ``ceph-{random uuid}`` if the
  vg already exists

* logical volume name: ``osd-block-{osd_fsid}``
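
The resulting volume group and logical volume can be inspected with standard
LVM tooling, for example::

    lvs -o lv_name,vg_name,lv_size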

.. _ceph-volume-lvm-prepare_filestore:

``filestore``
-------------
This is the OSD backend that allows preparation of logical volumes for
a :term:`filestore` objectstore OSD.

It can use a logical volume for the OSD data and a physical device, a partition
or logical volume for the journal. A physical device will have a logical volume
created on it. A volume group will either be created or reused if its name begins
with ``ceph``. No special preparation is needed for these volumes other than
following the minimum size requirements for data and journal.
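
The journal size is taken from the ``osd journal size`` setting (specified in
megabytes). A minimal sketch of a ``ceph.conf`` snippet, assuming a 5 GB
journal::

    [osd]
    osd journal size = 5120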

The CLI call for a basic standalone filestore OSD looks like::

    ceph-volume lvm prepare --filestore --data <data block device>

To deploy filestore with an external journal::

    ceph-volume lvm prepare --filestore --data <data block device> --journal <journal block device>

For enabling :ref:`encryption <ceph-volume-lvm-encryption>`, the ``--dmcrypt`` flag is required::

    ceph-volume lvm prepare --filestore --dmcrypt --data <data block device> --journal <journal block device>

Both the journal and data block device can take three forms:

* a physical block device
* a partition on a physical block device
* a logical volume

When using logical volumes the value *must* be of the format
``volume_group/logical_volume``. Since logical volume names
are not enforced for uniqueness, this prevents accidentally
choosing the wrong volume.

When using a partition, it *must* contain a ``PARTUUID`` that can be
discovered by ``blkid``. This ensures it can later be identified correctly
regardless of the device name (or path).

For example: passing a logical volume for data and a partition ``/dev/sdc1`` for
the journal::

    ceph-volume lvm prepare --filestore --data volume_group/lv_name --journal /dev/sdc1

Passing a bare device for data and a logical volume as the journal::

    ceph-volume lvm prepare --filestore --data /dev/sdc --journal volume_group/journal_lv

A generated UUID is used to ask the cluster for a new OSD. These two pieces of
information (the OSD ID and the OSD UUID) are crucial for identifying an OSD
and will later be used throughout the :ref:`ceph-volume-lvm-activate` process.

The OSD data directory is created using the following convention::

    /var/lib/ceph/osd/<cluster name>-<osd id>

At this point the data volume is mounted at this location, and the journal
volume is linked::

    ln -s /path/to/journal /var/lib/ceph/osd/<cluster_name>-<osd-id>/journal

The monmap is fetched using the bootstrap key from the OSD::

    /usr/bin/ceph --cluster ceph --name client.bootstrap-osd \
        --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring \
        mon getmap -o /var/lib/ceph/osd/<cluster name>-<osd id>/activate.monmap

``ceph-osd`` will be called to populate the OSD directory, which is already
mounted, re-using all the pieces of information from the initial steps::

    ceph-osd --cluster ceph --mkfs --mkkey -i <osd id> \
        --monmap /var/lib/ceph/osd/<cluster name>-<osd id>/activate.monmap --osd-data \
        /var/lib/ceph/osd/<cluster name>-<osd id> --osd-journal /var/lib/ceph/osd/<cluster name>-<osd id>/journal \
        --osd-uuid <osd uuid> --keyring /var/lib/ceph/osd/<cluster name>-<osd id>/keyring \
        --setuser ceph --setgroup ceph

.. _ceph-volume-lvm-partitions:

Partitioning
------------
``ceph-volume lvm`` does not currently create partitions from a whole device.
If using device partitions the only requirement is that they contain the
``PARTUUID`` and that it is discoverable by ``blkid``. Both ``fdisk`` and
``parted`` will create that automatically for a new partition.

For example, using a new, unformatted drive (``/dev/sdd`` in this case) we can
use ``parted`` to create a new partition. First we list the device
information::

    $ parted --script /dev/sdd print
    Model: VBOX HARDDISK (scsi)
    Disk /dev/sdd: 11.5GB
    Sector size (logical/physical): 512B/512B
    Disk Flags:

This device is not even labeled yet, so we can use ``parted`` to create
a ``gpt`` label before we create a partition, and verify again with ``parted
print``::

    $ parted --script /dev/sdd mklabel gpt
    $ parted --script /dev/sdd print
    Model: VBOX HARDDISK (scsi)
    Disk /dev/sdd: 11.5GB
    Sector size (logical/physical): 512B/512B
    Partition Table: gpt
    Disk Flags:

Now let's create a single partition, and verify later if ``blkid`` can find
a ``PARTUUID`` that is needed by ``ceph-volume``::

    $ parted --script /dev/sdd mkpart primary 1 100%
    $ blkid /dev/sdd1
    /dev/sdd1: PARTLABEL="primary" PARTUUID="16399d72-1e1f-467d-96ee-6fe371a7d0d4"

.. _ceph-volume-lvm-existing-osds:

Existing OSDs
-------------
For existing clusters that want to use this new system and have OSDs that are
already running, there are a few things to take into account:

.. warning:: this process will forcefully format the data device, destroying
   existing data, if any.

* OSD paths should follow this convention::

     /var/lib/ceph/osd/<cluster name>-<osd id>

* Preferably, no other mechanisms to mount the volume should exist; any that
  do (like fstab mount points) should be removed

The one-time process for an existing OSD, with an ID of 0 and using
a ``"ceph"`` cluster name, would look like (the following command will
**destroy any data** in the OSD)::

    ceph-volume lvm prepare --filestore --osd-id 0 --osd-fsid E3D291C1-E7BF-4984-9794-B60D9FA139CB

The command line tool will not contact the monitor to generate an OSD ID and
will format the LVM device in addition to storing the metadata on it so that it
can be started later (for detailed metadata description see
:ref:`ceph-volume-lvm-tags`).
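
The ``osd fsid`` for a running OSD can be read from its data directory
(assuming the conventional path shown above)::

    cat /var/lib/ceph/osd/ceph-0/fsid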

Crush device class
------------------

To set the crush device class for the OSD, use the ``--crush-device-class`` flag. This will
work for both bluestore and filestore OSDs::

    ceph-volume lvm prepare --bluestore --data vg/lv --crush-device-class foo
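
Once the OSD has been activated, the assignment shows up in the ``CLASS``
column of the OSD tree, for example::

    ceph osd tree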

.. _ceph-volume-lvm-multipath:

``multipath`` support
---------------------
Devices that come from ``multipath`` are not supported as-is. The tool will
refuse to consume a raw multipath device and will report a message like::

    --> RuntimeError: Cannot use device (/dev/mapper/<name>). A vg/lv path or an existing device is needed

The reason for not supporting multipath is that, depending on the type of the
multipath setup, if using an active/passive array as the underlying physical
devices, filters are required in ``lvm.conf`` to exclude the disks that are
part of those underlying devices.

It is infeasible for ceph-volume to understand what type of configuration is
needed for LVM to be able to work in various different multipath scenarios.
The functionality to create the LV for you is merely a (naive) convenience;
anything that involves different settings or configuration must be provided by
a config management system, which can then provide VGs and LVs for
ceph-volume to consume.

This situation will only arise when trying to use the ceph-volume functionality
that creates a volume group and logical volume from a device. If a multipath
device is already a logical volume it *should* work, given that the LVM
configuration is done correctly to avoid issues.
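
As a sketch of that approach, assuming ``/dev/mapper/mpatha`` is the multipath
device and that ``lvm.conf`` has already been configured correctly for the
environment, the VG and LV could be created manually and then consumed by
``ceph-volume``::

    pvcreate /dev/mapper/mpatha
    vgcreate ceph-mpath /dev/mapper/mpatha
    lvcreate -n osd-data -l 100%FREE ceph-mpath
    ceph-volume lvm prepare --bluestore --data ceph-mpath/osd-data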

Storing metadata
----------------
The following tags will get applied as part of the preparation process
regardless of the type of volume (journal or data) or OSD objectstore:

* ``cluster_fsid``
* ``encrypted``
* ``osd_fsid``
* ``osd_id``
* ``crush_device_class``

For :term:`filestore` these tags will be added:

* ``journal_device``
* ``journal_uuid``

For :term:`bluestore` these tags will be added:

* ``block_device``
* ``block_uuid``
* ``db_device``
* ``db_uuid``
* ``wal_device``
* ``wal_uuid``
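
These tags can be inspected with standard LVM tooling, for example::

    lvs -o lv_name,lv_tags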

.. note:: For the complete lvm tag conventions see :ref:`ceph-volume-lvm-tag-api`

Summary
-------
To recap the ``prepare`` process for :term:`bluestore`:

#. Accepts raw physical devices, partitions on physical devices, or logical
   volumes as arguments.
#. Creates logical volumes on any raw physical devices.
#. Generates a UUID for the OSD.
#. Asks the monitor for an OSD ID, reusing the generated UUID.
#. Creates the OSD data directory on a tmpfs mount.
#. Symlinks ``block``, ``block.wal``, and ``block.db`` if defined.
#. Fetches the monmap for activation.
#. Populates the data directory with ``ceph-osd``.
#. Assigns all the Ceph metadata to the logical volumes using lvm tags.


And the ``prepare`` process for :term:`filestore`:

#. Accepts raw physical devices, partitions on physical devices, or logical
   volumes as arguments.
#. Generates a UUID for the OSD.
#. Asks the monitor for an OSD ID, reusing the generated UUID.
#. Creates the OSD data directory and mounts the data volume.
#. Symlinks the journal from the data volume to the journal location.
#. Fetches the monmap for activation.
#. Mounts the device and populates the data directory with ``ceph-osd``.
#. Assigns all the Ceph metadata to the data and journal volumes using lvm tags.