.. _ceph-volume-lvm-prepare:

``prepare``
===========
This subcommand allows a :term:`filestore` or :term:`bluestore` setup. It is
recommended to pre-provision a logical volume before using it with
``ceph-volume lvm``.

Logical volumes are not altered except for adding extra metadata.

.. note:: This is part of a two step process to deploy an OSD. If looking for
   a single-call way, please see :ref:`ceph-volume-lvm-create`

While preparing a volume (or volumes) to work with Ceph, the tool assigns
a few pieces of metadata using :term:`LVM tags` to help identify the volumes.

:term:`LVM tags` make volumes easy to discover later, and help identify them as
part of a Ceph system and what role they have (journal, filestore, bluestore,
etc.).

Although :term:`bluestore` is the default, the back end can be specified with:


* :ref:`--filestore <ceph-volume-lvm-prepare_filestore>`
* :ref:`--bluestore <ceph-volume-lvm-prepare_bluestore>`

.. _ceph-volume-lvm-prepare_bluestore:

``bluestore``
-------------
The :term:`bluestore` objectstore is the default for new OSDs. It offers a bit
more flexibility for devices compared to :term:`filestore`.
Bluestore supports the following configurations:

* A block device, a block.wal, and a block.db device
* A block device and a block.wal device
* A block device and a block.db device
* A single block device

The bluestore subcommand accepts physical block devices, partitions on
physical block devices or logical volumes as arguments for the various device
parameters. If a physical device is provided, a logical volume will be created.
A volume group will either be created or reused if its name begins with ``ceph``.
This allows a simpler approach to using LVM but at the cost of flexibility:
there are no options or configurations to change how the LV is created.

The ``block`` is specified with the ``--data`` flag, and in its simplest use
case it looks like::

    ceph-volume lvm prepare --bluestore --data vg/lv

A raw device can be specified in the same way::

    ceph-volume lvm prepare --bluestore --data /path/to/device

For enabling :ref:`encryption <ceph-volume-lvm-encryption>`, the ``--dmcrypt`` flag is required::

    ceph-volume lvm prepare --bluestore --dmcrypt --data vg/lv

If a ``block.db`` or a ``block.wal`` is needed (they are optional for
bluestore) they can be specified with ``--block.db`` and ``--block.wal``
accordingly. These can be a physical device, a partition or
a logical volume.

For both ``block.db`` and ``block.wal``, partitions aren't made logical volumes
because they can be used as-is.
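
The four supported bluestore layouts differ only in which optional flags are
passed. As a minimal sketch (the helper name ``build_prepare_command`` is
hypothetical and not part of ``ceph-volume``), the command line could be
assembled like this, using only the flags shown above:

```python
def build_prepare_command(data, block_db=None, block_wal=None, dmcrypt=False):
    """Return the argument list for a bluestore ``ceph-volume lvm prepare`` call.

    Hypothetical helper for illustration only; it builds the command but
    does not run it.
    """
    if not data:
        raise ValueError("a data device or vg/lv is required for --data")
    cmd = ["ceph-volume", "lvm", "prepare", "--bluestore"]
    if dmcrypt:
        cmd.append("--dmcrypt")
    cmd += ["--data", data]
    # block.db and block.wal are optional for bluestore
    if block_db:
        cmd += ["--block.db", block_db]
    if block_wal:
        cmd += ["--block.wal", block_wal]
    return cmd
```

Calling ``build_prepare_command("vg/lv")`` yields the single block device
layout, while adding ``block_db`` and/or ``block_wal`` yields the other three.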

While creating the OSD directory, the process will use a ``tmpfs`` mount to
place all the files needed for the OSD. These files are initially created by
``ceph-osd --mkfs`` and are fully ephemeral.

A symlink is always created for the ``block`` device, and optionally for
``block.db`` and ``block.wal``. For a cluster with a default name, and an OSD
id of 0, the directory could look like::

    # ls -l /var/lib/ceph/osd/ceph-0
    lrwxrwxrwx. 1 ceph ceph 93 Oct 20 13:05 block -> /dev/ceph-be2b6fbd-bcf2-4c51-b35d-a35a162a02f0/osd-block-25cf0a05-2bc6-44ef-9137-79d65bd7ad62
    lrwxrwxrwx. 1 ceph ceph 93 Oct 20 13:05 block.db -> /dev/sda1
    lrwxrwxrwx. 1 ceph ceph 93 Oct 20 13:05 block.wal -> /dev/ceph/osd-wal-0
    -rw-------. 1 ceph ceph 37 Oct 20 13:05 ceph_fsid
    -rw-------. 1 ceph ceph 37 Oct 20 13:05 fsid
    -rw-------. 1 ceph ceph 55 Oct 20 13:05 keyring
    -rw-------. 1 ceph ceph 6 Oct 20 13:05 ready
    -rw-------. 1 ceph ceph 10 Oct 20 13:05 type
    -rw-------. 1 ceph ceph 2 Oct 20 13:05 whoami

In the above case, a device was used for ``block`` so ``ceph-volume`` created
a volume group and a logical volume using the following convention:

* volume group name: ``ceph-{cluster fsid}`` or if the vg exists already
  ``ceph-{random uuid}``

* logical volume name: ``osd-block-{osd_fsid}``
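
The naming convention can be expressed in a few lines. The following is only a
sketch: the function names are hypothetical, and passing the existing volume
group names in as a set is an assumption, since this document does not describe
how ``ceph-volume`` detects an existing volume group.

```python
import uuid


def volume_group_name(cluster_fsid, existing_vgs):
    """Return ``ceph-{cluster fsid}``, or ``ceph-{random uuid}`` if that name is taken.

    ``existing_vgs`` is an assumed input: a collection of VG names already present.
    """
    preferred = f"ceph-{cluster_fsid}"
    if preferred not in existing_vgs:
        return preferred
    return f"ceph-{uuid.uuid4()}"


def logical_volume_name(osd_fsid):
    """Return the logical volume name for the ``block`` device of an OSD."""
    return f"osd-block-{osd_fsid}"
```

With the values from the listing above,
``logical_volume_name("25cf0a05-2bc6-44ef-9137-79d65bd7ad62")`` gives the
logical volume name that the ``block`` symlink points to.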


.. _ceph-volume-lvm-prepare_filestore:

``filestore``
-------------
This is the OSD backend that allows preparation of logical volumes for
a :term:`filestore` objectstore OSD.

It can use a logical volume for the OSD data and a physical device, a partition
or logical volume for the journal. A physical device will have a logical volume
created on it. A volume group will either be created or reused if its name begins
with ``ceph``. No special preparation is needed for these volumes other than
following the minimum size requirements for data and journal.

The CLI call for a basic standalone filestore OSD looks like this::

    ceph-volume lvm prepare --filestore --data <data block device>

To deploy filestore with an external journal::

    ceph-volume lvm prepare --filestore --data <data block device> --journal <journal block device>

For enabling :ref:`encryption <ceph-volume-lvm-encryption>`, the ``--dmcrypt`` flag is required::

    ceph-volume lvm prepare --filestore --dmcrypt --data <data block device> --journal <journal block device>

Both the journal and data block device can take three forms:

* a physical block device
* a partition on a physical block device
* a logical volume

When using logical volumes the value *must* be of the format
``volume_group/logical_volume``. Since logical volume names
are not enforced for uniqueness, this prevents accidentally
choosing the wrong volume.

When using a partition, it *must* contain a ``PARTUUID`` that can be
discovered by ``blkid``. This ensures it can later be identified correctly
regardless of the device name (or path).
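
A script that wraps ``ceph-volume`` may want to check the
``volume_group/logical_volume`` format before calling the tool. The check below
is a rough sketch; the pattern and the function name ``is_vg_lv`` are
assumptions for illustration and do not reproduce the validation that
``ceph-volume`` itself performs.

```python
import re

# Assumed pattern: two non-empty names separated by exactly one "/".
# Absolute paths such as /dev/sdc1 start with "/" and therefore do not match.
_VG_LV_PATTERN = re.compile(r"^[^/\s]+/[^/\s]+$")


def is_vg_lv(value):
    """Return True if ``value`` looks like ``volume_group/logical_volume``."""
    return bool(_VG_LV_PATTERN.match(value))
```

Here ``is_vg_lv("volume_group/lv_name")`` is true, while a bare logical volume
name such as ``"lv_name"`` or a device path such as ``"/dev/sdc1"`` is rejected.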

For example: passing a logical volume for data and a partition ``/dev/sdc1`` for
the journal::

    ceph-volume lvm prepare --filestore --data volume_group/lv_name --journal /dev/sdc1

Passing a bare device for data and a logical volume as the journal::

    ceph-volume lvm prepare --filestore --data /dev/sdc --journal volume_group/journal_lv

A generated uuid is used to ask the cluster for a new OSD id. These two pieces
(the uuid and the OSD id) are crucial for identifying an OSD and will later be
used throughout the :ref:`ceph-volume-lvm-activate` process.

The OSD data directory is created using the following convention::

    /var/lib/ceph/osd/<cluster name>-<osd id>

At this point the data volume is mounted at this location, and the journal
volume is linked::

    ln -s /path/to/journal /var/lib/ceph/osd/<cluster name>-<osd id>/journal

The monmap is fetched using the bootstrap key from the OSD::

    /usr/bin/ceph --cluster ceph --name client.bootstrap-osd \
        --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring \
        mon getmap -o /var/lib/ceph/osd/<cluster name>-<osd id>/activate.monmap

``ceph-osd`` will be called to populate the OSD directory, which is already
mounted, re-using all the pieces of information from the initial steps::

    ceph-osd --cluster ceph --mkfs --mkkey -i <osd id> \
        --monmap /var/lib/ceph/osd/<cluster name>-<osd id>/activate.monmap --osd-data \
        /var/lib/ceph/osd/<cluster name>-<osd id> --osd-journal /var/lib/ceph/osd/<cluster name>-<osd id>/journal \
        --osd-uuid <osd uuid> --keyring /var/lib/ceph/osd/<cluster name>-<osd id>/keyring \
        --setuser ceph --setgroup ceph
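
All of the paths used in the commands above derive from the cluster name and
the OSD id. As an illustrative sketch (``osd_paths`` is a hypothetical helper,
not a ``ceph-volume`` function), they could be computed as follows:

```python
from pathlib import PurePosixPath


def osd_paths(osd_id, cluster_name="ceph"):
    """Return the filestore paths derived from ``<cluster name>-<osd id>``.

    Hypothetical helper; it only builds path strings and touches no files.
    """
    base = PurePosixPath("/var/lib/ceph/osd") / f"{cluster_name}-{osd_id}"
    return {
        "data": str(base),                       # mount point of the data volume
        "journal": str(base / "journal"),        # symlink to the journal volume
        "monmap": str(base / "activate.monmap"), # monmap fetched with the bootstrap key
        "keyring": str(base / "keyring"),        # keyring written by ceph-osd --mkkey
    }
```

For the default cluster name and an OSD id of 0, ``osd_paths(0)["journal"]`` is
``/var/lib/ceph/osd/ceph-0/journal``.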


.. _ceph-volume-lvm-partitions:

Partitioning
------------
``ceph-volume lvm`` does not currently create partitions from a whole device.
If using device partitions the only requirement is that they contain the
``PARTUUID`` and that it is discoverable by ``blkid``. Both ``fdisk`` and
``parted`` will create that automatically for a new partition.

For example, using a new, unformatted drive (``/dev/sdd`` in this case) we can
use ``parted`` to create a new partition. First we list the device
information::

    $ parted --script /dev/sdd print
    Model: VBOX HARDDISK (scsi)
    Disk /dev/sdd: 11.5GB
    Sector size (logical/physical): 512B/512B
    Disk Flags:

This device is not even labeled yet, so we can use ``parted`` to create
a ``gpt`` label before we create a partition, and verify again with ``parted
print``::

    $ parted --script /dev/sdd mklabel gpt
    $ parted --script /dev/sdd print
    Model: VBOX HARDDISK (scsi)
    Disk /dev/sdd: 11.5GB
    Sector size (logical/physical): 512B/512B
    Partition Table: gpt
    Disk Flags:

Now let's create a single partition, and then verify that ``blkid`` can find
the ``PARTUUID`` that is needed by ``ceph-volume``::

    $ parted --script /dev/sdd mkpart primary 1 100%
    $ blkid /dev/sdd1
    /dev/sdd1: PARTLABEL="primary" PARTUUID="16399d72-1e1f-467d-96ee-6fe371a7d0d4"
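
When automating this check, the ``blkid`` output line can be parsed to confirm
that a ``PARTUUID`` is present. This is a minimal sketch that assumes the
``device: KEY="value" ...`` layout shown above; ``parse_blkid_line`` and
``has_partuuid`` are hypothetical helper names.

```python
import re


def parse_blkid_line(line):
    """Split a ``blkid`` output line into the device path and a dict of fields."""
    device, _, rest = line.partition(":")
    fields = dict(re.findall(r'(\w+)="([^"]*)"', rest))
    return device.strip(), fields


def has_partuuid(line):
    """Return True if the parsed line carries a non-empty PARTUUID."""
    _, fields = parse_blkid_line(line)
    return bool(fields.get("PARTUUID"))
```

Applied to the output line above, ``parse_blkid_line`` returns ``/dev/sdd1`` as
the device together with its ``PARTLABEL`` and ``PARTUUID`` fields.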


.. _ceph-volume-lvm-existing-osds:

Existing OSDs
-------------
For existing clusters that want to use this new system and have OSDs that are
already running there are a few things to take into account:

.. warning:: this process will forcefully format the data device, destroying
   existing data, if any.

* OSD paths should follow this convention::

      /var/lib/ceph/osd/<cluster name>-<osd id>

* Preferably, no other mechanism to mount the volume should exist; any that do
  (like fstab mount points) should be removed.

The one time process for an existing OSD, with an ID of 0 and using
a ``"ceph"`` cluster name would look like (the following command will **destroy
any data** in the OSD)::

    ceph-volume lvm prepare --filestore --osd-id 0 --osd-fsid E3D291C1-E7BF-4984-9794-B60D9FA139CB

The command line tool will not contact the monitor to generate an OSD ID and
will format the LVM device in addition to storing the metadata on it so that it
can be started later (for detailed metadata description see
:ref:`ceph-volume-lvm-tags`).


Crush device class
------------------

To set the crush device class for the OSD, use the ``--crush-device-class`` flag. This will
work for both bluestore and filestore OSDs::

    ceph-volume lvm prepare --bluestore --data vg/lv --crush-device-class foo


.. _ceph-volume-lvm-multipath:

``multipath`` support
---------------------
``multipath`` devices are supported if ``lvm`` is configured properly.

**Leave it to LVM**

Most Linux distributions should ship their LVM2 package with
``multipath_component_detection = 1`` in the default configuration. With this
setting ``LVM`` ignores any device that is a multipath component and
``ceph-volume`` will accordingly not touch these devices.

**Using filters**

Should this setting be unavailable, a correct ``filter`` expression must be
provided in ``lvm.conf``. ``ceph-volume`` must not be able to use both the
multipath device and its multipath components.

Storing metadata
----------------
The following tags will get applied as part of the preparation process
regardless of the type of volume (journal or data) or OSD objectstore:

* ``cluster_fsid``
* ``encrypted``
* ``osd_fsid``
* ``osd_id``
* ``crush_device_class``

For :term:`filestore` these tags will be added:

* ``journal_device``
* ``journal_uuid``

For :term:`bluestore` these tags will be added:

* ``block_device``
* ``block_uuid``
* ``db_device``
* ``db_uuid``
* ``wal_device``
* ``wal_uuid``

.. note:: For the complete lvm tag conventions see :ref:`ceph-volume-lvm-tag-api`
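
The tag lists above can be captured as data, for instance to check which tags
are still missing on a prepared volume. This is a sketch only: the names below
are hypothetical, the tag names are used exactly as listed in this section
(without any prefix the tool may add), and optional devices such as
``block.db`` may legitimately leave some values unset.

```python
COMMON_TAGS = ("cluster_fsid", "encrypted", "osd_fsid", "osd_id", "crush_device_class")

OBJECTSTORE_TAGS = {
    "filestore": ("journal_device", "journal_uuid"),
    "bluestore": ("block_device", "block_uuid", "db_device", "db_uuid",
                  "wal_device", "wal_uuid"),
}


def expected_tags(objectstore):
    """Return the tag names listed in this section for the given objectstore."""
    if objectstore not in OBJECTSTORE_TAGS:
        raise ValueError(f"unknown objectstore: {objectstore!r}")
    return COMMON_TAGS + OBJECTSTORE_TAGS[objectstore]


def missing_tags(objectstore, present):
    """Return the expected tag names that are absent from ``present``."""
    return [tag for tag in expected_tags(objectstore) if tag not in present]
```

For example, ``missing_tags("filestore", {"osd_id", "osd_fsid"})`` lists the
remaining common tags followed by the two journal tags.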


Summary
-------
To recap the ``prepare`` process for :term:`bluestore`:

#. Accepts raw physical devices, partitions on physical devices or logical volumes as arguments.
#. Creates logical volumes on any raw physical devices.
#. Generates a UUID for the OSD.
#. Asks the monitor for an OSD ID, reusing the generated UUID.
#. Creates the OSD data directory on a tmpfs mount.
#. Symlinks ``block``, ``block.wal``, and ``block.db`` if defined.
#. Fetches the monmap for activation.
#. Populates the data directory using ``ceph-osd``.
#. Assigns all the Ceph metadata to the logical volumes using lvm tags.


And the ``prepare`` process for :term:`filestore`:

#. Accepts raw physical devices, partitions on physical devices or logical volumes as arguments.
#. Generates a UUID for the OSD.
#. Asks the monitor for an OSD ID, reusing the generated UUID.
#. Creates the OSD data directory and mounts the data volume.
#. Symlinks the journal from the data volume to the journal location.
#. Fetches the monmap for activation.
#. Mounts the device and populates the data directory using ``ceph-osd``.
#. Assigns all the Ceph metadata to the data and journal volumes using lvm tags.