.. _ceph-volume-lvm-prepare:

``prepare``
===========
This subcommand allows a :term:`filestore` or :term:`bluestore` setup. It is
recommended to pre-provision a logical volume before using it with
``ceph-volume lvm``.

Logical volumes are not altered except for adding extra metadata.
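
As a minimal sketch, a volume group and logical volume could be pre-provisioned
with standard LVM tooling before handing them to ``ceph-volume`` (the device,
volume group, and logical volume names below are only placeholders)::

    # create a volume group on an empty device and one LV spanning it
    vgcreate ceph-vg-hdd0 /dev/sdb
    lvcreate --name osd-data-0 --extents 100%FREE ceph-vg-hdd0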

.. note:: This is part of a two-step process to deploy an OSD. If looking for
          a single-call way, please see :ref:`ceph-volume-lvm-create`.

To help identify volumes, the tool assigns a few pieces of metadata to the
volume (or volumes) it prepares for Ceph, using :term:`LVM tags`.

:term:`LVM tags` make volumes easy to discover later, and help identify them as
part of a Ceph system and what role they have (journal, filestore, bluestore,
etc.).

Both the :term:`filestore` and :term:`bluestore` backends are supported
(:term:`bluestore` being the default). The backend can be specified with:

* :ref:`--filestore <ceph-volume-lvm-prepare_filestore>`
* :ref:`--bluestore <ceph-volume-lvm-prepare_bluestore>`

.. _ceph-volume-lvm-prepare_bluestore:

``bluestore``
-------------
The :term:`bluestore` objectstore is the default for new OSDs. It offers a bit
more flexibility for devices compared to :term:`filestore`.
Bluestore supports the following configurations:

* A block device, a block.wal, and a block.db device
* A block device and a block.wal device
* A block device and a block.db device
* A single block device

The bluestore subcommand accepts physical block devices, partitions on
physical block devices or logical volumes as arguments for the various device
parameters. If a physical device is provided, a logical volume will be created.
A volume group will either be created or reused if its name begins with
``ceph``. This allows a simpler approach to using LVM, but at the cost of
flexibility: there are no options or configurations to change how the LV is
created.

The ``block`` is specified with the ``--data`` flag, and in its simplest use
case it looks like::

    ceph-volume lvm prepare --bluestore --data vg/lv

A raw device can be specified in the same way::

    ceph-volume lvm prepare --bluestore --data /path/to/device

To enable :ref:`encryption <ceph-volume-lvm-encryption>`, the ``--dmcrypt`` flag is required::

    ceph-volume lvm prepare --bluestore --dmcrypt --data vg/lv

If a ``block.db`` or a ``block.wal`` is needed (they are optional for
bluestore) they can be specified with ``--block.db`` and ``--block.wal``
respectively. These can be a physical device, a partition or
a logical volume.
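
For example, a prepare call that also places ``block.db`` on a partition and
``block.wal`` on a separate logical volume might look like this (the devices
and volume names here are only illustrative)::

    ceph-volume lvm prepare --bluestore --data vg/lv \
        --block.db /dev/sdb1 --block.wal vg/wal-lv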

For both ``block.db`` and ``block.wal``, partitions are not made into logical
volumes because they can be used as-is.

While creating the OSD directory, the process will use a ``tmpfs`` mount to
place all the files needed for the OSD. These files are initially created by
``ceph-osd --mkfs`` and are fully ephemeral.

A symlink is always created for the ``block`` device, and optionally for
``block.db`` and ``block.wal``. For a cluster with a default name, and an OSD
id of 0, the directory could look like::

    # ls -l /var/lib/ceph/osd/ceph-0
    lrwxrwxrwx. 1 ceph ceph 93 Oct 20 13:05 block -> /dev/ceph-be2b6fbd-bcf2-4c51-b35d-a35a162a02f0/osd-block-25cf0a05-2bc6-44ef-9137-79d65bd7ad62
    lrwxrwxrwx. 1 ceph ceph 93 Oct 20 13:05 block.db -> /dev/sda1
    lrwxrwxrwx. 1 ceph ceph 93 Oct 20 13:05 block.wal -> /dev/ceph/osd-wal-0
    -rw-------. 1 ceph ceph 37 Oct 20 13:05 ceph_fsid
    -rw-------. 1 ceph ceph 37 Oct 20 13:05 fsid
    -rw-------. 1 ceph ceph 55 Oct 20 13:05 keyring
    -rw-------. 1 ceph ceph 6 Oct 20 13:05 ready
    -rw-------. 1 ceph ceph 10 Oct 20 13:05 type
    -rw-------. 1 ceph ceph 2 Oct 20 13:05 whoami

In the above case, a device was used for ``block``, so ``ceph-volume`` created
a volume group and a logical volume using the following convention:

* volume group name: ``ceph-{cluster fsid}`` or, if the vg already exists,
  ``ceph-{random uuid}``

* logical volume name: ``osd-block-{osd_fsid}``

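The resulting names can be double-checked with plain LVM tooling; for example
(output here is illustrative, matching the listing above)::

    $ lvs -o lv_name,vg_name
      LV                                             VG
      osd-block-25cf0a05-2bc6-44ef-9137-79d65bd7ad62 ceph-be2b6fbd-bcf2-4c51-b35d-a35a162a02f0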

.. _ceph-volume-lvm-prepare_filestore:

``filestore``
-------------
This is the OSD backend that allows preparation of logical volumes for
a :term:`filestore` objectstore OSD.

It can use a logical volume for the OSD data and a physical device, a partition
or logical volume for the journal. A physical device will have a logical volume
created on it. A volume group will either be created or reused if its name
begins with ``ceph``. No special preparation is needed for these volumes other
than following the minimum size requirements for data and journal.

The CLI call for a basic standalone filestore OSD looks like::

    ceph-volume lvm prepare --filestore --data <data block device>

To deploy filestore with an external journal::

    ceph-volume lvm prepare --filestore --data <data block device> --journal <journal block device>

To enable :ref:`encryption <ceph-volume-lvm-encryption>`, the ``--dmcrypt`` flag is required::

    ceph-volume lvm prepare --filestore --dmcrypt --data <data block device> --journal <journal block device>

Both the journal and data block device can take three forms:

* a physical block device
* a partition on a physical block device
* a logical volume

When using logical volumes the value *must* be of the format
``volume_group/logical_volume``. Since logical volume names
are not enforced for uniqueness, this prevents accidentally
choosing the wrong volume.

When using a partition, it *must* contain a ``PARTUUID`` that can be
discovered by ``blkid``. This ensures it can later be identified correctly
regardless of the device name (or path).

For example: passing a logical volume for data and a partition ``/dev/sdc1`` for
the journal::

    ceph-volume lvm prepare --filestore --data volume_group/lv_name --journal /dev/sdc1

Passing a bare device for data and a logical volume as the journal::

    ceph-volume lvm prepare --filestore --data /dev/sdc --journal volume_group/journal_lv

A generated UUID is used to ask the cluster for a new OSD ID. These two pieces
of information are crucial for identifying an OSD and will later be used
throughout the :ref:`ceph-volume-lvm-activate` process.

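The registration step is roughly equivalent to the following monitor call made
with the bootstrap key (shown only for illustration; ``ceph-volume`` performs
it internally)::

    ceph --cluster ceph --name client.bootstrap-osd \
        --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring \
        osd new <osd uuid>
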
The OSD data directory is created using the following convention::

    /var/lib/ceph/osd/<cluster name>-<osd id>

At this point the data volume is mounted at this location, and the journal
volume is linked::

    ln -s /path/to/journal /var/lib/ceph/osd/<cluster_name>-<osd-id>/journal

The monmap is fetched using the OSD bootstrap key::

    /usr/bin/ceph --cluster ceph --name client.bootstrap-osd \
        --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring \
        mon getmap -o /var/lib/ceph/osd/<cluster name>-<osd id>/activate.monmap

``ceph-osd`` will be called to populate the OSD directory, which is already
mounted, reusing all the pieces of information from the initial steps::

    ceph-osd --cluster ceph --mkfs --mkkey -i <osd id> \
        --monmap /var/lib/ceph/osd/<cluster name>-<osd id>/activate.monmap --osd-data \
        /var/lib/ceph/osd/<cluster name>-<osd id> --osd-journal /var/lib/ceph/osd/<cluster name>-<osd id>/journal \
        --osd-uuid <osd uuid> --keyring /var/lib/ceph/osd/<cluster name>-<osd id>/keyring \
        --setuser ceph --setgroup ceph


.. _ceph-volume-lvm-partitions:

Partitioning
------------
``ceph-volume lvm`` does not currently create partitions from a whole device.
If using device partitions the only requirement is that they contain the
``PARTUUID`` and that it is discoverable by ``blkid``. Both ``fdisk`` and
``parted`` will create that automatically for a new partition.

For example, using a new, unformatted drive (``/dev/sdd`` in this case) we can
use ``parted`` to create a new partition. First we list the device
information::

    $ parted --script /dev/sdd print
    Model: VBOX HARDDISK (scsi)
    Disk /dev/sdd: 11.5GB
    Sector size (logical/physical): 512B/512B
    Disk Flags:

This device is not even labeled yet, so we can use ``parted`` to create
a ``gpt`` label before we create a partition, and verify again with ``parted
print``::

    $ parted --script /dev/sdd mklabel gpt
    $ parted --script /dev/sdd print
    Model: VBOX HARDDISK (scsi)
    Disk /dev/sdd: 11.5GB
    Sector size (logical/physical): 512B/512B
    Partition Table: gpt
    Disk Flags:

Now let's create a single partition, and verify later that ``blkid`` can find
the ``PARTUUID`` that is needed by ``ceph-volume``::

    $ parted --script /dev/sdd mkpart primary 1 100%
    $ blkid /dev/sdd1
    /dev/sdd1: PARTLABEL="primary" PARTUUID="16399d72-1e1f-467d-96ee-6fe371a7d0d4"

.. _ceph-volume-lvm-existing-osds:

Existing OSDs
-------------
For existing clusters that want to use this new system and have OSDs that are
already running, there are a few things to take into account:

.. warning:: this process will forcefully format the data device, destroying
             existing data, if any.

* OSD paths should follow this convention::

    /var/lib/ceph/osd/<cluster name>-<osd id>

* Preferably, no other mechanisms to mount the volume should exist; any that do
  (like ``fstab`` mount points) should be removed

The one-time process for an existing OSD, with an ID of 0 and using
a ``"ceph"`` cluster name, would look like (the following command will **destroy
any data** in the OSD)::

    ceph-volume lvm prepare --filestore --osd-id 0 --osd-fsid E3D291C1-E7BF-4984-9794-B60D9FA139CB

The command line tool will not contact the monitor to generate an OSD ID and
will format the LVM device in addition to storing the metadata on it so that it
can be started later (for a detailed metadata description see
:ref:`ceph-volume-lvm-tags`).
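
For an OSD that is already running, the ``--osd-fsid`` value can typically be
read from the ``fsid`` file in its existing data directory, for example for
OSD 0 on a cluster named ``ceph``::

    cat /var/lib/ceph/osd/ceph-0/fsid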


Crush device class
------------------

To set the crush device class for the OSD, use the ``--crush-device-class`` flag. This
works for both bluestore and filestore OSDs::

    ceph-volume lvm prepare --bluestore --data vg/lv --crush-device-class foo


.. _ceph-volume-lvm-multipath:

``multipath`` support
---------------------
Devices that come from ``multipath`` are not supported as-is. The tool will
refuse to consume a raw multipath device and will report a message like::

    --> RuntimeError: Cannot use device (/dev/mapper/<name>). A vg/lv path or an existing device is needed

The reason multipath is not supported is that, depending on the type of
multipath setup (for example, an active/passive array as the underlying
physical devices), filters are required in ``lvm.conf`` to exclude the disks
that back those multipath devices.

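As a rough, hypothetical sketch only (the exact filter depends entirely on the
multipath layout), such a filter in the ``devices`` section of ``lvm.conf``
might accept the multipath device and reject the ``/dev/sd*`` paths behind it::

    # hypothetical example: keep /dev/mapper/mpatha, hide its backing paths
    filter = [ "a|^/dev/mapper/mpatha$|", "r|^/dev/sd.*|" ]
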
It is not feasible for ceph-volume to understand what type of configuration is
needed for LVM to be able to work in the various multipath scenarios. The
functionality to create the LV for you is merely a (naive) convenience;
anything that involves different settings or configuration must be provided by
a config management system, which can then provide VGs and LVs for ceph-volume
to consume.

This situation will only arise when trying to use the ceph-volume functionality
that creates a volume group and logical volume from a device. If a multipath
device is already a logical volume it *should* work, given that the LVM
configuration is done correctly to avoid issues.


Storing metadata
----------------
The following tags will get applied as part of the preparation process
regardless of the type of volume (journal or data) or OSD objectstore:

* ``cluster_fsid``
* ``encrypted``
* ``osd_fsid``
* ``osd_id``
* ``crush_device_class``

For :term:`filestore` these tags will be added:

* ``journal_device``
* ``journal_uuid``

For :term:`bluestore` these tags will be added:

* ``block_device``
* ``block_uuid``
* ``db_device``
* ``db_uuid``
* ``wal_device``
* ``wal_uuid``

.. note:: For the complete lvm tag conventions see :ref:`ceph-volume-lvm-tag-api`
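
The tags applied to a given logical volume can be inspected directly with plain
LVM tooling, for example::

    lvs -o lv_tags <volume_group>/<logical_volume>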


Summary
-------
To recap the ``prepare`` process for :term:`bluestore`:

#. Accepts raw physical devices, partitions on physical devices or logical volumes as arguments.
#. Creates logical volumes on any raw physical devices.
#. Generates a UUID for the OSD.
#. Asks the monitor for an OSD ID, reusing the generated UUID.
#. The OSD data directory is created on a ``tmpfs`` mount.
#. ``block``, ``block.wal``, and ``block.db`` are symlinked if defined.
#. The monmap is fetched for activation.
#. The data directory is populated by ``ceph-osd``.
#. Logical volumes are assigned all the Ceph metadata using lvm tags.


And the ``prepare`` process for :term:`filestore`:

#. Accepts raw physical devices, partitions on physical devices or logical volumes as arguments.
#. Generates a UUID for the OSD.
#. Asks the monitor for an OSD ID, reusing the generated UUID.
#. The OSD data directory is created and the data volume mounted.
#. The journal is symlinked from the data volume to the journal location.
#. The monmap is fetched for activation.
#. The device is mounted and the data directory is populated by ``ceph-osd``.
#. The data and journal volumes are assigned all the Ceph metadata using lvm tags.