.. _ceph-volume-lvm-prepare:

``prepare``
===========
Before you run ``ceph-volume lvm prepare``, we recommend that you provision a
logical volume. Then you can run ``prepare`` on that logical volume.
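
A minimal sketch of provisioning such a logical volume with the standard LVM
tools might look like this (the volume group name ``ceph-vg``, the logical
volume name ``osd-data``, and the device ``/dev/sdb`` are illustrative):

.. prompt:: bash #

   vgcreate ceph-vg /dev/sdb
   lvcreate --name osd-data --extents 100%FREE ceph-vg
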
``prepare`` adds metadata to logical volumes but does not alter them in any
other way.

.. note:: This is part of a two-step process to deploy an OSD. If you prefer
   to deploy an OSD by using only one command, see :ref:`ceph-volume-lvm-create`.

``prepare`` uses :term:`LVM tags` to assign several pieces of metadata to a
logical volume. Volumes tagged in this way are easier to identify and easier to
use with Ceph. :term:`LVM tags` identify logical volumes by the role that they
play in the Ceph cluster (for example: BlueStore data or BlueStore WAL+DB).

:term:`BlueStore<bluestore>` is the default backend. Ceph permits changing
the backend, which can be done by using the following flags and arguments:

* :ref:`--filestore <ceph-volume-lvm-prepare_filestore>`
* :ref:`--bluestore <ceph-volume-lvm-prepare_bluestore>`

.. _ceph-volume-lvm-prepare_bluestore:

``bluestore``
-------------
:term:`BlueStore<bluestore>` is the default backend for new OSDs. It
offers more flexibility for devices than :term:`filestore` does. BlueStore
supports the following configurations:

* a block device, a block.wal device, and a block.db device
* a block device and a block.wal device
* a block device and a block.db device
* a single block device

The ``bluestore`` subcommand accepts physical block devices, partitions on physical
block devices, or logical volumes as arguments for the various device
parameters. If a physical block device is provided, a logical volume will be
created. If the provided volume group's name begins with `ceph`, it will be
created if it does not yet exist, and it will be clobbered and reused if it
already exists. This allows for a simpler approach to using LVM, but at the
cost of flexibility: no option or configuration can be used to change how the
logical volume is created.

The ``block`` is specified with the ``--data`` flag, and in its simplest use
case it looks like:

.. prompt:: bash #

   ceph-volume lvm prepare --bluestore --data vg/lv

A raw device can be specified in the same way:

.. prompt:: bash #

   ceph-volume lvm prepare --bluestore --data /path/to/device

For enabling :ref:`encryption <ceph-volume-lvm-encryption>`, the ``--dmcrypt`` flag is required:

.. prompt:: bash #

   ceph-volume lvm prepare --bluestore --dmcrypt --data vg/lv

If a ``block.db`` device or a ``block.wal`` device is needed, it can be
specified with ``--block.db`` or ``--block.wal``. These can be physical
devices, partitions, or logical volumes. ``block.db`` and ``block.wal`` are
optional for bluestore.

For both ``block.db`` and ``block.wal``, partitions can be used as-is, and
therefore are not made into logical volumes.
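
For example, a hypothetical invocation that places ``block.db`` on a partition
and ``block.wal`` on a logical volume might look like this (the device and
volume names are illustrative):

.. prompt:: bash #

   ceph-volume lvm prepare --bluestore --data vg/lv --block.db /dev/sdb1 --block.wal wal_vg/wal_lv
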
While creating the OSD directory, the process uses a ``tmpfs`` mount to hold
the files needed for the OSD. These files are created by ``ceph-osd --mkfs``
and are ephemeral.

A symlink is always created for the ``block`` device; symlinks for ``block.db``
and ``block.wal`` are created only if those devices are defined. For a cluster
with a default name and an OSD ID of 0, the directory looks like this::

    # ls -l /var/lib/ceph/osd/ceph-0
    lrwxrwxrwx. 1 ceph ceph 93 Oct 20 13:05 block -> /dev/ceph-be2b6fbd-bcf2-4c51-b35d-a35a162a02f0/osd-block-25cf0a05-2bc6-44ef-9137-79d65bd7ad62
    lrwxrwxrwx. 1 ceph ceph 93 Oct 20 13:05 block.db -> /dev/sda1
    lrwxrwxrwx. 1 ceph ceph 93 Oct 20 13:05 block.wal -> /dev/ceph/osd-wal-0
    -rw-------. 1 ceph ceph 37 Oct 20 13:05 ceph_fsid
    -rw-------. 1 ceph ceph 37 Oct 20 13:05 fsid
    -rw-------. 1 ceph ceph 55 Oct 20 13:05 keyring
    -rw-------. 1 ceph ceph  6 Oct 20 13:05 ready
    -rw-------. 1 ceph ceph 10 Oct 20 13:05 type
    -rw-------. 1 ceph ceph  2 Oct 20 13:05 whoami

In the above case, a device was used for ``block``, so ``ceph-volume`` created
a volume group and a logical volume using the following conventions:

* volume group name: ``ceph-{cluster fsid}`` (or, if the volume group already
  exists, ``ceph-{random uuid}``)

* logical volume name: ``osd-block-{osd_fsid}``

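
To confirm what was created, the new volume group and logical volume can be
listed with LVM itself; a quick check might look like this (the names shown
will follow the conventions above):

.. prompt:: bash #

   lvs -o lv_name,vg_name
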

.. _ceph-volume-lvm-prepare_filestore:

``filestore``
-------------
The ``--filestore`` option prepares logical volumes for a
:term:`filestore`-backed object-store OSD.

:term:`Filestore<filestore>` uses a logical volume to store OSD data, and it uses
physical devices, partitions, or logical volumes to store the journal. If a
physical device is used to create a filestore backend, a logical volume will be
created on that physical device. If the provided volume group's name begins
with `ceph`, it will be created if it does not yet exist, and it will be
clobbered and reused if it already exists. No special preparation is needed for
these volumes, but be sure to meet the minimum size requirements for OSD data and
for the journal.

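
Provisioning the data and journal volumes ahead of time with standard LVM
tools might look like this minimal sketch (the volume group name, logical
volume names, sizes, and ``/dev/sdb`` are illustrative):

.. prompt:: bash #

   vgcreate osd-vg /dev/sdb
   lvcreate --name osd-data --size 100G osd-vg
   lvcreate --name osd-journal --size 10G osd-vg
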
Use the following command to create a basic filestore OSD:

.. prompt:: bash #

   ceph-volume lvm prepare --filestore --data <data block device>

Use this command to deploy filestore with an external journal:

.. prompt:: bash #

   ceph-volume lvm prepare --filestore --data <data block device> --journal <journal block device>

Use this command to enable :ref:`encryption <ceph-volume-lvm-encryption>`, and note that the ``--dmcrypt`` flag is required:

.. prompt:: bash #

   ceph-volume lvm prepare --filestore --dmcrypt --data <data block device> --journal <journal block device>

The data block device and the journal can each take one of three forms:

* a physical block device
* a partition on a physical block device
* a logical volume

If you use a logical volume to deploy filestore, the value that you pass in the
command *must* be of the format ``volume_group/logical_volume_name``. Since logical
volume names are not enforced for uniqueness, using this format is an important
safeguard against accidentally choosing the wrong volume (and clobbering its data).

If you use a partition to deploy filestore, the partition *must* contain a
``PARTUUID`` that can be discovered by ``blkid``. This ensures that the
partition can be identified correctly regardless of the device's name (or path).

For example, to use a logical volume for OSD data and a partition
(``/dev/sdc1``) for the journal, run a command of this form:

.. prompt:: bash #

   ceph-volume lvm prepare --filestore --data volume_group/logical_volume_name --journal /dev/sdc1

Or, to use a bare device for data and a logical volume for the journal:

.. prompt:: bash #

   ceph-volume lvm prepare --filestore --data /dev/sdc --journal volume_group/journal_lv

A generated UUID is used when asking the cluster for a new OSD ID. Together,
these two pieces of information (the OSD ID and the OSD UUID) identify a given
OSD, and they are used later throughout the
:ref:`activation<ceph-volume-lvm-activate>` process.

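
Under the hood, asking the cluster for the new OSD ID corresponds roughly to
the following monitor command, which ``ceph-volume`` runs for you using the
bootstrap key (a sketch; the UUID is the one generated above):

.. prompt:: bash #

   ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring osd new <osd uuid>
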
The OSD data directory is created using the following convention::

    /var/lib/ceph/osd/<cluster name>-<osd id>

To link the journal volume to the mounted data volume, use this command:

.. prompt:: bash #

   ln -s /path/to/journal /var/lib/ceph/osd/<cluster_name>-<osd-id>/journal

To fetch the monmap by using the OSD bootstrap key, use this command:

.. prompt:: bash #

   /usr/bin/ceph --cluster ceph --name client.bootstrap-osd \
       --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring \
       mon getmap -o /var/lib/ceph/osd/<cluster name>-<osd id>/activate.monmap

To populate the OSD directory (which has already been mounted), use this ``ceph-osd`` command:

.. prompt:: bash #

   ceph-osd --cluster ceph --mkfs --mkkey -i <osd id> \
       --monmap /var/lib/ceph/osd/<cluster name>-<osd id>/activate.monmap \
       --osd-data /var/lib/ceph/osd/<cluster name>-<osd id> \
       --osd-journal /var/lib/ceph/osd/<cluster name>-<osd id>/journal \
       --osd-uuid <osd uuid> \
       --keyring /var/lib/ceph/osd/<cluster name>-<osd id>/keyring \
       --setuser ceph --setgroup ceph

All of the information from the previous steps is used in the above command.

.. _ceph-volume-lvm-partitions:

Partitioning
------------
``ceph-volume lvm`` does not currently create partitions from a whole device.
If using device partitions, the only requirement is that they contain a
``PARTUUID`` that is discoverable by ``blkid``. Both ``fdisk`` and ``parted``
create one automatically for a new partition.

For example, using a new, unformatted drive (``/dev/sdd`` in this case), we can
use ``parted`` to create a new partition. First we list the device
information::

    $ parted --script /dev/sdd print
    Model: VBOX HARDDISK (scsi)
    Disk /dev/sdd: 11.5GB
    Sector size (logical/physical): 512B/512B
    Disk Flags:

This device is not even labeled yet, so we can use ``parted`` to create
a ``gpt`` label before we create a partition, and verify again with ``parted
print``::

    $ parted --script /dev/sdd mklabel gpt
    $ parted --script /dev/sdd print
    Model: VBOX HARDDISK (scsi)
    Disk /dev/sdd: 11.5GB
    Sector size (logical/physical): 512B/512B
    Partition Table: gpt
    Disk Flags:

Now let's create a single partition, and then verify that ``blkid`` can find
the ``PARTUUID`` that ``ceph-volume`` needs::

    $ parted --script /dev/sdd mkpart primary 1 100%
    $ blkid /dev/sdd1
    /dev/sdd1: PARTLABEL="primary" PARTUUID="16399d72-1e1f-467d-96ee-6fe371a7d0d4"

.. _ceph-volume-lvm-existing-osds:

Existing OSDs
-------------
For existing clusters that want to use this new system and that have OSDs that
are already running, there are a few things to take into account:

.. warning:: This process will forcefully format the data device, destroying
   any existing data.

* OSD paths should follow this convention::

      /var/lib/ceph/osd/<cluster name>-<osd id>

* Preferably, no other mechanism to mount the volume (such as an ``fstab``
  mount point) should exist; any such mechanism should be removed.

The one-time process for an existing OSD with an ID of 0 and a cluster named
``"ceph"`` would look like this (the following command will **destroy any
data** in the OSD)::

    ceph-volume lvm prepare --filestore --osd-id 0 --osd-fsid E3D291C1-E7BF-4984-9794-B60D9FA139CB

The command-line tool will not contact the monitor to generate an OSD ID;
instead, it will format the LVM device and store the metadata on it so that the
OSD can be started later (for a detailed description of this metadata, see
:ref:`ceph-volume-lvm-tags`).

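
If you do not know the OSD's fsid, it can usually be read from the existing
OSD's data directory; a sketch, assuming the conventional path and the OSD ID
of 0 used above:

.. prompt:: bash #

   cat /var/lib/ceph/osd/ceph-0/fsid
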

Crush device class
------------------

To set the crush device class for the OSD, use the ``--crush-device-class`` flag. This will
work for both bluestore and filestore OSDs::

    ceph-volume lvm prepare --bluestore --data vg/lv --crush-device-class foo

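
Once the OSD has been activated and has joined the cluster, the assigned class
is visible in the ``CLASS`` column of the CRUSH tree; a quick check might look
like this:

.. prompt:: bash #

   ceph osd tree
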

.. _ceph-volume-lvm-multipath:

``multipath`` support
---------------------
``multipath`` devices are supported if ``lvm`` is configured properly.

**Leave it to LVM**

Most Linux distributions should ship their LVM2 package with
``multipath_component_detection = 1`` in the default configuration. With this
setting, ``LVM`` ignores any device that is a multipath component, and
``ceph-volume`` will accordingly not touch these devices.

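
This setting lives in the ``devices`` section of ``lvm.conf``; a minimal
sketch of the relevant stanza::

    devices {
        multipath_component_detection = 1
    }
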
**Using filters**

Should this setting be unavailable, a correct ``filter`` expression must be
provided in ``lvm.conf``. ``ceph-volume`` must not be able to use both the
multipath device and its multipath components.

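
As an illustrative sketch only (the accept/reject patterns depend entirely on
your device naming, and must not reject non-multipath devices that you still
need), such a filter could accept the multipath devices and reject their
underlying component paths like this::

    devices {
        filter = [ "a|^/dev/mapper/mpath.*|", "r|^/dev/sd.*|" ]
    }
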
Storing metadata
----------------
The following tags are applied as part of the preparation process, regardless
of the type of volume (journal or data) or of the OSD objectstore:

* ``cluster_fsid``
* ``encrypted``
* ``osd_fsid``
* ``osd_id``
* ``crush_device_class``

For :term:`filestore` these tags are added:

* ``journal_device``
* ``journal_uuid``

For :term:`bluestore` these tags are added:

* ``block_device``
* ``block_uuid``
* ``db_device``
* ``db_uuid``
* ``wal_device``
* ``wal_uuid``

.. note:: For the complete lvm tag conventions, see :ref:`ceph-volume-lvm-tag-api`.

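
The tags that ``prepare`` applied can be inspected directly with LVM; for
example, to show the tags on a given logical volume (the ``vg/lv`` path is
illustrative):

.. prompt:: bash #

   lvs -o lv_tags vg/lv
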

Summary
-------
To recap the ``prepare`` process for :term:`bluestore`:

#. Accepts raw physical devices, partitions on physical devices, or logical volumes as arguments.
#. Creates logical volumes on any raw physical devices.
#. Generates a UUID for the OSD.
#. Asks the monitor for an OSD ID, reusing the generated UUID.
#. Creates the OSD data directory on a ``tmpfs`` mount.
#. Symlinks ``block``, ``block.wal``, and ``block.db`` if they are defined.
#. Fetches the monmap for activation.
#. Populates the data directory by running ``ceph-osd``.
#. Assigns all of the Ceph metadata to the logical volumes using LVM tags.

And the ``prepare`` process for :term:`filestore`:

#. Accepts raw physical devices, partitions on physical devices, or logical volumes as arguments.
#. Generates a UUID for the OSD.
#. Asks the monitor for an OSD ID, reusing the generated UUID.
#. Creates the OSD data directory and mounts the data volume.
#. Symlinks the journal from the data volume to the journal location.
#. Fetches the monmap for activation.
#. Mounts the device and populates the data directory by running ``ceph-osd``.
#. Assigns all of the Ceph metadata to the data and journal volumes using LVM tags.