.. _ceph-volume-lvm-prepare:

``prepare``
===========
This subcommand allows a :term:`filestore` or :term:`bluestore` setup. It is
recommended to pre-provision a logical volume before using it with
``ceph-volume lvm``.

Logical volumes are not altered except for adding extra metadata.
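
Pre-provisioning can be done with the standard LVM tools. A minimal sketch,
assuming ``/dev/sdb`` is an unused device and the ``ceph-vg``/``osd-data``
names are only illustrative::

    vgcreate ceph-vg /dev/sdb
    lvcreate -l 100%FREE -n osd-data ceph-vg

The resulting volume can then be passed to ``ceph-volume lvm`` as
``ceph-vg/osd-data``.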

.. note:: This is part of a two step process to deploy an OSD. If looking for
          a single-call way, please see :ref:`ceph-volume-lvm-create`

To help identify volumes, the process of preparing a volume (or volumes) to
work with Ceph will assign a few pieces of metadata using :term:`LVM tags`.

:term:`LVM tags` make volumes easy to discover later, and help identify them as
part of a Ceph system and what role they have (journal, filestore, bluestore,
etc.).

The objectstore backend can be specified with:


* :ref:`--filestore <ceph-volume-lvm-prepare_filestore>`
* :ref:`--bluestore <ceph-volume-lvm-prepare_bluestore>`

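The tags assigned by ``ceph-volume`` can be inspected later with standard LVM
tooling, for example::

    lvs -o lv_name,vg_name,lv_tags

Each prepared volume will report its Ceph metadata as comma-separated tags.
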
.. _ceph-volume-lvm-prepare_bluestore:

``bluestore``
-------------
The :term:`bluestore` objectstore is the default for new OSDs. It offers a bit
more flexibility for devices compared to :term:`filestore`.
Bluestore supports the following configurations:

* A block device, a block.wal, and a block.db device
* A block device and a block.wal device
* A block device and a block.db device
* A single block device

The bluestore subcommand accepts physical block devices, partitions on
physical block devices, or logical volumes as arguments for the various device
parameters. If a physical device is provided, a logical volume will be created.
A volume group will either be created, or reused if its name begins with
``ceph``. This allows a simpler approach to using LVM, but at the cost of
flexibility: there are no options or configurations to change how the LV is
created.

The ``block`` is specified with the ``--data`` flag, and in its simplest use
case it looks like::

    ceph-volume lvm prepare --bluestore --data vg/lv

A raw device can be specified in the same way::

    ceph-volume lvm prepare --bluestore --data /path/to/device

For enabling :ref:`encryption <ceph-volume-lvm-encryption>`, the ``--dmcrypt`` flag is required::

    ceph-volume lvm prepare --bluestore --dmcrypt --data vg/lv

If a ``block.db`` or a ``block.wal`` is needed (they are optional for
bluestore) they can be specified with ``--block.db`` and ``--block.wal``,
respectively. These can be a physical device, a partition, or a logical
volume.

Partitions used for ``block.db`` and ``block.wal`` are not made into logical
volumes, because they can be used as-is.

While creating the OSD directory, the process will use a ``tmpfs`` mount to
place all the files needed for the OSD. These files are initially created by
``ceph-osd --mkfs`` and are fully ephemeral.

A symlink is always created for the ``block`` device, and optionally for
``block.db`` and ``block.wal``. For a cluster with a default name, and an OSD
id of 0, the directory could look like::

    # ls -l /var/lib/ceph/osd/ceph-0
    lrwxrwxrwx. 1 ceph ceph 93 Oct 20 13:05 block -> /dev/ceph-be2b6fbd-bcf2-4c51-b35d-a35a162a02f0/osd-block-25cf0a05-2bc6-44ef-9137-79d65bd7ad62
    lrwxrwxrwx. 1 ceph ceph 93 Oct 20 13:05 block.db -> /dev/sda1
    lrwxrwxrwx. 1 ceph ceph 93 Oct 20 13:05 block.wal -> /dev/ceph/osd-wal-0
    -rw-------. 1 ceph ceph 37 Oct 20 13:05 ceph_fsid
    -rw-------. 1 ceph ceph 37 Oct 20 13:05 fsid
    -rw-------. 1 ceph ceph 55 Oct 20 13:05 keyring
    -rw-------. 1 ceph ceph  6 Oct 20 13:05 ready
    -rw-------. 1 ceph ceph 10 Oct 20 13:05 type
    -rw-------. 1 ceph ceph  2 Oct 20 13:05 whoami

In the above case, a device was used for ``block``, so ``ceph-volume`` created
a volume group and a logical volume using the following convention:

* volume group name: ``ceph-{cluster fsid}`` or, if the vg exists already,
  ``ceph-{random uuid}``

* logical volume name: ``osd-block-{osd_fsid}``


.. _ceph-volume-lvm-prepare_filestore:

``filestore``
-------------
This is the OSD backend that allows preparation of logical volumes for
a :term:`filestore` objectstore OSD.

It can use a logical volume for the OSD data, and a physical device, a
partition, or a logical volume for the journal. A physical device will have a
logical volume created on it. A volume group will either be created, or reused
if its name begins with ``ceph``. No special preparation is needed for these
volumes other than following the minimum size requirements for data and
journal.

The CLI call for a basic standalone filestore OSD looks like this::

    ceph-volume lvm prepare --filestore --data <data block device>

To deploy filestore with an external journal::

    ceph-volume lvm prepare --filestore --data <data block device> --journal <journal block device>

For enabling :ref:`encryption <ceph-volume-lvm-encryption>`, the ``--dmcrypt`` flag is required::

    ceph-volume lvm prepare --filestore --dmcrypt --data <data block device> --journal <journal block device>

Both the journal and data block device can take three forms:

* a physical block device
* a partition on a physical block device
* a logical volume

When using logical volumes the value *must* be of the format
``volume_group/logical_volume``. Since logical volume names
are not enforced for uniqueness, this prevents accidentally
choosing the wrong volume.

When using a partition, it *must* contain a ``PARTUUID`` that can be
discovered by ``blkid``. This ensures it can later be identified correctly
regardless of the device name (or path).

For example: passing a logical volume for data and a partition ``/dev/sdc1`` for
the journal::

    ceph-volume lvm prepare --filestore --data volume_group/lv_name --journal /dev/sdc1

Passing a bare device for data and a logical volume as the journal::

    ceph-volume lvm prepare --filestore --data /dev/sdc --journal volume_group/journal_lv

A generated UUID is used to ask the cluster for a new OSD ID. These two pieces
of information are crucial for identifying an OSD, and will later be used
throughout the :ref:`ceph-volume-lvm-activate` process.

The OSD data directory is created using the following convention::

    /var/lib/ceph/osd/<cluster name>-<osd id>

At this point the data volume is mounted at this location, and the journal
volume is linked::

    ln -s /path/to/journal /var/lib/ceph/osd/<cluster_name>-<osd-id>/journal

The monmap is fetched using the bootstrap key from the OSD::

    /usr/bin/ceph --cluster ceph --name client.bootstrap-osd
    --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring
    mon getmap -o /var/lib/ceph/osd/<cluster name>-<osd id>/activate.monmap

``ceph-osd`` will be called to populate the OSD directory, which is already
mounted, re-using all the pieces of information from the initial steps::

    ceph-osd --cluster ceph --mkfs --mkkey -i <osd id> \
    --monmap /var/lib/ceph/osd/<cluster name>-<osd id>/activate.monmap --osd-data \
    /var/lib/ceph/osd/<cluster name>-<osd id> --osd-journal /var/lib/ceph/osd/<cluster name>-<osd id>/journal \
    --osd-uuid <osd uuid> --keyring /var/lib/ceph/osd/<cluster name>-<osd id>/keyring \
    --setuser ceph --setgroup ceph


.. _ceph-volume-lvm-partitions:

Partitioning
------------
``ceph-volume lvm`` does not currently create partitions from a whole device.
If using device partitions the only requirement is that they contain the
``PARTUUID`` and that it is discoverable by ``blkid``. Both ``fdisk`` and
``parted`` will create that automatically for a new partition.

For example, using a new, unformatted drive (``/dev/sdd`` in this case) we can
use ``parted`` to create a new partition. First we list the device
information::

    $ parted --script /dev/sdd print
    Model: VBOX HARDDISK (scsi)
    Disk /dev/sdd: 11.5GB
    Sector size (logical/physical): 512B/512B
    Disk Flags:

This device is not even labeled yet, so we can use ``parted`` to create
a ``gpt`` label before we create a partition, and verify again with ``parted
print``::

    $ parted --script /dev/sdd mklabel gpt
    $ parted --script /dev/sdd print
    Model: VBOX HARDDISK (scsi)
    Disk /dev/sdd: 11.5GB
    Sector size (logical/physical): 512B/512B
    Partition Table: gpt
    Disk Flags:

Now let's create a single partition, and verify that ``blkid`` can find
the ``PARTUUID`` that is needed by ``ceph-volume``::

    $ parted --script /dev/sdd mkpart primary 1 100%
    $ blkid /dev/sdd1
    /dev/sdd1: PARTLABEL="primary" PARTUUID="16399d72-1e1f-467d-96ee-6fe371a7d0d4"


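When only the ``PARTUUID`` value itself is needed, ``blkid`` can print it
directly; using the partition created above::

    $ blkid -s PARTUUID -o value /dev/sdd1
    16399d72-1e1f-467d-96ee-6fe371a7d0d4

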
.. _ceph-volume-lvm-existing-osds:

Existing OSDs
-------------
For existing clusters that want to use this new system and have OSDs that are
already running there are a few things to take into account:

.. warning:: this process will forcefully format the data device, destroying
             existing data, if any.

* OSD paths should follow this convention::

     /var/lib/ceph/osd/<cluster name>-<osd id>

* Preferably, no other mechanisms to mount the volume should exist; any such
  mechanisms (like ``fstab`` mount points) should be removed

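A quick way to check for leftover ``fstab`` entries, using the conventional
OSD path from above::

    grep /var/lib/ceph/osd /etc/fstab

Any matching lines should be removed before proceeding.
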
The one time process for an existing OSD, with an ID of 0 and using
a ``"ceph"`` cluster name would look like (the following command will **destroy
any data** in the OSD)::

    ceph-volume lvm prepare --filestore --osd-id 0 --osd-fsid E3D291C1-E7BF-4984-9794-B60D9FA139CB

The command line tool will not contact the monitor to generate an OSD ID and
will format the LVM device in addition to storing the metadata on it so that it
can be started later (for detailed metadata description see
:ref:`ceph-volume-lvm-tags`).


Crush device class
------------------

To set the crush device class for the OSD, use the ``--crush-device-class`` flag. This will
work for both bluestore and filestore OSDs::

    ceph-volume lvm prepare --bluestore --data vg/lv --crush-device-class foo

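One way to confirm the assignment once the OSD is up is to list the known
device classes and inspect the CRUSH tree (``foo`` matches the example above)::

    ceph osd crush class ls
    ceph osd tree
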
.. _ceph-volume-lvm-multipath:

``multipath`` support
---------------------
Devices that come from ``multipath`` are not supported as-is. The tool will
refuse to consume a raw multipath device and will report a message like::

    --> RuntimeError: Cannot use device (/dev/mapper/<name>). A vg/lv path or an existing device is needed

The reason for not supporting multipath is that, depending on the type of the
multipath setup, if using an active/passive array as the underlying physical
devices, filters are required in ``lvm.conf`` to exclude the disks that are part
of those underlying devices.

It is unfeasible for ceph-volume to understand what type of configuration is
needed for LVM to be able to work in the various multipath scenarios. The
functionality to create the LV for you is merely a (naive) convenience;
anything that involves different settings or configuration must be provided by
a config management system, which can then provide VGs and LVs for ceph-volume
to consume.

This situation will only arise when trying to use the ceph-volume functionality
that creates a volume group and logical volume from a device. If a multipath
device is already a logical volume it *should* work, given that the LVM
configuration is done correctly to avoid issues.


Storing metadata
----------------
The following tags will get applied as part of the preparation process
regardless of the type of volume (journal or data) or OSD objectstore:

* ``cluster_fsid``
* ``encrypted``
* ``osd_fsid``
* ``osd_id``
* ``crush_device_class``

For :term:`filestore` these tags will be added:

* ``journal_device``
* ``journal_uuid``

For :term:`bluestore` these tags will be added:

* ``block_device``
* ``block_uuid``
* ``db_device``
* ``db_uuid``
* ``wal_device``
* ``wal_uuid``

.. note:: For the complete lvm tag conventions see :ref:`ceph-volume-lvm-tag-api`

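The metadata stored on prepared devices can be reviewed at any time with::

    ceph-volume lvm list

This reports every OSD found, along with its devices and the lvm tags
associated with them.
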

Summary
-------
To recap the ``prepare`` process for :term:`bluestore`:

#. Accepts raw physical devices, partitions on physical devices, or logical volumes as arguments.
#. Creates logical volumes on any raw physical devices.
#. Generates a UUID for the OSD.
#. Asks the monitor for an OSD ID, reusing the generated UUID.
#. OSD data directory is created on a tmpfs mount.
#. ``block``, ``block.wal``, and ``block.db`` are symlinked if defined.
#. monmap is fetched for activation.
#. Data directory is populated by ``ceph-osd``.
#. Logical Volumes are assigned all the Ceph metadata using lvm tags.


And the ``prepare`` process for :term:`filestore`:

#. Accepts raw physical devices, partitions on physical devices, or logical volumes as arguments.
#. Generates a UUID for the OSD.
#. Asks the monitor for an OSD ID, reusing the generated UUID.
#. OSD data directory is created and data volume mounted.
#. Journal is symlinked from data volume to journal location.
#. monmap is fetched for activation.
#. Data directory is populated by ``ceph-osd``.
#. Data and journal volumes are assigned all the Ceph metadata using lvm tags.