.. _ceph-volume-lvm-prepare:

``prepare``
===========
This subcommand prepares an OSD with either a :term:`filestore` or a
:term:`bluestore` backend. It is recommended to pre-provision a logical volume
before using it with ``ceph-volume lvm``.

Logical volumes are not altered except for adding extra metadata.

.. note:: This is part of a two-step process to deploy an OSD. For a
          single-call approach, see :ref:`ceph-volume-lvm-create`.

To help identify volumes, the tool assigns a few pieces of metadata to each
volume it prepares for Ceph, using :term:`LVM tags`.

:term:`LVM tags` make volumes easy to discover later and help identify them as
part of a Ceph system, along with the role they have (journal, filestore,
bluestore, etc.).

Although :term:`bluestore` is the default, the back end can be specified with:

* :ref:`--filestore <ceph-volume-lvm-prepare_filestore>`
* :ref:`--bluestore <ceph-volume-lvm-prepare_bluestore>`

.. _ceph-volume-lvm-prepare_bluestore:

``bluestore``
-------------
The :term:`bluestore` objectstore is the default for new OSDs. It offers more
flexibility for devices than :term:`filestore`. Bluestore supports the
following configurations:

* A block device, a block.wal device, and a block.db device
* A block device and a block.wal device
* A block device and a block.db device
* A single block device

The bluestore subcommand accepts physical block devices, partitions on physical
block devices, or logical volumes as arguments for the various device
parameters. If a physical device is provided, a logical volume will be created
on it. A volume group will either be created, or reused if its name begins with
``ceph``. This allows a simpler approach to using LVM, but at the cost of
flexibility: there are no options or configurations to change how the LV is
created.

The ``block`` device is specified with the ``--data`` flag, and in its simplest
use case it looks like::

    ceph-volume lvm prepare --bluestore --data vg/lv

A raw device can be specified in the same way::

    ceph-volume lvm prepare --bluestore --data /path/to/device

To enable :ref:`encryption <ceph-volume-lvm-encryption>`, the ``--dmcrypt`` flag
is required::

    ceph-volume lvm prepare --bluestore --dmcrypt --data vg/lv

If a ``block.db`` or a ``block.wal`` device is needed (both are optional for
bluestore), it can be specified with ``--block.db`` or ``--block.wal``
respectively. Each can be a physical device, a partition, or a logical volume.

For both ``block.db`` and ``block.wal``, partitions are not turned into logical
volumes because they can be used as-is.
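The four bluestore layouts listed earlier differ only in which of these optional
flags are passed. The sketch below uses a hypothetical helper,
``bluestore_prepare_cmd`` (not part of ``ceph-volume``), that assembles the
command line from a data device plus an optional ``block.db`` and ``block.wal``
device. It only prints the command so it can be reviewed before running it; the
``vg/...`` names are placeholders.

.. code-block:: shell

    # Hypothetical helper: print (not run) a bluestore prepare command.
    #   $1 = data device (required)
    #   $2 = block.db device (optional, may be "")
    #   $3 = block.wal device (optional)
    # Each value may be a physical device, a partition, or a vg/lv name.
    bluestore_prepare_cmd() {
        local data=$1 db=${2:-} wal=${3:-}
        local cmd="ceph-volume lvm prepare --bluestore --data $data"
        # Only append the optional flags when a device was supplied.
        if [ -n "$db" ]; then
            cmd="$cmd --block.db $db"
        fi
        if [ -n "$wal" ]; then
            cmd="$cmd --block.wal $wal"
        fi
        printf '%s\n' "$cmd"
    }

    # block + block.db + block.wal
    bluestore_prepare_cmd vg/data_lv vg/db_lv vg/wal_lv
    # block + block.wal only (the empty string skips block.db)
    bluestore_prepare_cmd vg/data_lv "" vg/wal_lv

Passing an empty second argument skips ``--block.db``, which covers the
block-plus-WAL layout without a separate code path.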

While creating the OSD directory, the process uses a ``tmpfs`` mount to hold
all the files needed for the OSD. These files are initially created by
``ceph-osd --mkfs`` and are fully ephemeral.

A symlink is always created for the ``block`` device, and optionally for
``block.db`` and ``block.wal``. For a cluster with the default name and an OSD
ID of 0, the directory could look like::

    # ls -l /var/lib/ceph/osd/ceph-0
    lrwxrwxrwx. 1 ceph ceph 93 Oct 20 13:05 block -> /dev/ceph-be2b6fbd-bcf2-4c51-b35d-a35a162a02f0/osd-block-25cf0a05-2bc6-44ef-9137-79d65bd7ad62
    lrwxrwxrwx. 1 ceph ceph 93 Oct 20 13:05 block.db -> /dev/sda1
    lrwxrwxrwx. 1 ceph ceph 93 Oct 20 13:05 block.wal -> /dev/ceph/osd-wal-0
    -rw-------. 1 ceph ceph 37 Oct 20 13:05 ceph_fsid
    -rw-------. 1 ceph ceph 37 Oct 20 13:05 fsid
    -rw-------. 1 ceph ceph 55 Oct 20 13:05 keyring
    -rw-------. 1 ceph ceph 6 Oct 20 13:05 ready
    -rw-------. 1 ceph ceph 10 Oct 20 13:05 type
    -rw-------. 1 ceph ceph 2 Oct 20 13:05 whoami

In the above case, a device was used for ``block``, so ``ceph-volume`` created
a volume group and a logical volume using the following convention:

* volume group name: ``ceph-{cluster fsid}``, or ``ceph-{random uuid}`` if the
  VG already exists

* logical volume name: ``osd-block-{osd_fsid}``
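The naming convention can be written out as a short sketch. The helper names
below are hypothetical and this is not ``ceph-volume``'s own code; the
random-UUID branch assumes a Linux host, where ``/proc/sys/kernel/random/uuid``
returns a fresh UUID on every read. The UUIDs from the listing above are reused
purely as sample input.

.. code-block:: shell

    # Sketch of the naming convention above (not ceph-volume's own code).
    #   $1 = cluster fsid
    #   $2 = "yes" if a VG named ceph-{cluster fsid} already exists
    osd_vg_name() {
        local cluster_fsid=$1 vg_exists=${2:-no}
        if [ "$vg_exists" = "yes" ]; then
            # Linux-only: each read returns a new random UUID.
            printf 'ceph-%s\n' "$(cat /proc/sys/kernel/random/uuid)"
        else
            printf 'ceph-%s\n' "$cluster_fsid"
        fi
    }

    #   $1 = OSD fsid
    osd_block_lv_name() {
        printf 'osd-block-%s\n' "$1"
    }

    osd_vg_name be2b6fbd-bcf2-4c51-b35d-a35a162a02f0
    osd_block_lv_name 25cf0a05-2bc6-44ef-9137-79d65bd7ad62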

.. _ceph-volume-lvm-prepare_filestore:

``filestore``
-------------
This is the OSD backend that allows preparation of logical volumes for
a :term:`filestore` objectstore OSD.

It can use a logical volume for the OSD data and a physical device, a partition,
or a logical volume for the journal. A logical volume will be created on any
physical device that is passed. A volume group will either be created, or reused
if its name begins with ``ceph``. No special preparation is needed for these
volumes other than following the minimum size requirements for data and
journal.

The CLI call for a basic standalone filestore OSD looks like this::

    ceph-volume lvm prepare --filestore --data <data block device>

To deploy filestore with an external journal::

    ceph-volume lvm prepare --filestore --data <data block device> --journal <journal block device>

To enable :ref:`encryption <ceph-volume-lvm-encryption>`, the ``--dmcrypt`` flag
is required::

    ceph-volume lvm prepare --filestore --dmcrypt --data <data block device> --journal <journal block device>

Both the journal and data block device can take three forms:

* a physical block device
* a partition on a physical block device
* a logical volume

When using logical volumes the value *must* be of the format
``volume_group/logical_volume``. Since logical volume names are not required to
be unique, this prevents accidentally choosing the wrong volume.
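A quick way to tell the two argument styles apart before calling
``ceph-volume`` is sketched below. ``is_vg_lv`` is a hypothetical helper, and the
rule it applies (no leading ``/``, exactly one ``/``, and a non-empty name on
each side) is an assumption based on the ``volume_group/logical_volume`` format
described above.

.. code-block:: shell

    # Hypothetical check: succeed if $1 looks like "volume_group/logical_volume"
    # rather than a device path such as /dev/sdc or /dev/sdc1.
    is_vg_lv() {
        case "$1" in
            /*)     return 1 ;;  # absolute path: a device or partition
            */*/*)  return 1 ;;  # more than one slash
            ?*/?*)  return 0 ;;  # non-empty VG and LV names around one slash
            *)      return 1 ;;  # anything else, e.g. a bare LV name
        esac
    }

    for arg in volume_group/journal_lv /dev/sdc1 journal_lv; do
        if is_vg_lv "$arg"; then
            echo "$arg: vg/lv"
        else
            echo "$arg: not vg/lv"
        fi
    done

A bare LV name such as ``journal_lv`` is rejected on purpose, since without the
volume group it could match more than one volume.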

When using a partition, it *must* contain a ``PARTUUID`` that can be
discovered by ``blkid``. This ensures it can later be identified correctly
regardless of the device name (or path).
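On a real host, ``blkid -s PARTUUID -o value /dev/sdc1`` prints only the
``PARTUUID`` of a partition, or nothing if it has none. The sketch below performs
the same check on a full ``blkid`` output line, like the one shown in
:ref:`ceph-volume-lvm-partitions`, so it can be tried without a disk;
``partuuid_from_blkid`` is a hypothetical helper.

.. code-block:: shell

    # Hypothetical helper: print the PARTUUID from one line of blkid output,
    # for example:
    #   /dev/sdd1: PARTLABEL="primary" PARTUUID="16399d72-..."
    # Returns 1 (and prints nothing) when the line has no PARTUUID.
    partuuid_from_blkid() {
        local value
        value=$(printf '%s\n' "$1" |
            sed -n 's/.*[[:space:]]PARTUUID="\([^"]*\)".*/\1/p')
        [ -n "$value" ] || return 1
        printf '%s\n' "$value"
    }

    line='/dev/sdd1: PARTLABEL="primary" PARTUUID="16399d72-1e1f-467d-96ee-6fe371a7d0d4"'
    partuuid_from_blkid "$line"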

For example, passing a logical volume for data and a partition ``/dev/sdc1`` for
the journal::

    ceph-volume lvm prepare --filestore --data volume_group/lv_name --journal /dev/sdc1

Passing a bare device for data and a logical volume as the journal::

    ceph-volume lvm prepare --filestore --data /dev/sdc --journal volume_group/journal_lv

A generated UUID is used to ask the cluster for a new OSD ID. These two pieces
(the UUID and the OSD ID) are crucial for identifying an OSD and will later be
used throughout the :ref:`ceph-volume-lvm-activate` process.

The OSD data directory is created using the following convention::

    /var/lib/ceph/osd/<cluster name>-<osd id>

At this point the data volume is mounted at this location, and the journal
volume is linked::

    ln -s /path/to/journal /var/lib/ceph/osd/<cluster name>-<osd id>/journal
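The path conventions in this section can be expressed as two small shell
functions. ``osd_data_dir`` and ``osd_journal_link`` are hypothetical helpers
that only build strings; the last line prints the symlink command for an assumed
journal partition ``/dev/sdc1`` rather than running it.

.. code-block:: shell

    # Hypothetical helpers that mirror the path conventions above.
    #   $1 = cluster name (e.g. "ceph"), $2 = OSD id (e.g. 0)
    osd_data_dir() {
        printf '/var/lib/ceph/osd/%s-%s\n' "$1" "$2"
    }

    osd_journal_link() {
        printf '%s/journal\n' "$(osd_data_dir "$1" "$2")"
    }

    # Print (do not run) the symlink command for a journal on /dev/sdc1.
    printf 'ln -s %s %s\n' /dev/sdc1 "$(osd_journal_link ceph 0)"

For cluster ``ceph`` and OSD 0 this prints
``ln -s /dev/sdc1 /var/lib/ceph/osd/ceph-0/journal``.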

The monmap is fetched using the OSD bootstrap key::

    /usr/bin/ceph --cluster ceph --name client.bootstrap-osd \
        --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring \
        mon getmap -o /var/lib/ceph/osd/<cluster name>-<osd id>/activate.monmap

``ceph-osd`` is then called to populate the OSD directory, which is already
mounted, reusing all the information from the initial steps::

    ceph-osd --cluster ceph --mkfs --mkkey -i <osd id> \
        --monmap /var/lib/ceph/osd/<cluster name>-<osd id>/activate.monmap \
        --osd-data /var/lib/ceph/osd/<cluster name>-<osd id> \
        --osd-journal /var/lib/ceph/osd/<cluster name>-<osd id>/journal \
        --osd-uuid <osd uuid> --keyring /var/lib/ceph/osd/<cluster name>-<osd id>/keyring \
        --setuser ceph --setgroup ceph


.. _ceph-volume-lvm-partitions:

Partitioning
------------
``ceph-volume lvm`` does not currently create partitions from a whole device.
If using device partitions, the only requirement is that they contain a
``PARTUUID`` that is discoverable by ``blkid``. Both ``fdisk`` and ``parted``
create one automatically for a new partition.

For example, using a new, unformatted drive (``/dev/sdd`` in this case) we can
use ``parted`` to create a new partition. First we list the device
information::

    $ parted --script /dev/sdd print
    Model: VBOX HARDDISK (scsi)
    Disk /dev/sdd: 11.5GB
    Sector size (logical/physical): 512B/512B
    Disk Flags:

This device is not labeled yet, so we can use ``parted`` to create a ``gpt``
label before we create a partition, and verify again with ``parted print``::

    $ parted --script /dev/sdd mklabel gpt
    $ parted --script /dev/sdd print
    Model: VBOX HARDDISK (scsi)
    Disk /dev/sdd: 11.5GB
    Sector size (logical/physical): 512B/512B
    Partition Table: gpt
    Disk Flags:

Now let's create a single partition and verify that ``blkid`` can find the
``PARTUUID`` that ``ceph-volume`` needs::

    $ parted --script /dev/sdd mkpart primary 1 100%
    $ blkid /dev/sdd1
    /dev/sdd1: PARTLABEL="primary" PARTUUID="16399d72-1e1f-467d-96ee-6fe371a7d0d4"


.. _ceph-volume-lvm-existing-osds:

Existing OSDs
-------------
For existing clusters that want to use this new system and have OSDs that are
already running, there are a few things to take into account:

.. warning:: This process will forcefully format the data device, destroying
             existing data, if any.

* OSD paths should follow this convention::

      /var/lib/ceph/osd/<cluster name>-<osd id>

* Preferably, no other mechanisms to mount the volume should exist, and any
  that do (like fstab mount points) should be removed.

The one-time process for an existing OSD, with an ID of 0 and using
a ``"ceph"`` cluster name, would look like this (the following command will
**destroy any data** in the OSD)::

    ceph-volume lvm prepare --filestore --osd-id 0 --osd-fsid E3D291C1-E7BF-4984-9794-B60D9FA139CB

The command line tool will not contact the monitor to generate an OSD ID and
will format the LVM device in addition to storing the metadata on it so that it
can be started later (for a detailed metadata description, see
:ref:`ceph-volume-lvm-tags`).
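The ``--osd-fsid`` value is the UUID of the existing OSD. As a minimal sketch,
assuming the OSD data directory contains an ``fsid`` file (as in the bluestore
directory listing above), the hypothetical helper below reads that file and
prints the one-time command shown earlier. It does not run ``ceph-volume``.

.. code-block:: shell

    # Hypothetical helper: build the one-time prepare command for an existing
    # OSD by reading its fsid from the OSD data directory (printed, not run).
    #   $1 = OSD data directory, $2 = OSD id
    existing_osd_prepare_cmd() {
        local data_dir=$1 osd_id=$2 osd_fsid
        if [ ! -r "$data_dir/fsid" ]; then
            echo "no readable fsid file in $data_dir" >&2
            return 1
        fi
        # Strip the trailing newline (and any other whitespace) from the file.
        osd_fsid=$(tr -d '[:space:]' < "$data_dir/fsid")
        printf 'ceph-volume lvm prepare --filestore --osd-id %s --osd-fsid %s\n' \
            "$osd_id" "$osd_fsid"
    }

    # Example on a host with an existing OSD 0:
    #   existing_osd_prepare_cmd /var/lib/ceph/osd/ceph-0 0

Printing the command instead of executing it leaves room to double-check the
ID and UUID, since running it destroys the data on the OSD.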


Crush device class
------------------

To set the crush device class for the OSD, use the ``--crush-device-class``
flag. This works for both bluestore and filestore OSDs::

    ceph-volume lvm prepare --bluestore --data vg/lv --crush-device-class foo


.. _ceph-volume-lvm-multipath:

``multipath`` support
---------------------
``multipath`` devices are supported if ``lvm`` is configured properly.

**Leave it to LVM**

Most Linux distributions should ship their LVM2 package with
``multipath_component_detection = 1`` in the default configuration. With this
setting, ``LVM`` ignores any device that is a multipath component, and
``ceph-volume`` will accordingly not touch these devices.

**Using filters**

Should this setting be unavailable, a correct ``filter`` expression must be
provided in ``lvm.conf``. ``ceph-volume`` must not be able to use both the
multipath device and its multipath components.
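To see which of the two cases applies on a given host, the effective value can
be queried with ``lvmconfig --type full devices/multipath_component_detection``,
which is expected to print it as ``multipath_component_detection=1`` (an
assumption about the LVM2 version in use). The hypothetical helper below
interprets that output; it takes the text as an argument so it can be tried
without LVM installed.

.. code-block:: shell

    # Hypothetical helper: succeed if LVM is set to skip multipath components.
    # $1 is the output of:
    #   lvmconfig --type full devices/multipath_component_detection
    # which is expected to look like "multipath_component_detection=1".
    multipath_detection_enabled() {
        case "$1" in
            *multipath_component_detection=*) [ "${1##*=}" = "1" ] ;;
            *) return 1 ;;
        esac
    }

    if multipath_detection_enabled "multipath_component_detection=1"; then
        echo "LVM ignores multipath components"
    fi

If the check fails, the ``filter`` route described above is the one to follow.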

Storing metadata
----------------
The following tags are applied as part of the preparation process, regardless
of the type of volume (journal or data) or OSD objectstore:

* ``cluster_fsid``
* ``encrypted``
* ``osd_fsid``
* ``osd_id``
* ``crush_device_class``

For :term:`filestore` these tags will be added:

* ``journal_device``
* ``journal_uuid``

For :term:`bluestore` these tags will be added:

* ``block_device``
* ``block_uuid``
* ``db_device``
* ``db_uuid``
* ``wal_device``
* ``wal_uuid``

.. note:: For the complete LVM tag conventions, see :ref:`ceph-volume-lvm-tag-api`.
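After ``prepare`` has run, the tags can be read back with
``lvs --noheadings -o lv_tags vg/lv``, which prints them as a single
comma-separated list. The sketch below assumes each tag is stored as
``ceph.<name>=<value>`` (confirm the prefix against the tag API reference above)
and extracts one value from such a string; ``ceph_tag`` and the sample values
are hypothetical.

.. code-block:: shell

    # Hypothetical helper: print the value of one Ceph tag from an lv_tags
    # string. Assumes tags look like "ceph.<name>=<value>" and are separated
    # by commas, as printed by `lvs --noheadings -o lv_tags`.
    #   $1 = lv_tags string, $2 = tag name without the "ceph." prefix
    ceph_tag() {
        printf '%s\n' "$1" | tr -d ' ' | tr ',' '\n' |
            sed -n "s/^ceph\.$2=//p"
    }

    # Hypothetical sample; on a real host this would come from:
    #   tags=$(lvs --noheadings -o lv_tags vg/lv)
    tags='  ceph.osd_id=0,ceph.encrypted=0,ceph.crush_device_class=foo'
    ceph_tag "$tags" crush_device_class

Stripping spaces first handles the leading padding that ``lvs`` adds when
headings are disabled.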


Summary
-------
To recap the ``prepare`` process for :term:`bluestore`:

#. Accepts raw physical devices, partitions on physical devices, or logical volumes as arguments.
#. Creates logical volumes on any raw physical devices.
#. Generates a UUID for the OSD.
#. Asks the monitor for an OSD ID, reusing the generated UUID.
#. Creates the OSD data directory on a tmpfs mount.
#. Symlinks ``block``, ``block.wal``, and ``block.db`` if defined.
#. Fetches the monmap for activation.
#. Populates the data directory with ``ceph-osd``.
#. Assigns all the Ceph metadata to the logical volumes using LVM tags.

And the ``prepare`` process for :term:`filestore`:

#. Accepts raw physical devices, partitions on physical devices, or logical volumes as arguments.
#. Creates logical volumes on any raw physical devices.
#. Generates a UUID for the OSD.
#. Asks the monitor for an OSD ID, reusing the generated UUID.
#. Creates the OSD data directory and mounts the data volume.
#. Symlinks the journal from the data volume to the journal location.
#. Fetches the monmap for activation.
#. Mounts the devices and populates the data directory with ``ceph-osd``.
#. Assigns all the Ceph metadata to the data and journal volumes using LVM tags.