1 =============================
2 Block Devices and OpenStack
3 =============================
4
5 .. index:: Ceph Block Device; OpenStack
6
You may use Ceph Block Device images with OpenStack through ``libvirt``, which
configures the QEMU interface to ``librbd``. Ceph stripes block device images as
objects across the cluster, which means that large Ceph Block Device images have
better performance than a standalone server could provide.
11
12 To use Ceph Block Devices with OpenStack, you must install QEMU, ``libvirt``,
13 and OpenStack first. We recommend using a separate physical node for your
14 OpenStack installation. OpenStack recommends a minimum of 8GB of RAM and a
15 quad-core processor. The following diagram depicts the OpenStack/Ceph
16 technology stack.
17
18
.. ditaa:: +---------------------------------------------------+
           |                      OpenStack                     |
           +---------------------------------------------------+
           |                       libvirt                      |
           +------------------------+--------------------------+
                                    |
                                    | configures
                                    v
           +---------------------------------------------------+
           |                        QEMU                        |
           +---------------------------------------------------+
           |                       librbd                       |
           +---------------------------------------------------+
           |                      librados                      |
           +------------------------+-+------------------------+
           |          OSDs          | |        Monitors        |
           +------------------------+ +------------------------+
36
37 .. important:: To use Ceph Block Devices with OpenStack, you must have
38 access to a running Ceph Storage Cluster.
39
40 Three parts of OpenStack integrate with Ceph's block devices:
41
42 - **Images**: OpenStack Glance manages images for VMs. Images are immutable.
43 OpenStack treats images as binary blobs and downloads them accordingly.
44
45 - **Volumes**: Volumes are block devices. OpenStack uses volumes to boot VMs,
46 or to attach volumes to running VMs. OpenStack manages volumes using
47 Cinder services.
48
- **Guest Disks**: Guest disks are guest operating system disks. By default,
  when you boot a virtual machine, its disk appears as a file on the filesystem
  of the hypervisor (usually under ``/var/lib/nova/instances/<uuid>/``). Prior
  to OpenStack Havana, the only way to boot a VM in Ceph was to use the
  boot-from-volume functionality of Cinder. Now, however, it is possible to boot
  every virtual machine inside Ceph directly without using Cinder, which is
  advantageous because it allows you to perform maintenance operations easily
  with the live-migration process. Additionally, if your hypervisor dies, it is
  also convenient to trigger ``nova evacuate`` and run the virtual machine
  elsewhere almost seamlessly (see the example after this list).
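
For example, with the instance's disks stored in Ceph, routine maintenance and
host failures can be handled with the standard Nova commands; the instance and
host names below are placeholders::

    # move a running VM off a host that needs maintenance
    nova live-migration {instance-uuid} {target-compute-host}

    # rebuild the VM on another host after a hypervisor failure
    # (older releases may also need --on-shared-storage)
    nova evacuate {instance-uuid} {target-compute-host}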
59
60 You can use OpenStack Glance to store images in a Ceph Block Device, and you
61 can use Cinder to boot a VM using a copy-on-write clone of an image.
62
63 The instructions below detail the setup for Glance, Cinder and Nova, although
64 they do not have to be used together. You may store images in Ceph block devices
65 while running VMs using a local disk, or vice versa.
66
67 .. important:: Ceph doesn’t support QCOW2 for hosting a virtual machine disk.
68 Thus if you want to boot virtual machines in Ceph (ephemeral backend or boot
69 from volume), the Glance image format must be ``RAW``.
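
For example, a QCOW2 cloud image can be converted and uploaded in RAW format
roughly as follows (the file and image names are placeholders, and the unified
``openstack`` client is assumed; the legacy ``glance`` client works as well)::

    qemu-img convert -f qcow2 -O raw ubuntu.qcow2 ubuntu.raw
    openstack image create --disk-format raw --container-format bare \
        --file ubuntu.raw ubuntu-raw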
70
71 .. tip:: This document describes using Ceph Block Devices with OpenStack Havana.
72 For earlier versions of OpenStack see
73 `Block Devices and OpenStack (Dumpling)`_.
74
75 .. index:: pools; OpenStack
76
77 Create a Pool
78 =============
79
80 By default, Ceph block devices use the ``rbd`` pool. You may use any available
81 pool. We recommend creating a pool for Cinder and a pool for Glance. Ensure
82 your Ceph cluster is running, then create the pools. ::
83
84 ceph osd pool create volumes 128
85 ceph osd pool create images 128
86 ceph osd pool create backups 128
87 ceph osd pool create vms 128
88
See `Create a Pool`_ for details on specifying the number of placement groups
for your pools, and `Placement Groups`_ for guidance on how many placement
groups to use.
92
Newly created pools must be initialized prior to use. Use the ``rbd`` tool
to initialize the pools::
95
96 rbd pool init volumes
97 rbd pool init images
98 rbd pool init backups
99 rbd pool init vms
100
101 .. _Create a Pool: ../../rados/operations/pools#createpool
102 .. _Placement Groups: ../../rados/operations/placement-groups
103
104
105 Configure OpenStack Ceph Clients
106 ================================
107
108 The nodes running ``glance-api``, ``cinder-volume``, ``nova-compute`` and
109 ``cinder-backup`` act as Ceph clients. Each requires the ``ceph.conf`` file::
110
111 ssh {your-openstack-server} sudo tee /etc/ceph/ceph.conf </etc/ceph/ceph.conf
112
113
114 Install Ceph client packages
115 ----------------------------
116
117 On the ``glance-api`` node, you will need the Python bindings for ``librbd``::
118
119 sudo apt-get install python-rbd
120 sudo yum install python-rbd
121
On the ``nova-compute``, ``cinder-backup`` and ``cinder-volume`` nodes, install
both the Python bindings and the client command line tools::
124
125 sudo apt-get install ceph-common
126 sudo yum install ceph-common
127
128
129 Setup Ceph Client Authentication
130 --------------------------------
131
132 If you have `cephx authentication`_ enabled, create a new user for Nova/Cinder
133 and Glance. Execute the following::
134
135 ceph auth get-or-create client.glance mon 'profile rbd' osd 'profile rbd pool=images'
136 ceph auth get-or-create client.cinder mon 'profile rbd' osd 'profile rbd pool=volumes, profile rbd pool=vms, profile rbd pool=images'
137 ceph auth get-or-create client.cinder-backup mon 'profile rbd' osd 'profile rbd pool=backups'
138
139 Add the keyrings for ``client.cinder``, ``client.glance``, and
140 ``client.cinder-backup`` to the appropriate nodes and change their ownership::
141
142 ceph auth get-or-create client.glance | ssh {your-glance-api-server} sudo tee /etc/ceph/ceph.client.glance.keyring
143 ssh {your-glance-api-server} sudo chown glance:glance /etc/ceph/ceph.client.glance.keyring
144 ceph auth get-or-create client.cinder | ssh {your-volume-server} sudo tee /etc/ceph/ceph.client.cinder.keyring
145 ssh {your-cinder-volume-server} sudo chown cinder:cinder /etc/ceph/ceph.client.cinder.keyring
146 ceph auth get-or-create client.cinder-backup | ssh {your-cinder-backup-server} sudo tee /etc/ceph/ceph.client.cinder-backup.keyring
147 ssh {your-cinder-backup-server} sudo chown cinder:cinder /etc/ceph/ceph.client.cinder-backup.keyring
148
149 Nodes running ``nova-compute`` need the keyring file for the ``nova-compute``
150 process::
151
152 ceph auth get-or-create client.cinder | ssh {your-nova-compute-server} sudo tee /etc/ceph/ceph.client.cinder.keyring
153
154 They also need to store the secret key of the ``client.cinder`` user in
155 ``libvirt``. The libvirt process needs it to access the cluster while attaching
156 a block device from Cinder.
157
158 Create a temporary copy of the secret key on the nodes running
159 ``nova-compute``::
160
161 ceph auth get-key client.cinder | ssh {your-compute-node} tee client.cinder.key
162
163 Then, on the compute nodes, add the secret key to ``libvirt`` and remove the
164 temporary copy of the key::
165
166 uuidgen
167 457eb676-33da-42ec-9a8c-9293d545c337
168
169 cat > secret.xml <<EOF
170 <secret ephemeral='no' private='no'>
171 <uuid>457eb676-33da-42ec-9a8c-9293d545c337</uuid>
172 <usage type='ceph'>
173 <name>client.cinder secret</name>
174 </usage>
175 </secret>
176 EOF
177 sudo virsh secret-define --file secret.xml
178 Secret 457eb676-33da-42ec-9a8c-9293d545c337 created
179 sudo virsh secret-set-value --secret 457eb676-33da-42ec-9a8c-9293d545c337 --base64 $(cat client.cinder.key) && rm client.cinder.key secret.xml
180
Save the UUID of the secret for configuring ``nova-compute`` later.

.. important:: You don't necessarily need the UUID on all the compute nodes.
   However, from a platform consistency perspective, it's better to keep the
   same UUID.
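
For example, to keep a single UUID across all compute nodes, you can generate
``secret.xml`` once and replay it on each node; the host names and the loop
below are only a sketch::

    # run from a node that has access to the client.cinder key and secret.xml
    ceph auth get-key client.cinder > client.cinder.key
    for host in compute1 compute2 compute3; do
        cat secret.xml | ssh $host sudo tee secret.xml >/dev/null
        ssh $host sudo virsh secret-define --file secret.xml
        ssh $host sudo virsh secret-set-value \
            --secret 457eb676-33da-42ec-9a8c-9293d545c337 \
            --base64 $(cat client.cinder.key)
    done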
186
187 .. _cephx authentication: ../../rados/configuration/auth-config-ref/#enabling-disabling-cephx
188
189
190 Configure OpenStack to use Ceph
191 ===============================
192
193 Configuring Glance
194 ------------------
195
Glance can use multiple back ends to store images. To use Ceph block devices by
default, configure Glance as follows.
198
199 Prior to Juno
200 ~~~~~~~~~~~~~~
201
202 Edit ``/etc/glance/glance-api.conf`` and add under the ``[DEFAULT]`` section::
203
204 default_store = rbd
205 rbd_store_user = glance
206 rbd_store_pool = images
207 rbd_store_chunk_size = 8
208
209
210 Juno
211 ~~~~
212
213 Edit ``/etc/glance/glance-api.conf`` and add under the ``[glance_store]`` section::
214
215 [DEFAULT]
216 ...
217 default_store = rbd
218 ...
219 [glance_store]
220 stores = rbd
221 rbd_store_pool = images
222 rbd_store_user = glance
223 rbd_store_ceph_conf = /etc/ceph/ceph.conf
224 rbd_store_chunk_size = 8
225
.. important:: Glance has not completely moved to 'store' yet, so we still need
   to configure the store in the ``[DEFAULT]`` section until Kilo.
228
229 Kilo and after
230 ~~~~~~~~~~~~~~
231
232 Edit ``/etc/glance/glance-api.conf`` and add under the ``[glance_store]`` section::
233
234 [glance_store]
235 stores = rbd
236 default_store = rbd
237 rbd_store_pool = images
238 rbd_store_user = glance
239 rbd_store_ceph_conf = /etc/ceph/ceph.conf
240 rbd_store_chunk_size = 8
241
For more information about the configuration options available in Glance, please
refer to the OpenStack Configuration Reference: http://docs.openstack.org/.
243
244 Enable copy-on-write cloning of images
245 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
246
247 Note that this exposes the back end location via Glance's API, so the endpoint
248 with this option enabled should not be publicly accessible.
249
250 Any OpenStack version except Mitaka
251 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
252
253 If you want to enable copy-on-write cloning of images, also add under the ``[DEFAULT]`` section::
254
255 show_image_direct_url = True
256
257 For Mitaka only
258 ^^^^^^^^^^^^^^^
259
260 To enable image locations and take advantage of copy-on-write cloning for images, add under the ``[DEFAULT]`` section::
261
262 show_multiple_locations = True
263 show_image_direct_url = True
264
265 Disable cache management (any OpenStack version)
266 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
267
268 Disable the Glance cache management to avoid images getting cached under ``/var/lib/glance/image-cache/``,
269 assuming your configuration file has ``flavor = keystone+cachemanagement``::
270
271 [paste_deploy]
272 flavor = keystone
273
274 Image properties
275 ~~~~~~~~~~~~~~~~
276
We recommend using the following properties for your images (an example command
follows the list):

- ``hw_scsi_model=virtio-scsi``: add the virtio-scsi controller for better
  performance and discard support
- ``hw_disk_bus=scsi``: connect every Cinder block device to that controller
- ``hw_qemu_guest_agent=yes``: enable the QEMU guest agent
- ``os_require_quiesce=yes``: send fs-freeze/thaw calls through the QEMU guest agent
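
As a sketch, these properties can be set on an existing image with the unified
``openstack`` client (the image name is a placeholder; the legacy
``glance image-update --property`` form is equivalent)::

    openstack image set \
        --property hw_scsi_model=virtio-scsi \
        --property hw_disk_bus=scsi \
        --property hw_qemu_guest_agent=yes \
        --property os_require_quiesce=yes \
        ubuntu-raw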
283
284
285 Configuring Cinder
286 ------------------
287
288 OpenStack requires a driver to interact with Ceph block devices. You must also
289 specify the pool name for the block device. On your OpenStack node, edit
290 ``/etc/cinder/cinder.conf`` by adding::
291
292 [DEFAULT]
293 ...
294 enabled_backends = ceph
295 ...
296 [ceph]
297 volume_driver = cinder.volume.drivers.rbd.RBDDriver
298 volume_backend_name = ceph
299 rbd_pool = volumes
300 rbd_ceph_conf = /etc/ceph/ceph.conf
301 rbd_flatten_volume_from_snapshot = false
302 rbd_max_clone_depth = 5
303 rbd_store_chunk_size = 4
304 rados_connect_timeout = -1
305 glance_api_version = 2
306
If you are using `cephx authentication`_, also configure the user and the UUID of
the secret you added to ``libvirt`` as documented earlier::
309
310 [ceph]
311 ...
312 rbd_user = cinder
313 rbd_secret_uuid = 457eb676-33da-42ec-9a8c-9293d545c337
314
315 Note that if you are configuring multiple cinder back ends,
316 ``glance_api_version = 2`` must be in the ``[DEFAULT]`` section.
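
A minimal sketch of such a multi-back-end layout; the second ``lvm`` back end
and its driver are only illustrative::

    [DEFAULT]
    enabled_backends = ceph, lvm
    glance_api_version = 2
    ...
    [ceph]
    volume_driver = cinder.volume.drivers.rbd.RBDDriver
    volume_backend_name = ceph
    ...
    [lvm]
    volume_driver = cinder.volume.drivers.lvm.LVMVolumeDriver
    volume_backend_name = lvm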
317
318
319 Configuring Cinder Backup
320 -------------------------
321
OpenStack Cinder Backup requires a specific daemon (``cinder-backup``), so don't
forget to install it. On your Cinder Backup node, edit ``/etc/cinder/cinder.conf``
and add::
324
325 backup_driver = cinder.backup.drivers.ceph
326 backup_ceph_conf = /etc/ceph/ceph.conf
327 backup_ceph_user = cinder-backup
328 backup_ceph_chunk_size = 134217728
329 backup_ceph_pool = backups
330 backup_ceph_stripe_unit = 0
331 backup_ceph_stripe_count = 0
332 restore_discard_excess_bytes = true
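
If the ``cinder-backup`` daemon is not installed yet, something along these
lines usually works; the package names are distribution-dependent assumptions::

    sudo apt-get install cinder-backup      # Debian/Ubuntu
    sudo yum install openstack-cinder       # Red Hat-based (ships the backup service)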
333
334
335 Configuring Nova to attach Ceph RBD block device
336 ------------------------------------------------
337
In order to attach Cinder devices (either as normal block storage or when booting
from a volume), you must tell Nova (and libvirt) which user and UUID to refer to
when attaching the device. libvirt will refer to this user when connecting to and
authenticating with the Ceph cluster. ::
342
343 [libvirt]
344 ...
345 rbd_user = cinder
346 rbd_secret_uuid = 457eb676-33da-42ec-9a8c-9293d545c337
347
348 These two flags are also used by the Nova ephemeral backend.
349
350
351 Configuring Nova
352 ----------------
353
354 In order to boot all the virtual machines directly into Ceph, you must
355 configure the ephemeral backend for Nova.
356
It is recommended to enable the RBD cache in your Ceph configuration file (it is
enabled by default since Giant). Moreover, enabling the admin socket brings a lot
of benefits while troubleshooting. Having one socket per virtual machine that
uses a Ceph block device helps when investigating performance issues and/or
unexpected behavior.
361
362 This socket can be accessed like this::
363
364 ceph daemon /var/run/ceph/ceph-client.cinder.19195.32310016.asok help
365
Now, on every compute node, edit your Ceph configuration file::
367
368 [client]
369 rbd cache = true
370 rbd cache writethrough until flush = true
371 admin socket = /var/run/ceph/guests/$cluster-$type.$id.$pid.$cctid.asok
372 log file = /var/log/qemu/qemu-guest-$pid.log
373 rbd concurrent management ops = 20
374
375 Configure the permissions of these paths::
376
377 mkdir -p /var/run/ceph/guests/ /var/log/qemu/
378 chown qemu:libvirtd /var/run/ceph/guests /var/log/qemu/
379
Note that user ``qemu`` and group ``libvirtd`` can vary depending on your system.
The provided example works for Red Hat-based systems.
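
On Debian/Ubuntu systems, for instance, the equivalent is typically (the user
and group names below are assumptions, check your installation)::

    chown libvirt-qemu:kvm /var/run/ceph/guests /var/log/qemu/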
382
.. tip:: If your virtual machine is already running, you can simply restart it
   to get the socket.
384
385
386 Havana and Icehouse
387 ~~~~~~~~~~~~~~~~~~~
388
389 Havana and Icehouse require patches to implement copy-on-write cloning and fix
390 bugs with image size and live migration of ephemeral disks on rbd. These are
391 available in branches based on upstream Nova `stable/havana`_ and
392 `stable/icehouse`_. Using them is not mandatory but **highly recommended** in
393 order to take advantage of the copy-on-write clone functionality.
394
395 On every Compute node, edit ``/etc/nova/nova.conf`` and add::
396
397 libvirt_images_type = rbd
398 libvirt_images_rbd_pool = vms
399 libvirt_images_rbd_ceph_conf = /etc/ceph/ceph.conf
400 disk_cachemodes="network=writeback"
401 rbd_user = cinder
402 rbd_secret_uuid = 457eb676-33da-42ec-9a8c-9293d545c337
403
404 It is also a good practice to disable file injection. While booting an
405 instance, Nova usually attempts to open the rootfs of the virtual machine.
406 Then, Nova injects values such as password, ssh keys etc. directly into the
407 filesystem. However, it is better to rely on the metadata service and
408 ``cloud-init``.
409
410 On every Compute node, edit ``/etc/nova/nova.conf`` and add::
411
412 libvirt_inject_password = false
413 libvirt_inject_key = false
414 libvirt_inject_partition = -2
415
416 To ensure a proper live-migration, use the following flags::
417
418 libvirt_live_migration_flag="VIR_MIGRATE_UNDEFINE_SOURCE,VIR_MIGRATE_PEER2PEER,VIR_MIGRATE_LIVE,VIR_MIGRATE_PERSIST_DEST,VIR_MIGRATE_TUNNELLED"
419
420 Juno
421 ~~~~
422
In Juno, the Ceph block device settings were moved under the ``[libvirt]`` section.
On every Compute node, edit ``/etc/nova/nova.conf`` under the ``[libvirt]``
section and add::
426
427 [libvirt]
428 images_type = rbd
429 images_rbd_pool = vms
430 images_rbd_ceph_conf = /etc/ceph/ceph.conf
431 rbd_user = cinder
432 rbd_secret_uuid = 457eb676-33da-42ec-9a8c-9293d545c337
433 disk_cachemodes="network=writeback"
434
435
As with earlier releases, it is good practice to disable file injection and rely
on the metadata service and ``cloud-init`` instead.
441
442 On every Compute node, edit ``/etc/nova/nova.conf`` and add the following
443 under the ``[libvirt]`` section::
444
445 inject_password = false
446 inject_key = false
447 inject_partition = -2
448
449 To ensure a proper live-migration, use the following flags (under the ``[libvirt]`` section)::
450
451 live_migration_flag="VIR_MIGRATE_UNDEFINE_SOURCE,VIR_MIGRATE_PEER2PEER,VIR_MIGRATE_LIVE,VIR_MIGRATE_PERSIST_DEST,VIR_MIGRATE_TUNNELLED"
452
453 Kilo
454 ~~~~
455
Enable discard support for the virtual machine ephemeral root disk::
457
458 [libvirt]
459 ...
460 ...
461 hw_disk_discard = unmap # enable discard support (be careful of performance)
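
Discard typically only works when the guest disk is attached through a bus that
supports it, such as virtio-scsi (see the image properties recommended earlier).
From inside the guest you can then exercise it, for example with::

    sudo fstrim -v /     # reports how many bytes were trimmed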
462
463
464 Restart OpenStack
465 =================
466
To activate the Ceph block device driver and load the block device pool name
into the configuration, you must restart OpenStack. On Debian-based systems,
execute these commands on the appropriate nodes::
470
471 sudo glance-control api restart
472 sudo service nova-compute restart
473 sudo service cinder-volume restart
474 sudo service cinder-backup restart
475
For Red Hat-based systems, execute::
477
478 sudo service openstack-glance-api restart
479 sudo service openstack-nova-compute restart
480 sudo service openstack-cinder-volume restart
481 sudo service openstack-cinder-backup restart
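
On distributions that have switched to systemd, the equivalent restarts look
like the following; unit names may differ slightly between distributions::

    sudo systemctl restart openstack-glance-api openstack-nova-compute \
        openstack-cinder-volume openstack-cinder-backup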
482
483 Once OpenStack is up and running, you should be able to create a volume
484 and boot from it.
485
486
487 Booting from a Block Device
488 ===========================
489
490 You can create a volume from an image using the Cinder command line tool::
491
492 cinder create --image-id {id of image} --display-name {name of volume} {size of volume}
493
Note that the image must be in RAW format. You can use `qemu-img`_ to convert
from one format to another. For example::
496
497 qemu-img convert -f {source-format} -O {output-format} {source-filename} {output-filename}
498 qemu-img convert -f qcow2 -O raw precise-cloudimg.img precise-cloudimg.raw
499
When Glance and Cinder are both using Ceph block devices, the volume is a
copy-on-write clone of the image, so new volumes can be created quickly. In the
OpenStack dashboard, you can boot from that volume by performing the following
steps:
503
504 #. Launch a new instance.
#. Choose the image associated with the copy-on-write clone.
506 #. Select 'boot from volume'.
507 #. Select the volume you created.
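
The same can be done from the command line; a sketch with the unified
``openstack`` client, using placeholder names::

    openstack server create --flavor {flavor} --volume {name of volume} {name of instance}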
508
509 .. _qemu-img: ../qemu-rbd/#running-qemu-with-rbd
510 .. _Block Devices and OpenStack (Dumpling): http://docs.ceph.com/docs/dumpling/rbd/rbd-openstack
511 .. _stable/havana: https://github.com/jdurgin/nova/tree/havana-ephemeral-rbd
512 .. _stable/icehouse: https://github.com/angdraug/nova/tree/rbd-ephemeral-clone-stable-icehouse